Participants in the Alzheimer’s disease dementia study
All data were obtained from ADNI, a multi-site observational study. Data were acquired in accordance with each site’s Institutional Review Board, and written consent was obtained from each participant. The training set comprised 2918 scans (healthy control, N = 1943; AD, N = 975) from 626 subjects. The validation set comprised 382 scans (healthy control, N = 251; AD, N = 131) from 80 subjects. The test set comprised 325 scans (healthy control, N = 229; AD, N = 96) from 80 subjects.
Our data augmentation method uses scans from multiple visits of the same participant, which raises two problems: data leakage and disease progression.

Data leakage occurs when different scans from the same participant appear in both the training and test sets. The trained model might then make predictions by matching the subject rather than by extracting disease-relevant patterns. To prevent this, the training, validation, and test sets were partitioned at the subject level, so that no subject appears in more than one set.

Disease progression refers to the fact that a subject’s diagnostic status may change during follow-up visits. The diagnosis at scan time can therefore differ from the baseline label. We labeled every scan with its cross-sectional diagnosis at scan time. As a result, a participant whose diagnosis changed can contribute scans to both diagnostic groups, but there are few such cases.
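A subject-level partition can be sketched in Python as below. This is a minimal illustration, not the study’s code. The `(subject_id, scan_id, label)` tuple layout, the function name `split_by_subject`, the split fractions, and the seed are all hypothetical. The essential step is that subject IDs, not individual scans, are shuffled and assigned, so every scan of a subject lands in the same set.

```python
import random
from collections import defaultdict


def split_by_subject(scans, fractions=(0.8, 0.1, 0.1), seed=0):
    """Partition scans into train/validation/test so that no subject spans two sets.

    `scans` is a list of (subject_id, scan_id, label) tuples. This layout is a
    hypothetical stand-in for an ADNI scan table.
    """
    by_subject = defaultdict(list)
    for scan in scans:
        by_subject[scan[0]].append(scan)

    subjects = sorted(by_subject)          # deterministic order before shuffling
    random.Random(seed).shuffle(subjects)  # shuffle subjects, never individual scans

    n = len(subjects)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = (
        subjects[:n_train],
        subjects[n_train:n_train + n_val],
        subjects[n_train + n_val:],
    )
    # Expand each group of subjects back into all of those subjects' scans.
    return tuple([s for subj in group for s in by_subject[subj]] for group in groups)
```

Fractions near 0.8/0.1/0.1 give proportions similar to the study’s 626/80/80 subjects. This sketch does not attempt to balance diagnoses across sets.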
Participants in the “Mild Cognitive Impairment” study
From ADNI, we identified a cohort of participants who were diagnosed with MCI at baseline and who had structural MRI and a complete set of CSF amyloid and tau biomarkers (N = 582; the inclusion and exclusion procedure is illustrated in Fig. S1). Among these, 205 participants progressed to AD dementia at follow-up (“MCI progression” group). Another 179 participants remained stable at MCI for at least 4 years (“MCI stable” group). The time distributions and demographics of these two groups are shown in Fig. 2.
The deep learning MRI score
The deep learning model used in this study is a three-dimensional convolutional neural network (3D CNN) with five convolutional stages and one fully connected layer with a sigmoid output [5]. Each convolutional stage consists of two convolutional layers with rectified linear unit (ReLU) activations, a batch normalization operation, and a max pooling layer. The model was optimized with the Adam method using binary cross-entropy loss and a learning rate of 2e−5, selected by grid search. It was trained on brain-extracted T1-weighted structural MRI scans from the ADNI cohort to classify patients in the dementia stage of AD versus healthy control subjects. To evaluate the regional contribution to AD classification, we generated 3D class activation maps, which visualize the regions that drive the predictions of deep learning classification models [31, 32].
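The effect of the five pooling stages on feature-map size can be traced with a few lines of Python. Every number here is an assumption for illustration; the source does not specify them. The assumed values are:

- an input of size `(1, 96, 112, 96)` in channels-first order;
- the channel widths shown in the code;
- 3×3×3 convolutions with padding 1, which preserve spatial size;
- 2×2×2 max pooling with stride 2.

The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(input_shape=(1, 96, 112, 96), widths=(16, 32, 64, 128, 256)):
    """Return the feature-map shape after each conv stage and the flattened size.

    Assumptions (not from the paper): size-preserving 3x3x3 convolutions
    (padding 1) and 2x2x2 max pooling with stride 2, which floor-halves
    each spatial dimension once per stage.
    """
    _, d, h, w = input_shape
    shapes = []
    for c in widths:
        # Two conv + ReLU layers and batch norm: spatial size kept, channels -> c.
        # Max pooling: each spatial dimension halved (floor division).
        d, h, w = d // 2, h // 2, w // 2
        shapes.append((c, d, h, w))
    # Number of inputs to the single fully connected (sigmoid) layer.
    flat = widths[-1] * d * h * w
    return shapes, flat


shapes, flat = trace_shapes()
```

Under these assumptions, five halvings shrink each spatial dimension by a factor of 32. The fully connected layer therefore sees a few thousand features rather than the full voxel grid.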
We applied the model trained to classify AD dementia versus healthy controls to the baseline scans of patients diagnosed with MCI. The model’s continuous output reflects the progressive structural patterns of AD pathology. We refer to it as the “deep learning MRI” (DLMRI) score. Values near 0 indicate a scan more likely to come from a cognitively normal individual, and values near 1 indicate a scan more likely to come from a patient with AD. All analyses were performed using this score.
Amyloid and tau biomarkers
CSF biomarkers
CSF tau levels, which reflect neurofibrillary tangles, and CSF Aβ levels, which reflect amyloid pathology, were included in the analysis [33]. The tau/Aβ ratio, which has been shown to best capture AD [34], was also included [35]. CSF was acquired at individual ADNI sites in accordance with the ADNI acquisition protocols and analyzed as previously described [35] using the multiplex xMAP Luminex platform. The median values provided by ADNI were used.
PET measures
In a subset of participants (MCI progression, N = 94; MCI stable, N = 154), amyloid pathology was also estimated with PET, mapping amyloid burden with the amyloid-binding radioligand AV45. The analyses used the composite AV45-PET score provided by ADNI [36]. This score is the average AV45 standardized uptake value ratio (SUVR) of the frontal, anterior cingulate, precuneus, and parietal cortices relative to the cerebellum [37].
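The composite SUVR described above can be illustrated with a short Python function. This is a sketch of the arithmetic only, not ADNI’s processing pipeline. The function name `composite_suvr`, the dictionary input, and the uptake values in the usage line are hypothetical.

```python
from statistics import mean


def composite_suvr(regional_uptake, reference_uptake):
    """Composite SUVR: mean of regional SUVRs, each normalized by a reference region.

    regional_uptake: mapping region name -> mean tracer uptake (hypothetical values)
    reference_uptake: mean uptake in the reference region (here, the cerebellum)
    """
    if reference_uptake <= 0:
        raise ValueError("reference uptake must be positive")
    # Each regional SUVR is the region's uptake divided by the reference uptake.
    return mean(u / reference_uptake for u in regional_uptake.values())


regions = {"frontal": 2.4, "anterior_cingulate": 2.8, "precuneus": 3.2, "parietal": 2.0}
score = composite_suvr(regions, reference_uptake=2.0)  # regional SUVRs 1.2, 1.4, 1.6, 1.0
```

Every region is divided by the same cerebellar value. This unweighted average therefore equals the mean regional uptake divided by the reference uptake.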
Neurodegeneration biomarkers
MRI morphometry
FreeSurfer 6.0 [38, 39] was used to segment the structural MRI scans and derive regional morphometric measures. Hippocampal (HC) volume, entorhinal cortex (EC) volume, and EC thickness were used as measures of the structural integrity of the hippocampal formation. HC and EC volumes were normalized by intracranial volume (ICV).
PET measures
In a subset of participants (MCI progression, N = 94; MCI stable, N = 154), neurodegeneration was also estimated with fluorodeoxyglucose (FDG) PET. The analyses used the composite FDG score provided by ADNI [36], which is based on the average FDG uptake of the angular, temporal, and posterior cingulate regions [23].
Additional measures
Behavioral and neuropsychological measures
The Mini-Mental State Examination (MMSE) score and the Rey Auditory Verbal Learning Test (RAVLT) retention score were used in the analysis. The RAVLT retention score is the number of words recalled after the delay divided by the number of words learned in the last learning trial (trial 5). It has been found to be one of the measures most sensitive to AD [23].
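The retention ratio defined above is simple, but its edge case is worth making explicit. The sketch below is hypothetical: the function name `ravlt_retention` and the choice to return NaN when no words were learned on trial 5 are assumptions, not details from the source.

```python
def ravlt_retention(delayed_recall, trial5_recall):
    """RAVLT retention: words recalled after the delay / words learned on trial 5."""
    if trial5_recall == 0:
        # Undefined when nothing was learned on trial 5; returning NaN is a choice made here.
        return float("nan")
    return delayed_recall / trial5_recall


print(ravlt_retention(6, 12))  # 6 of 12 learned words retained -> 0.5
```

Some reports express this ratio as a percentage (multiplied by 100). The definition above keeps it as a fraction.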
Neuropathology
Among subjects with postmortem neuropathology data, we identified 44 cases with an MRI within 2 years before death; of these, 29 had an MRI within 1 year before death. DLMRI scores were derived from the last antemortem MRI scan of each case. We examined the association between the DLMRI score and two neuropathologically derived measures: the Braak stage, which reflects neurofibrillary tangles [26], and the Thal phase, which reflects amyloid plaques [25].
Tau-PET
ADNI began acquiring PET scans with the AV1451 radioligand, which binds neurofibrillary tangles [40], in the late phase of ADNI2 and resumed in ADNI3. Because few subjects had longitudinal tau-PET data or follow-up visits, we performed cross-sectional analyses in these subjects (N = 296) using the regional AV1451 retention levels provided by ADNI [36].
Statistical analysis
ROC analysis
Receiver operating characteristic (ROC) analysis was used to determine the accuracy of the DLMRI score in prodromal AD classification, i.e., classifying MCI stable versus MCI progression. The analysis used standardized residuals from linear regression controlling for age, sex, and APOE ε4 frequency. The DeLong test [41], as implemented in the pROC R package [42], was used to test the significance of differences in the area under the ROC curve (AUROC) between the DLMRI score and other measures.
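The covariate adjustment and AUROC computation can be sketched in Python with NumPy. This is an illustration, not the study’s R code. The function names and the numeric coding of sex and APOE ε4 are hypothetical. The AUROC is computed as the Mann-Whitney probability that a randomly chosen MCI-progression participant scores higher than a randomly chosen MCI-stable one. The DeLong comparison itself is not reproduced; the study used pROC for that step.

```python
import numpy as np


def standardized_residuals(y, covariates):
    """Regress y on covariates (plus an intercept) by OLS, then z-score the residuals."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


def auroc(scores, labels):
    """AUROC as P(positive score > negative score), counting ties as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical usage: covariates = columns for age, sex (0/1), APOE e4 count.
# z = standardized_residuals(dlmri, covariates); auc = auroc(z, progressed)
```

The residuals are orthogonal to the covariates by construction. As a result, any remaining discrimination between the groups is not explained linearly by age, sex, or APOE ε4.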
Survival analysis
Cox proportional hazards regression models were fit with the survival R package [43] to examine the association between each baseline measure and time to conversion from MCI to AD dementia, controlling for age, sex, and APOE ε4 frequency. MCI-stable participants were included in the models as censored observations, censored at their last visit. High-risk and low-risk survival curves were generated at the 75th and 25th percentiles of the observed measures, respectively.
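In standard Cox notation, the model fit for a single baseline measure x can be written as below. The symbols are introduced here for illustration and are not taken from the source:

- β and γ are the regression coefficients;
- z_i is the covariate vector for participant i;
- h_0 and S_0 are the baseline hazard and baseline survival.

Evaluating the survival curves at fixed covariate values is an assumption; the source does not state which covariate values were used for plotting.

```latex
\begin{aligned}
h(t \mid x_i, \mathbf{z}_i) &= h_0(t)\,\exp\!\left(\beta x_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
  \qquad \mathbf{z}_i = (\mathrm{age}_i,\ \mathrm{sex}_i,\ \mathrm{APOE}\ \varepsilon4_i)^{\top},\\
\mathrm{HR} &= e^{\beta} \quad \text{(per unit increase in } x \text{, covariates fixed)},\\
S(t \mid x, \mathbf{z}) &= S_0(t)^{\exp\left(\beta x + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)},
  \qquad x \in \{Q_{0.25}(x),\ Q_{0.75}(x)\}.
\end{aligned}
```

Here S_0(t) = exp(−H_0(t)) is the survival function at x = 0 and z = 0. Substituting the 25th and 75th percentiles of the observed measure into the last line yields the two plotted curves.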
Longitudinal analysis
The longitudinal association between the DLMRI score and CSF biomarkers was studied by examining each participant’s changes from baseline over time. From the “MCI progression” and “MCI stable” groups, we identified participants who had at least one follow-up visit with both MRI and CSF. These participants were pooled into a single group for longitudinal analysis (n = 238).

For each participant, the changes from baseline at all follow-up visits were used to estimate the slope β of the change in tau (Δtau), Aβ (ΔAβ), and tau/Aβ ratio (Δtau/Aβ) versus the change in DLMRI score (ΔDLMRI), using linear regression through the origin. Each participant was then represented by a point:

- x-coordinate: the change in DLMRI score at the last follow-up visit (ΔDLMRI_last);
- y-coordinate: the fitted change in the respective measure, βΔDLMRI_last.

Anchoring each point at the last follow-up visit reflects the participant’s full follow-up. A correlation analysis was performed across participants, and a linear regression model was fit across participants and plotted.
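The per-participant slope and plotted point described above can be sketched in Python. This is a minimal illustration under stated assumptions. The function names are hypothetical. The inputs are assumed to be per-visit values ordered by time, with baseline first and MRI and CSF visits already paired. The no-intercept least-squares slope is Σxy / Σx².

```python
def slope_through_origin(dx, dy):
    """Least-squares slope of dy = beta * dx with no intercept: sum(x*y) / sum(x*x)."""
    sxx = sum(x * x for x in dx)
    if sxx == 0:
        raise ValueError("all DLMRI changes are zero; slope is undefined")
    return sum(x * y for x, y in zip(dx, dy)) / sxx


def participant_point(dlmri, csf):
    """Return (dDLMRI_last, beta * dDLMRI_last) for one participant.

    dlmri, csf: per-visit values, baseline first (hypothetical, pre-paired layout).
    """
    d_dlmri = [v - dlmri[0] for v in dlmri[1:]]  # change from baseline at each follow-up
    d_csf = [v - csf[0] for v in csf[1:]]
    beta = slope_through_origin(d_dlmri, d_csf)
    last = d_dlmri[-1]  # anchor the point at the last follow-up visit
    return last, beta * last
```

For example, DLMRI scores 0.2, 0.3, 0.5 with tau 100, 110, 130 give changes (0.1, 0.3) and (10, 30). The fitted slope is 100, so the plotted point is (0.3, 30).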
Correlational analysis
Partial correlations, controlling for age, sex, and APOE ε4 frequency, were computed between the baseline DLMRI score and both the CSF biomarkers and the regional tau-PET measures. The Braak stage of neurofibrillary tangles and the Thal phase of amyloid plaques are both ordinal measures. We therefore correlated the DLMRI score with these neuropathological measures using Spearman correlation.
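Both correlation types can be sketched with NumPy.

- **Partial correlation** is computed here as the Pearson correlation of the residuals after regressing each variable on the covariates, which is a standard equivalent formulation.
- **Spearman correlation** is the Pearson correlation of average ranks, so tied ordinal values (such as equal Braak stages) share a rank.

The function names are hypothetical, and this is not the study’s code.

```python
import numpy as np


def _residualize(v, covariates):
    """OLS residuals of v on the covariates plus an intercept."""
    X = np.column_stack([np.ones(len(v)), covariates])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing both on the covariates."""
    rx = _residualize(np.asarray(x, dtype=float), covariates)
    ry = _residualize(np.asarray(y, dtype=float), covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


def average_ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Average ranks matter here because Braak stages and Thal phases take only a few values, so ties are common.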
Multivariate analysis of biomarkers from multiple categories
Linear support vector machine (SVM) analyses were performed using individual and combined categories of data for prodromal AD classification in the MCI group. Fivefold cross-validation was performed, and the AUROC averaged over the five test folds is reported.
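One way to reproduce this kind of analysis is with scikit-learn. The sketch below is an assumption-laden illustration, not the study’s pipeline. The following are all choices made here:

- standardizing features within each training fold;
- stratifying the folds by outcome;
- the seed;
- the helper names `combine` and `cv_auroc`.

Combining categories is shown as simple column concatenation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def combine(*blocks):
    """Concatenate feature blocks (e.g., MRI, CSF, PET columns) into one matrix."""
    return np.column_stack(blocks)


def cv_auroc(X, y, n_splits=5, seed=0):
    """Mean test-fold AUROC of a linear SVM under stratified k-fold cross-validation."""
    # Scaling inside the pipeline is refit on each training fold, avoiding leakage.
    model = make_pipeline(StandardScaler(), LinearSVC(dual=False))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # "roc_auc" scoring ranks test samples by the SVM's decision_function.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean()
```

A single category would be passed directly as `X`. A combination such as MRI plus CSF would be passed as `combine(X_mri, X_csf)`, where both names are hypothetical.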