Participants
We included participants from the European Medical Information Framework for Alzheimer’s Disease Multimodal Biomarker Discovery (EMIF-AD MBD) study. The aim of this study was to discover novel diagnostic and prognostic markers for pre-dementia AD, by making use of existing data and samples [13]. The EMIF-AD MBD study pooled data of 494 CN, 526 MCI and 201 AD-dementia participants from three multicentre and eight single-centre studies. Inclusion criteria were: presence of normal cognition, MCI or a clinical diagnosis of AD-type dementia; availability of data on amyloid pathology, measured in CSF or on PET; age above 50 years; availability of MRI scans, plasma, DNA or CSF samples (at least two of the modalities); and absence of major neurological, psychiatric or somatic disorders that could cause cognitive impairment.
From the 1221 subjects included in the EMIF-AD MBD study, MRI scans of 873 subjects were contributed by the different studies (Fig. 1). Based on visual assessment, 863 MRI scans were of sufficient quality for visual rating, consisting of 365 CN, 398 MCI and 100 AD-dementia participants. Data were obtained from the following cohorts: DESCRIPA [14], EDAR [15], PharmaCog [16] and single-centre studies at VU University Medical Centre [17], San Sebastian GAP [18], University of Antwerp [19], Leuven [20], University of Lausanne [21], University of Gothenburg [22] and Barcelona IDIBAPS [23]. Each study was approved by the local medical ethics committee. Subjects had provided written informed consent at the time of inclusion in the MBD study for sharing of data, fluid samples and scans.
Clinical and cognitive data
From all parent cohorts, clinical information and neuropsychological tests were collected centrally, harmonized, pooled and stored in an online data platform as previously described [13]. In short, all parent cohorts administered the Mini-Mental State Examination (MMSE), and performed neuropsychological testing covering various cognitive domains, although the tests used varied across the different cohorts. For the cognitive domains memory, language, attention, executive functioning and visuo-construction, one priority test was selected from each cohort (Additional file 1: Table S1) and z-scores were computed based on local normative data when available, or published normative data from healthy controls otherwise.
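The z-score step can be illustrated with a minimal sketch. It assumes that each priority test comes with a normative mean and standard deviation (local or published); the function name and the example values are hypothetical and not taken from the study.

```python
def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw test score against normative data.

    norm_mean and norm_sd are assumed to come from local normative data
    when available, or from published healthy-control norms otherwise.
    """
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical example: raw score 24 against a normative mean of 28 (SD 2).
example_z = z_score(24.0, 28.0, 2.0)
```

Expressing every priority test on this common scale is what allows scores from different tests and cohorts to be pooled within a cognitive domain.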
APOE genotyping
For the entire EMIF-AD MBD cohort, APOE genotyping data from the local genetic analyses were available for 1121 (91%) individuals. Central genetic analyses were performed at Lübeck University, Germany for 805 DNA and 148 whole blood samples. From the blood samples, DNA was extracted using the QIAamp® DNA Blood Mini Kit (QIAGEN GmbH, Hilden, Germany) resulting in 953 DNA samples, of which 926 passed quality control. Genome-wide SNP genotyping was performed using the Infinium Global Screening Array (GSA) with Shared Custom Content (Illumina Inc.). APOE genotypes were determined either directly (rs7412) or by imputation (rs429358). For 80 samples for which no local APOE genotype was available, and for 45 mismatches between local and GSA-derived genotypes, the APOE genotype was determined using TaqMan assays (ThermoFisher Scientific, Foster City, CA, USA) on a QuantStudio 12K Flex system. Among the 45 mismatches, TaqMan re-genotyping confirmed 23 GSA genotypes and 21 local genotypes. For one failed sample we retained the local genotype. We classified individuals as APOE ε4 carriers or non-carriers according to their genotype status at rs429358 (C-allele = ε4).
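The carrier classification in the last sentence can be sketched as follows. The two-letter genotype string format (for example "CT" or "C/T") and the function name are assumptions made for illustration; only the rule itself (C-allele at rs429358 = ε4) comes from the text.

```python
def is_apoe_e4_carrier(rs429358_genotype: str) -> bool:
    """Return True if the rs429358 genotype carries at least one C allele.

    The genotype is assumed to be given as two allele letters, optionally
    separated by a slash, e.g. "TT", "CT" or "C/C" (hypothetical format).
    """
    alleles = rs429358_genotype.upper().replace("/", "")
    if len(alleles) != 2 or any(a not in ("C", "T") for a in alleles):
        raise ValueError(f"unexpected rs429358 genotype: {rs429358_genotype!r}")
    return "C" in alleles
```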
Amyloid classification
In the current selection (n = 863), amyloid status was defined by central analysis of CSF when available (n = 510), otherwise by local amyloid PET (n = 174) or local CSF (n = 179) measures. Central CSF analysis was performed at Gothenburg University, Sweden and included Aβ1–40 and Aβ1–42 measured using the V-PLEX Plus Aβ Peptide Panel 1 (6E10) Kit (Meso Scale Discovery, Rockville, MD, USA), as described by the manufacturer. The central cut-off value for Aβ positivity was an Aβ42/40 ratio < 0.061. Amyloid PET was performed in one cohort using [18F]flutemetamol according to local standardized procedures, with a standardized uptake value ratio (SUVR) cut-off value > 1.38 used for abnormality [24]. In short, SUVR images were computed from spatially normalized summed images with cerebellar grey matter as the reference region. The cut-off value was derived from an independent dataset [25] and based on the statistical difference between AD dementia patients and cognitively normal subjects [24]. Local CSF amyloid was determined according to local protocols with local cut-off values. The number of amyloid positive subjects per diagnosis per cohort is presented in Additional file 1: Table S2.
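The hierarchy of amyloid measures can be summarized in a short sketch. The two cut-off values are those given above; the function and argument names are hypothetical, and the order in which local PET and local CSF are consulted is an assumption, since the text only states that central CSF takes priority.

```python
from typing import Optional

CENTRAL_CSF_RATIO_CUTOFF = 0.061  # Abeta42/40 ratio below this is positive
PET_SUVR_CUTOFF = 1.38            # [18F]flutemetamol SUVR above this is positive


def amyloid_positive(
    central_csf_ratio: Optional[float] = None,
    pet_suvr: Optional[float] = None,
    local_csf_positive: Optional[bool] = None,
) -> Optional[bool]:
    """Classify amyloid status, preferring central CSF when available."""
    if central_csf_ratio is not None:
        return central_csf_ratio < CENTRAL_CSF_RATIO_CUTOFF
    # Assumption: PET is checked before local CSF; the source does not
    # specify an order between the two local measures.
    if pet_suvr is not None:
        return pet_suvr > PET_SUVR_CUTOFF
    # Local CSF status is already dichotomized with the local cut-off.
    return local_csf_positive
```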
MRI acquisition
At each site, images were acquired according to local protocols. From each parent cohort, we centrally collected the T1-weighted images and, if available, fluid-attenuated inversion recovery (FLAIR) and susceptibility-weighted images (SWI) or T2* images at the VU University Medical Centre, where a visual quality check was performed. The acquired sequences and acquisition parameters for the T1-weighted scans for each cohort are presented in Additional file 1: Table S3. MRI was usually acquired at baseline, together with the baseline cognitive and amyloid measures. For 104 subjects there was more than a 1-year difference between MRI acquisition and amyloid assessment. These subjects were included in the analysis if amyloid was abnormal and assessed before MRI (n = 42), or if amyloid was normal and assessed after MRI (n = 9). All other cases were excluded (n = 53). For 99 subjects there was more than a 1-year difference between baseline cognitive assessment and MRI. For these cases, we did not use the cognitive data in the multi-variable analysis. Demographic differences between subjects who were included and excluded for differences in time between MRI and amyloid or cognitive assessment are presented in Additional file 1: Tables S4 and S5.
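The inclusion rule for subjects with a long interval between MRI and amyloid assessment can be written as a small decision function. This is a sketch of the rule as described above; the function and argument names are hypothetical.

```python
def include_given_amyloid_timing(
    gap_years: float, amyloid_abnormal: bool, amyloid_before_mri: bool
) -> bool:
    """Decide inclusion based on the MRI-to-amyloid assessment interval."""
    if abs(gap_years) <= 1.0:
        return True  # assessments are close enough in time
    if amyloid_abnormal and amyloid_before_mri:
        return True  # abnormal amyloid measured before the scan
    if not amyloid_abnormal and not amyloid_before_mri:
        return True  # normal amyloid measured after the scan
    return False     # all other combinations were excluded
```

One reading of this rule, not stated explicitly in the text, is that both retained combinations allow the amyloid status at the time of MRI to be inferred if amyloid accumulation is assumed not to reverse.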
MRI visual rating
MRI scans of sufficient quality (n = 863) were visually rated by a single experienced rater, who was blinded to demographic information during rating. Medial temporal lobe atrophy (MTA) was assessed on coronal reconstructions of the T1-weighted images using a 5-point scale ranging from no atrophy (0) to end-stage atrophy (4) [26]. The MTA scores of the left and right hemispheres were averaged. Global cortical atrophy (GCA) was assessed on transversal FLAIR or T1 images using a 4-point scale [27]. Posterior atrophy was assessed using a 4-point scale [28] and averaged over hemispheres. White matter hyperintensities were visually assessed on FLAIR images (n = 812) using the 4-point Fazekas scale (none, punctate, early confluent, confluent) [29]. Microbleeds were assessed on SWI and/or T2* images (n = 445) and defined as rounded hypointense homogeneous foci of up to 10 mm in diameter in the brain parenchyma. Microbleeds were dichotomized as present (≥ 1 microbleed) or absent (0 microbleeds).
MRI quantitative analysis
Good-quality 3D T1 images (n = 850) were uploaded to the N4U platform (https://neugrid4you.eu/) for automated quantitative processing. Subcortical volumes, cortical thickness and surface area measures were estimated from 3D T1 MRI using FreeSurfer (version 5.3.0, https://surfer.nmr.mgh.harvard.edu) as previously described [30]. All segmentations were visually inspected. We excluded data from 20 subjects for subcortical volumes (five due to complete failure of the algorithm and 15 due to segmentation errors) and from 75 subjects for cortical thickness and surface area (five due to complete failure of the algorithm, 66 due to segmentation errors of the cortical ribbon and four due to other failures). Subcortical volumes were normalized by total intracranial volume (TIV). Cortical thickness and surface area were available for 68 regions according to the Desikan–Killiany atlas implemented in FreeSurfer. Additionally, we computed two AD-signature meta-ROI measures that have previously been presented in the literature: one by Dickerson et al. [10] consisting of the average cortical thickness in angular, precuneus, supramarginal, superior frontal, superior parietal, temporal pole, inferior temporal, medial temporal and inferior frontal cortex; and one by Jack et al. [31] consisting of the surface-area weighted average mean cortical thickness in entorhinal, inferior temporal, middle temporal and fusiform regions.
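The surface-area weighted meta-ROI of Jack et al. can be sketched as below. The dictionary layout, the function name and the numeric values are hypothetical; the region keys follow Desikan–Killiany label names, and combining hemispheres (for example by including left and right regions as separate entries) is left to the caller as an assumption.

```python
JACK_REGIONS = ("entorhinal", "inferiortemporal", "middletemporal", "fusiform")


def area_weighted_thickness(thickness, surface_area, regions=JACK_REGIONS):
    """Average regional thickness, weighting each region by its surface area."""
    total_area = sum(surface_area[r] for r in regions)
    weighted_sum = sum(thickness[r] * surface_area[r] for r in regions)
    return weighted_sum / total_area


# Hypothetical values (thickness in mm, surface area in mm^2).
thickness = {"entorhinal": 3.2, "inferiortemporal": 2.7,
             "middletemporal": 2.8, "fusiform": 2.6}
surface_area = {"entorhinal": 400.0, "inferiortemporal": 3200.0,
                "middletemporal": 3000.0, "fusiform": 3100.0}
meta_roi = area_weighted_thickness(thickness, surface_area)
```

Weighting by surface area means that large regions such as the inferior temporal cortex contribute more to the summary measure than small ones such as the entorhinal cortex, unlike the unweighted average used for the Dickerson signature.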
Statistical methods
Univariate analysis
Univariate statistical analyses were performed in R (version 3.3.1). Comparisons of clinical characteristics between amyloid positive and negative subjects within each diagnostic group were performed using independent t tests or Mann–Whitney U tests for continuous variables and chi-square tests for categorical variables. Baseline comparisons in quantitative MRI measures between groups were performed with linear mixed models (continuous outcome measures) (lme4 package, version 1.1-12; lmerTest package, version 2.0-36), mixed effects ordered logistic regressions (ordinal outcome measures) (ordinal package, version 2015.6-28) and mixed effects logistic regressions (dichotomous outcome measures) (lme4 package). In each model, we entered amyloid status (negative, positive), diagnosis (CN, MCI and AD) and their interaction as fixed effects. Age (centred on the mean), gender and APOE ε4 status were added as covariates. Cohort was added as a random intercept. The analyses were corrected within each diagnostic group (in total 22 tests: five visual ratings, 14 subcortical volumes, three cortical thickness summary measures) for multiple hypothesis testing with the p.adjust() function using the false discovery rate, and the corrected p values are indicated as pFDR.
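The multiple-testing correction can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is what R's p.adjust() applies for method = "fdr" (an alias of "BH"); the text does not state the exact method argument. The function name and the example p values are hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that the adjusted
    # values stay monotone (cumulative minimum of p * m / rank).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four tests within one diagnostic group.
p_fdr = fdr_bh([0.01, 0.04, 0.03, 0.005])
```

In the study the correction was applied over the 22 tests within each diagnostic group, so m would be 22 in that setting.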
Multi-variable analysis
To find the best multi-variable predictor of amyloid pathology, we used a supervised machine-learning approach based on SVM analysis. In SVM, two classes are separated by finding a hyperplane that maximizes the margin of separation between data points of each class in a high-dimensional feature space. SVMs are used extensively in neuroimaging as they have been shown to predict outcomes with high accuracy and possess the ability to model diverse and high-dimensional data [32]. We built a classifier to separate amyloid positive from amyloid negative subjects separately in the CN and MCI subgroups and, for the sake of completeness, also in the whole sample (including CN, MCI and AD-dementia patients). To address the imbalance between the number of amyloid positive and amyloid negative subjects in each diagnostic group, we adopted the re-weighting strategy [33]. That means we adjusted weights of each SVM feature inversely proportional to amyloid positive versus negative frequencies.
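The re-weighting idea can be made concrete with a small sketch. It assumes that the weights act at the class level and follow the heuristic n_samples / (n_classes × class count), which is the formula scikit-learn uses for class_weight="balanced"; whether the study used exactly this setting is not stated. The function name and label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequencies."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical group: 75 amyloid negative (0) and 25 amyloid positive (1).
weights = balanced_class_weights([0] * 75 + [1] * 25)
```

In an SVM these weights scale the cost parameter C per class, so misclassifying a subject from the rarer class is penalized more heavily than misclassifying one from the more frequent class.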
Machine-learning approach
We used the Python scikit-learn library (version 0.19.1) to perform SVM classification [34]. To prevent overfitting (i.e. the classifier works perfectly on the training data, but generalizes poorly to new data), we performed feature relevance evaluation and dimensionality reduction using a tree-based feature selection approach with a nested 10-fold cross-validation design [35, 36]. This was performed separately within each subgroup (CN, MCI and whole sample).
The nested cross-validation consists of an inner loop for model building and parameter estimation, and an outer loop for model testing. Consequently, the dataset was divided into two parts: a training plus validation subset and a test subset. In the inner loop, SVM models were trained with varying SVM hyper-parameters (i.e. the cost parameter C and the kernel function) based on a grid search, and feature selection was performed using classification trees. The validation set was used to determine the SVM hyper-parameters over the grid of possible values. The performance of the resulting model, with optimized SVM hyper-parameters and features, was subsequently evaluated on the test set in the outer loop. For this outer loop, we used a 10-fold cross-validation scheme so that the data were divided into 10 equally sized parts. Nine of these were used as the training/validation set and one as the test set, and the 10 parts were permuted in each iteration of the outer loop so that each one was used for testing once. Finally, the SVM results were averaged over the 10 folds to estimate the predictive power of the proposed model on the whole dataset.
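The nested design described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's code: the data are synthetic, the tree ensemble (ExtraTreesClassifier), the grid values, the number of inner folds and the random seeds are all hypothetical choices, and class_weight="balanced" stands in for the re-weighting strategy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix and amyloid labels.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

# Scaling, tree-based feature selection and the SVM are refit inside every
# training fold, so no information from the test fold leaks into the model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(
        ExtraTreesClassifier(n_estimators=25, random_state=0),
        threshold="mean")),  # keep features at or above the mean importance
    ("svm", SVC(class_weight="balanced")),
])

# Hypothetical grid over the cost parameter C and the kernel function.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)   # tuning
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # testing

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=inner_cv)
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
mean_auc = outer_auc.mean()  # average over the 10 outer test folds
```

Wrapping the grid search inside cross_val_score is what makes the scheme nested: hyper-parameters and features are chosen using only the nine training parts, and the held-out part is used once, for testing.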
Feature selection
As the input for the classifier, we used demographic information, neuropsychological information, APOE ε4 genotype and MRI measures (visual ratings, subcortical volumes, regional cortical thickness and regional surface area measures). To combine information measured on different scales, continuous demographic and MRI measures were normalized to z-scores. In the adopted tree-based feature selection strategy, the Gini index was used to measure the relevance of each feature [37]. Features with a Gini index above the mean were kept, others were discarded. The complete list of features considered and selected, in the whole dataset and for CN and MCI separately, is reported in Additional file 1: Table S8.
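To make the selection criterion concrete, the sketch below computes the Gini impurity of a set of labels, the impurity decrease achieved by a split (the quantity that tree-based importances accumulate per feature), and the above-mean selection rule. The function names, feature names and importance values are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity when parent is split into left and right."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def keep_above_mean(importances):
    """Keep the features whose importance exceeds the mean importance."""
    mean_importance = sum(importances.values()) / len(importances)
    return [name for name, value in importances.items() if value > mean_importance]


# Hypothetical importances for four candidate features.
selected = keep_above_mean({"age": 0.5, "mta": 0.3, "gca": 0.1, "fazekas": 0.1})
```

A split that separates amyloid positive from amyloid negative subjects perfectly removes all impurity, so features that produce such splits often and high in the trees receive the largest importances.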
Performance evaluation
To assess the performance of the classifier, we computed the averaged receiver operating characteristic (ROC) area under the curve (AUC), specificity, sensitivity and accuracy for the testing datasets. We initially chose the operating point that maximized the Youden index, and then also explored the results when setting the sensitivity at 80%, 85%, 90%, 95% and 100%. To assess the added value of combining different sources of information, we also built classifiers including only demographic information and a single other biomarker type (neuropsychological tests, APOE ε4 genotype, MRI measures). Differences in ROC AUCs between classifiers were assessed with DeLong’s test.
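The two ways of choosing an operating point can be sketched as follows. The Youden index is sensitivity + specificity − 1; the alternative fixes a minimum sensitivity and takes the most specific threshold that still reaches it. Function names and the example scores are hypothetical, and a subject is assumed to be called positive when its classifier score is at or above the threshold.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_point(points):
    """Operating point maximizing sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1.0)


def point_at_sensitivity(points, target):
    """Most specific operating point with sensitivity of at least target."""
    eligible = [p for p in points if p[1] >= target]
    return max(eligible, key=lambda p: p[2])


# Hypothetical classifier scores with true amyloid status (1 = positive).
points = roc_points([0.1, 0.4, 0.35, 0.8, 0.6], [0, 0, 1, 1, 1])
best = youden_point(points)
high_sens = point_at_sensitivity(points, 0.95)
```

In this toy example the Youden-optimal threshold misses one positive subject, whereas requiring at least 95% sensitivity lowers the threshold and accepts a lower specificity, which is the trade-off explored when sensitivity is fixed at 80% to 100%.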