The goal of this work was to assess whether brain structural changes captured by subsequent magnetic resonance images can indicate the presence of abnormal amyloid levels in cognitively unimpaired subjects using machine learning techniques. We also aimed to characterize the preclinical signature voxel-wise, using Jacobian determinant maps as a measure of the volumetric rate of change.
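As an illustration of how a Jacobian determinant map encodes local volume change, the following minimal sketch computes it for a toy 2D displacement field. It assumes unit voxel spacing and a deformation of the form phi(x) = x + u(x); the function name `jacobian_determinant_2d` and the toy field are hypothetical and are not taken from the registration pipeline used in this study.

```python
import numpy as np

def jacobian_determinant_2d(disp_y, disp_x):
    """Jacobian determinant of phi(x) = x + u(x) on a 2D grid with unit spacing.

    Values above 1 indicate local expansion, values below 1 local contraction.
    """
    # np.gradient returns one derivative array per axis (axis 0 = y, axis 1 = x).
    duy_dy, duy_dx = np.gradient(disp_y)
    dux_dy, dux_dx = np.gradient(disp_x)
    # det(I + grad u) written out for the 2x2 case.
    return (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy

# Toy displacement field (illustrative only): uniform 10% stretch along each axis.
yy, xx = np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij")
jac = jacobian_determinant_2d(0.1 * yy, 0.1 * xx)
print(jac.mean())
```

For this toy field, every voxel has a determinant of approximately 1.21, i.e., a 21% increase in local area. Dividing the deviation from 1 by the time between scans would give one possible rate-of-change measure, although the exact normalization used here is not restated in this section.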
A machine learning framework was implemented for the classification of amyloid-positive subjects using Jacobian determinant maps as features. The best performance achieved by our longitudinal classifier (AUC 0.87) significantly improved on the performance we previously reported for a cross-sectional classifier (AUC 0.76) [13]. This performance is significantly higher than that reported in previous works that, on top of using MRI ROI data, built classifiers adding demographics (AUC 0.63), demographics and genetics (AUC 0.62–0.66), and demographics, neuropsychology, and APOE (AUC 0.74) [14, 28]. It is possible that adding complementary information to the MRI, such as demographics and genetic risk factors, may improve the performance of our machine learning classifier. While the field strength of the scanners is 1.5 T for all subjects, there is large heterogeneity in the site ID, so we believe that site-specific effects have had little or no influence on the performance metrics of the classifier.
The increased performance of our classifier may be accounted for by two factors. First, unlike similar previously reported classifiers, we used voxel-wise data as features. Coupled with an efficient feature selection strategy, this allowed the classifier to select the most discriminant brain regions, independent of a priori cortical parcellations. Second, we used subsequent images of the same individuals, thus eliminating an important percentage of the between-subject variability present in cross-sectional setups.
In this regard, we observed that our classifier works significantly better only when the pairs of MRI scans used for evaluation are acquired more than 2.5 years apart. This time period is likely related to the protracted evolution of the neuroanatomical changes in preclinical AD stages. At more advanced stages of the disease, more rapid evolution of brain structural changes is expected, and thus, the benefits of a longitudinal classifier would potentially be evident with shorter time intervals. It remains to be explored how these promising results would be affected by the use of different scanners. Still, a time gap of 2.5 years for resolving preAD is within the timescale of relevance for AD screening or the follow-up of subjects enrolled in secondary prevention clinical trials, which typically last a decade. In this context, this work and our earlier study on MRI using ML [13] show that even though the performance of the ML classifier is not high, it can save resources in a clinical trial setting if implemented as a screening tool.
The main discriminative features between amyloid-positive individuals and healthy controls mostly included AD-related areas in the medial and inferior temporal lobe, as well as the lateral ventricles, which can be considered the preclinical AD signature. Increased expansion of the lateral and inferior lateral ventricles in cognitively unimpaired individuals with lower levels of CSF amyloid-beta has been shown previously, along with increased atrophy in the fusiform gyri as well as in middle temporal and posterior cingulate cortices [33,34,35,36,37]. In this regard, the preclinical AD signature found in our study does not significantly depart from published reports and, as can be seen in Fig. 6, is very much in line with the expected pattern of atrophy in AD, though of lesser magnitude and extent.
In addition to (peri)ventricular regions, Fig. 5 also shows that the fusiform gyri and middle temporal regions have a significant capacity to discriminate amyloid-positive from amyloid-negative CU individuals, as expected [34]. Additional detail on the brain areas contributing to this discriminative power is provided in Additional file 1: Table S1.
The predictive capacity achieved by this classifier does not make this method a substitute for gold-standard tests to detect amyloid abnormalities. Still, if used for triaging subjects, e.g., for clinical trial recruitment, we demonstrated that it could allow significant savings in the number of costly gold-standard tests that would have to be performed to detect a fixed number of amyloid-positive, cognitively healthy subjects. Used in this way, in a cognitively unimpaired population with a prevalence of amyloid positivity of 20%, the accuracy of the longitudinal classifier would allow a reduction of up to 55% of unnecessary PET or CSF tests, which translates to a 40% reduction of the total cost, according to the savings model we previously proposed [13]. Nevertheless, in a clinical trial recruitment setting, it can be more advantageous to optimize the sensitivity of the classifier instead, to maximize the number of detected at-risk individuals, at the cost of a slightly poorer specificity, which might decrease these cost savings.
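The arithmetic behind such a triage scenario can be sketched as follows. This is a deliberately simplified model and not necessarily the savings model of [13]: it assumes that only subjects flagged by the classifier proceed to a gold-standard test, and the operating point (sensitivity 0.8, specificity 0.7) and the unit costs are hypothetical placeholders rather than values reported in this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of classifier-positive subjects who are truly amyloid-positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def triage_savings(sensitivity, specificity, prevalence,
                   n_target, cost_mri, cost_gold):
    """Compare recruiting n_target amyloid-positive subjects with and without triage."""
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    tests_without = n_target / prevalence           # gold-standard tests, no triage
    tests_with = n_target / ppv                     # gold-standard tests after triage
    screened = n_target / (prevalence * sensitivity)  # subjects needing MRI screening
    cost_without = tests_without * cost_gold
    cost_with = screened * cost_mri + tests_with * cost_gold
    return {
        "ppv": ppv,
        "test_reduction": 1.0 - tests_with / tests_without,
        "cost_reduction": 1.0 - cost_with / cost_without,
    }

# Hypothetical operating point and unit costs, for illustration only.
result = triage_savings(sensitivity=0.8, specificity=0.7, prevalence=0.2,
                        n_target=100, cost_mri=1.0, cost_gold=10.0)
print(result)
```

Under these assumed inputs, the reduction in gold-standard tests equals 1 - prevalence / PPV, which makes explicit why raising sensitivity at the expense of specificity lowers the PPV and, with it, the savings.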
Due to the limited sample size for training and the large inter-subject variability of cerebral morphology, we used a simple but effective model for the prediction of amyloid positivity. Our method is fully automatic, from feature extraction and signature learning to classification. However, the presence of high-dimensional and poorly informative features, together with the overlap between normal aging and AD processes in the brain, reduces the overall precision of the system. To account for that, future efforts will need larger longitudinal datasets, and many initiatives are contributing to this goal [14, 29].
We observed much higher sensitivity than specificity. This is likely due to the limited size and imbalance of the cohort, but most likely also to the fact that we imposed an imbalance on the test set to simulate the preAD prevalence of 20% typically found in a clinical trial setting.
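One way to impose a target prevalence on a test set is to subsample one of the two classes. The sketch below is an assumption about how this could be done, not a description of the procedure actually used here; the function name `impose_prevalence` and the label coding (1 for amyloid-positive, 0 for amyloid-negative) are hypothetical.

```python
import random

def impose_prevalence(labels, prevalence, seed=0):
    """Return sorted indices of a subsample whose positive fraction is ~prevalence.

    labels: sequence of 0/1 values (1 = amyloid-positive).
    Positives are subsampled when there are enough of them; otherwise
    negatives are subsampled instead.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos_wanted = round(prevalence * len(neg) / (1.0 - prevalence))
    if n_pos_wanted <= len(pos):
        pos = rng.sample(pos, n_pos_wanted)
    else:
        n_neg_wanted = round(len(pos) * (1.0 - prevalence) / prevalence)
        neg = rng.sample(neg, n_neg_wanted)
    return sorted(pos + neg)

# Toy test set (illustrative only): 30 positives and 40 negatives.
labels = [1] * 30 + [0] * 40
kept = impose_prevalence(labels, prevalence=0.2)
fraction_positive = sum(labels[i] for i in kept) / len(kept)
print(len(kept), fraction_positive)
```

With few positives in the evaluated subsample, sensitivity is estimated from a handful of subjects, which is consistent with the wide variation in this metric discussed in the limitations.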
On top of this, given the limited sample size and the large number of features used for classification (voxels), we might have overfitted the existing data, potentially resulting in an overestimation of the capacity of the classifier. Therefore, our results need to be validated on independent datasets, but the scarcity of longitudinal MRI datasets with CSF biomarker levels has prevented us from conducting such validation in this work. Still, in our previous ROI-based study, we successfully validated a very similar classifier with two independent datasets without a major loss of the classifier’s performance [13].
To further characterize the preclinical AD signature, we conducted a statistical analysis and report longitudinal morphological changes in cognitively unimpaired subjects with abnormal amyloid CSF levels. This preclinical AD signature comprises atrophy of the parahippocampal and fusiform gyri and expansion of the lateral ventricles. This pattern is in line with previous reports of longitudinal volumetric changes associated with the presence of abnormal amyloid levels in ADNI participants, which have been replicated in an independent cohort [10]. On the other hand, expansion of the caudate heads falls beyond this known pattern. Because the caudates lie in the proximity of the lateral ventricles, it may be questioned whether the detected increase in their volume is an actual feature associated with preclinical AD stages or an artifact of the processing methodology used to detect volumetric changes. Because the spatially continuous Jacobian determinant maps are smoothed, the observed increase in caudate volumes could be a side effect of the “spillover” of the Jacobian determinant maps due to the expansion of the ventricles. To address this question, we performed a post hoc analysis of the caudate volumes between the Ctrls and PreAD groups, using the longitudinal FreeSurfer pipeline to compute the change in caudate volumes. Since the subcortical segmentation implemented in FreeSurfer uses an ROI approach based on a probabilistic atlas [30], it can be considered virtually free from the potential spillover effect of continuous Jacobian determinant maps. Results show that the changes in caudate volumes are not significantly different between Ctrls and PreAD individuals (p > 0.3) and, thus, it can be concluded that the observed caudate head expansion is artifactual and secondary to ventricular expansion. Still, this signal might contribute to the detection of the presence of amyloid burden in cognitively unimpaired individuals.
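The spillover mechanism can be illustrated with a one-dimensional toy profile. The sketch below assumes a Gaussian smoothing kernel and made-up values (a 5% expansion confined to a "ventricle" segment); the kernel width, the profile, and the helper `gaussian_kernel` are hypothetical and do not reproduce the smoothing parameters of this study.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian kernel normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

# Toy 1D Jacobian profile: no change (1.0) everywhere except an
# expanding "ventricle" segment (1.05) at voxels 15..25.
jac = np.ones(41)
jac[15:26] = 1.05

radius = 6
kernel = gaussian_kernel(sigma=2.0, radius=radius)
# Edge padding keeps the output the same length as the input profile.
padded = np.pad(jac, radius, mode="edge")
smoothed = np.convolve(padded, kernel, mode="valid")

# Voxel 27 lies outside the ventricle segment, yet after smoothing
# its value rises above 1.0 and mimics local expansion.
print(jac[27], smoothed[27])
```

In this toy example, a voxel adjacent to the expanding segment acquires an apparent expansion purely through smoothing, which is the kind of effect that an ROI-based volume measurement is not subject to.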
This study has some limitations. Even though the data come from a heterogeneous sample with different sites and MRI scanners, the MRI acquisition was harmonized according to the ADNI protocol. Therefore, the performance of our method when applied to MRI samples using different acquisition protocols may deviate from what is reported here. Indeed, the ultimate validation of the generalizability of these results can only be accomplished by applying the method to an independent sample. In our previous work, the performance of a similar cross-sectional classifier remained stable when derived and validated in two independent cohorts. Therefore, the same behavior can be expected in this longitudinal extension of the classifier. Our study relies on the ADNI cohort, which is well known for its data quality and unique in having corresponding MRI and CSF data and the longitudinal aspect required for a study using Jacobian determinants. The small number of subjects with MRIs acquired more than 2.5 years apart, which is needed for a good signal-to-noise ratio, certainly imposes a limitation on our results and encourages future validation efforts. For example, a single misclassification error has a huge impact on the performance metrics. To mitigate this effect, we repeated the workflow 100 times in order to report mean performance metrics. Nevertheless, the effect of misclassification can still be observed in the large confidence intervals found for each of the metrics.
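Summarizing a metric over repeated runs can be sketched as below. The percentile interval is an assumption, since the way the confidence intervals were computed is not restated here, and the AUC values are synthetic placeholders generated for illustration, not results from this study.

```python
import numpy as np

def summarize_repetitions(values, level=0.95):
    """Mean and percentile interval of a metric collected over repeated runs."""
    values = np.asarray(values, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(values, [tail, 100.0 - tail])
    return values.mean(), lower, upper

# Synthetic placeholder AUCs for 100 repetitions (NOT results from the study).
rng = np.random.default_rng(0)
fake_aucs = np.clip(rng.normal(loc=0.85, scale=0.05, size=100), 0.0, 1.0)
mean_auc, low, high = summarize_repetitions(fake_aucs)
print(f"mean={mean_auc:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

With a small evaluation set, each repetition can only take a few discrete metric values, so the interval across repetitions stays wide even though the mean is stable.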
Finally, we used CSF amyloid rather than PET imaging as the gold standard for amyloid positivity. It could be argued that the performance of the classifier could be sensitive to the choice of gold-standard method. However, the agreement between CSF and PET determinations of amyloid is very high, particularly in the intermediate ranges where thresholds for positivity typically lie.
One interesting area for further exploration is the classification of subjects who undergo a transition between normal and preclinical amyloid biomarkers within the timeframe of two consecutive scans. In principle, one could hypothesize that this category of “transitioning” subjects will not necessarily follow the same pattern of brain volumetric change as either the normal or the preclinical group.
Unfortunately, only a subset of 13 subjects meet these criteria; of these, only 2 subjects undergo this transition within a time frame of dt < 2.5 years between consecutive scans. The sample size is therefore too small for a machine learning workflow. Nevertheless, the prediction of a transition from normal to preclinical AD stages is a question of utmost importance to research (e.g., observational studies) and clinical practice (e.g., clinical trials) and a natural follow-up to the present study.
To sum up, we presented a machine learning framework that predicts the presence of amyloid abnormalities in cognitively unimpaired individuals with moderate-to-high accuracy (AUC 0.87) when MRI scans acquired more than 2.5 years apart are available. This performance translates to a reduction of up to 55% in the number of unnecessary CSF/PET tests and a 40% reduction in the costs of detecting a fixed number of amyloid-positive individuals. This performance may still have room for improvement by including demographic, genetic, and cognitive data in the classifier. We further compared the features used by the classifier with the characteristic pattern of longitudinal morphological changes in preclinical AD that is expressed in typical AD-related regions, uncovering areas that appear to be specific to the preclinical AD stage.