The dynamics of biomarkers across the clinical spectrum of Alzheimer’s disease

Background: Quantifying changes in the levels of biological and cognitive markers prior to the clinical presentation of Alzheimer’s disease (AD) will provide a template for understanding the underlying aetiology of the clinical syndrome and, concomitantly, for improving early diagnosis, clinical trial recruitment and treatment assessment. This study aims to characterise continuous changes of such markers and determine their rate of change and temporal order throughout the AD continuum.

Methods: The methodology is founded on the development of stochastic models to estimate the expected time to reach different clinical disease states, for different risk groups, and synchronise short-term individual biomarker data onto a disease progression timeline. Twenty-seven markers are considered, including a range of cognitive scores, cerebrospinal (CSF) and plasma fluid proteins, and brain structural and molecular imaging measures. Data from 2014 participants in the Alzheimer’s Disease Neuroimaging Initiative database is utilised.

Results: The model suggests that detectable memory dysfunction could occur up to three decades prior to the onset of dementia due to AD (ADem). This is closely followed by changes in amyloid-β CSF levels and the first cognitive decline, as assessed by sensitive measures. Hippocampal atrophy could be observed as early as the initial amyloid-β accumulation. Brain hypometabolism starts later, about 14 years before onset, along with changes in the levels of total and phosphorylated tau proteins. Loss of functional abilities occurs rapidly around ADem onset. Neurofilament light is the only protein with notable early changes in plasma levels. The rate of change varies, with CSF, memory, amyloid PET and brain structural measures exhibiting the highest rate before the onset of ADem, followed by a decline. The probability of progressing to a more severe clinical state increases almost exponentially with age. In accordance with previous studies, the presence of apolipoprotein E4 alleles and amyloid-β accumulation can be associated with an increased risk of developing the disease, but their influence depends on age and clinical state.

Conclusions: Despite the limited longitudinal data at the individual level and the high variability observed in such data, the study elucidates the link between the long asynchronous pathophysiological processes and the preclinical and clinical stages of AD.


S.1 Additional details on the statistical model
A GLMM with fixed effects of $m$ variables and a random intercept term takes the general form

$$\operatorname{logit}(p_{ij}) = \ln\!\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \sum_{k=1}^{m} \beta_k x_{kij} + u_i + \varepsilon_{ij},$$

where $p_{ij}$ is the probability of participant $i$ transitioning from state A to state B at observation $j$. $\beta_0$ is the mean intercept and $\beta_k$ is the log odds ratio associated with a one-unit increase in variable $x_k$. $u_i \sim N(0, \sigma_u^2)$ represents a specific deviance from $\beta_0$ for individual $i$, which accounts for the variability in the likelihood of transitioning between individuals. $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ represents the random error, which accounts for the variability within individuals. Hence, parameters $\sigma_u^2$ and $\sigma_\varepsilon^2$ represent the variance between and within individuals, respectively.
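As an illustration of how such a model maps covariates to a transition probability, the following is a minimal sketch. It assumes a logit link, which the description of the coefficients as log odds ratios implies. The function name, argument names and numerical values are hypothetical and are not taken from the study.

```python
import math


def transition_probability(x, beta, beta0, u_i, eps_ij=0.0):
    """Probability of moving from state A to state B at one observation.

    x      : covariate values for this participant and observation
    beta   : log odds ratios, one per covariate (hypothetical values)
    beta0  : mean intercept
    u_i    : participant-specific random intercept
    eps_ij : within-participant random error (0 gives the conditional mean)
    """
    # Linear predictor on the log-odds scale.
    eta = beta0 + sum(b * v for b, v in zip(beta, x)) + u_i + eps_ij
    # The inverse logit maps log-odds back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical example: one covariate (age in decades above 60).
p = transition_probability(x=[1.5], beta=[0.7], beta0=-3.0, u_i=0.2)
```

Increasing a covariate by one unit multiplies the odds of transitioning by the exponential of its coefficient, which is what makes each coefficient interpretable as a log odds ratio.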

S.2 Derivation of the expected time to MCI and ADem
The process described in the schematic diagram of Fig. 1

Mean probabilities of transitioning to a more severe state within one year (top), and the expected time to reach a more severe state (bottom), as a function of age in different risk groups defined by educational level and APOE ε4 status. Higher educational attainment tends to decrease the mean probability of developing ADem and to slow the rate of clinical progression, but more data from individuals with a broad range of educational backgrounds is required to support this result. Edu = Education.
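The link between a one-year transition probability and an expected time to a more severe state can be sketched with a simple backward recursion. This is a sketch only and not the derivation used in the study. It assumes yearly time steps, a single transition, and a hypothetical probability curve that rises exponentially with age. All names and parameter values are illustrative.

```python
import math


def annual_probability(age, p_ref=0.02, rate=0.08, ref_age=60.0):
    """Hypothetical one-year transition probability, rising exponentially with age."""
    return min(1.0, p_ref * math.exp(rate * (age - ref_age)))


def expected_time(start_age, prob=annual_probability, max_age=120):
    """Expected number of years until the transition, starting at start_age.

    Uses the recursion E(a) = 1 + (1 - p(a)) * E(a + 1): one year always
    elapses, and with probability 1 - p(a) the participant remains in the
    current state and faces the same problem one year older.
    Assumes prob(age) > 0 at max_age.
    """
    # Beyond max_age the probability is assumed constant, so the waiting
    # time is geometric with mean 1 / p.
    expected = 1.0 / prob(max_age)
    for age in range(max_age - 1, start_age - 1, -1):
        expected = 1.0 + (1.0 - prob(age)) * expected
    return expected


years_from_60 = expected_time(60)
years_from_80 = expected_time(80)
```

With a constant probability p the recursion reduces to the familiar geometric mean of 1/p years. With a probability that increases with age, the expected time shortens for older starting ages.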

S.7 Further model evaluation
In this section, we perform additional analysis to further evaluate the algorithms that have been developed for the estimation of the biomarker trajectories.

S.7.1 Validation of the sigmoidal and linear models
In addition to the bootstrapping performed to produce confidence intervals for the estimates of the average biomarker trajectories, we evaluated the performance of the sigmoidal and linear models using 10-fold cross-validation. The sample used for the estimation of the expected times to the onset of ADem (Table 1) was randomly partitioned into 10 non-overlapping subsamples containing approximately equal numbers of observations. For each fold, we calculated the Normalised Root Mean Square Error (NRMSE), normalised by the difference between the maximum and minimum values in the test dataset. The output of this experiment is shown in Fig. S5.
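The cross-validation and the range-normalised error described above can be sketched as follows. This is a minimal illustration rather than the code used in the study. Only the linear model is fitted, by ordinary least squares, and a sigmoidal fit would take its place in the same loop. The function names and the synthetic data are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nrmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the observed test values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


def k_fold_nrmse(xs, ys, k=10, seed=0):
    """Average NRMSE over k non-overlapping, roughly equal-sized folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled indices gives k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        predictions = [slope * xs[i] + intercept for i in test_idx]
        scores.append(nrmse([ys[i] for i in test_idx], predictions))
    return sum(scores) / k


# Synthetic example: time to onset (years) against a noiseless linear marker.
times = [float(t) for t in range(-50, 50)]
marker = [2.0 * t + 1.0 for t in times]
average_error = k_fold_nrmse(times, marker)
```

Normalising by the range of the test values makes the error comparable across markers measured on different scales, which is why a single average value per marker can be reported.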

Panels: cognitive markers, CSF markers, plasma fluid markers.

Fig. S5. Biomarker trajectories: 10-fold cross-validation of the sigmoidal and linear models.
The fitting procedure is implemented ten times. Each time, nine of the ten subsets are used as the training set, and one subset is left out as the test set. The dashed lines show the best fit for the training set in each of the 10 rounds of the training-testing procedure. The model output in each round is evaluated using the Normalised Root Mean Square Error (NRMSE). Each panel shows the average NRMSE for the respective biomarker.

S.7.2 A holdout validation
We performed a single train-and-test experiment using the holdout validation method. The data (Table 1) was divided into two disjoint subsets. The first subset, the 'test/hold-out' set, includes all individuals who were CN at baseline and developed MCI and ADem during the longitudinal study. Hence, for each individual in the test set the approximate time of transition to each clinical state is known, and their longitudinal biomarker data can be aligned according to the observed time between each measurement and the onset of ADem clinical symptoms. There are 21 such individuals, whose characteristics are presented in Table S3. The model was trained on the remaining dataset (the 'training' set). The estimated biomarker trajectories were evaluated on the test set using the NRMSE (see Fig. S6).

On average, the group of amyloid-positive individuals tends to have a higher chance of developing MCI and ADem than the whole sample. Thus, the expected time to reach ADem is shorter. The difference is more pronounced in the groups of noncarriers of the APOE ε4 genotype, whereas the difference in the group of APOE ε4 carriers is very small (Fig. S7), which may be due to the high proportion of APOE ε4 carriers who are also amyloid-positive (71.25%). The average expected time to ADem for amyloid-positive individuals classified as CN at baseline is 23.88 years, and for those classified as MCI at baseline it is 7.43 years (about four and two years shorter, respectively, than the times predicted for the whole sample). The output of the model that incorporates the effect of education suggests that educational level has almost no influence on the chance of amyloid-positive individuals transitioning to a more severe clinical state within one year, and thus on the expected time to a clinical state (Fig. S8), independently of age. Together, these results yield biomarker trajectories similar to those predicted for the whole sample (see Fig. S9, Table S4).

Table S4. Best fit for each marker in amyloid-positive individuals (Fig. S9): mean parameter values obtained from the model and 95% confidence interval (in brackets) computed using 500 bootstrap samples.

Parameter estimates
Cognitive markers