Biomarker testing in MCI patients—deciding who to test

Background: We aimed to derive an algorithm to define the optimal proportion of patients with mild cognitive impairment (MCI) in whom cerebrospinal fluid (CSF) testing is of added prognostic value.
Methods: MCI patients were selected from the Amsterdam Dementia Cohort (n = 402). Three-year progression probabilities to dementia were predicted using previously published models with and without CSF data (amyloid-beta1-42 (Abeta), phosphorylated tau (p-tau)). We incrementally increased the proportion of patients undergoing CSF testing, starting with the 10% of patients whose prognostic probabilities based on clinical data lay around the median (percentiles 45–55), until all patients received CSF testing. The optimal proportion was defined as the proportion at which the stepwise algorithm showed similar prognostic discrimination (Harrell’s C) and accuracy (three-year Brier scores) compared to CSF testing of all patients. We used the BioFINDER study (n = 221) for validation.
Results: The optimal proportion of MCI patients to receive CSF testing selected by the stepwise approach was 50%. CSF testing in only this proportion improved the performance of the model with clinical data only from Harrell’s C = 0.60, Brier = 0.198 (Harrell’s C = 0.61, Brier = 0.197 if the information on magnetic resonance imaging was available) to Harrell’s C = 0.67 and Brier = 0.190, and performed similarly to a model in which all patients received CSF testing. Applying the stepwise approach in the BioFINDER study would again select half of the MCI patients and yielded robust results with respect to prognostic performance.
Interpretation: CSF biomarker testing adds prognostic value in half of the MCI patients. As such, we achieve a CSF-saving recommendation while retaining optimal prognostic accuracy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13195-020-00763-7.


Background
Biomarkers such as amyloid beta1-42 (Abeta) and phosphorylated tau (p-tau) in cerebrospinal fluid (CSF) provide evidence on the neuropathological process underlying a patient's cognitive decline [1]. Determining the underlying cause of cognitive complaints is particularly useful in the pre-dementia stage of mild cognitive impairment (MCI), as it provides important prognostic information [2]. Appropriate use criteria for CSF biomarkers have been published to guide clinicians in their use. In these criteria, longstanding and unexplained MCI is considered an indication for additional biomarker testing [3,4]. The clinical practice guidelines of the American Academy of Neurology (AAN) for MCI are more cautious and recommend against the use of biomarkers in clinical practice, as it is currently unclear how to value additional diagnostic testing in pre-dementia stages [5]. In line with this practice guideline, clinicians tend to implicitly steer away from biomarker testing in MCI patients [6], even though multiple studies have shown the prognostic value of CSF biomarkers in MCI on a group level [7,8]. We think that this suboptimal use of biomarkers in the clinic might be due to the lack of practical, cost-efficient tools.
In a former study, we constructed personalized prognostic models that enable estimation of prognosis in terms of dementia conversion for an individual MCI patient, based on available biomarkers [9,10]. We showed that the use of CSF biomarkers improves prognostic performance over the use of demographic information and magnetic resonance imaging (MRI) information. Nonetheless, biomarker testing is unlikely to contribute to a more accurate prognosis in every MCI patient [11][12][13]. Here, we took as a starting point the notion that these same models could have additional value as a decision support tool, to aid clinicians in selecting patients for additional CSF biomarker testing.
We aimed to derive an algorithm to select MCI patients for CSF testing and to provide an estimate of the optimal proportion of patients to undergo CSF biomarker testing.

Patients
We selected n = 402 patients with a baseline diagnosis of MCI from the Amsterdam Dementia Cohort [14,15]. Inclusion criteria were availability of MRI data, CSF data and at least 6 months of follow-up. Diagnostic workup consisted of a standardized 1-day baseline assessment. Clinical diagnosis was made by consensus in a multidisciplinary meeting [14]. Until early 2012, the MCI diagnosis was based on Petersen's criteria [16]. From 2012 onwards, we used the core clinical criteria of the National Institute on Aging-Alzheimer's Association (NIA-AA) criteria for MCI [2]. Standardized annual follow-up included a follow-up visit with the neurologist and neuropsychologist. The diagnosis was re-evaluated in a multi-disciplinary meeting of the professionals involved. Specific dementia types were diagnosed using established clinical criteria [17][18][19][20][21][22].

MRI
Scans before 2008 were performed on 1.0 and 1.5 Tesla scanners (Siemens Magnetom Avanto, Vision, Impact and Sonata, GE Healthcare Signa HDXT). From 2008 onwards, MRI of the brain was performed on 3 T scanners (MR750, GE Medical Systems, Milwaukee, WI, USA; Ingenuity TF PET/MR, Philips Medical Systems, Best, The Netherlands; Titan, Toshiba Medical Systems, Japan). All images were acquired according to a standardized protocol [23], of which we only used sagittal 3D T1-weighted images with coronal reformats in this study. All scans were reviewed by experienced neuroradiologists. We quantified left and right hippocampal volumes (HCV, mL) using FSL FIRST (FMRIB's Integrated Registration and Segmentation Tool), which were summed for analysis [24].

Stepwise approach
To determine which proportion of MCI patients should receive additional biomarker testing, we applied a stepwise approach. The procedure consisted of three steps.
Step 1: obtain progression probability
We took as a starting point our recently published and validated prognostic models to predict the probability of progression to dementia within 3 years in MCI patients. These models were constructed with Cox regression and are described and validated in van Maurik et al. (2019) [9]. Here we assigned dementia progression probabilities (range 0-100%) to patients based on clinical data only (i.e., without CSF biomarkers), for two diagnostic scenarios and using the following two models: a demographic model (age, sex, and MMSE) and a demographic and MRI model (additionally including hippocampal volume).
We report Harrell's C statistics [30] and 3-year Brier scores [31,32]. Harrell's C statistic compares event times of pairs of patients and hence is a measure of how well the model discriminates between patients with different times to dementia. A high Harrell's C does not, however, imply that the model's progression probabilities are well calibrated to the data. Therefore, we report Harrell's C together with the 3-year Brier score. The 3-year Brier score measures the squared distance between the dementia status after 3 years and the model progression probability, and thus reflects prognostic accuracy, capturing both discrimination and calibration.
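To make the two performance measures concrete, the following is a minimal sketch in Python; it is not the implementation used in this study. The function names `harrell_c` and `brier_at` and their array arguments are assumptions introduced here for illustration. The Brier score is deliberately simplified: patients censored before the horizon are excluded rather than handled with censoring weights, which the cited methods [31,32] may treat differently.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. Tied risks count as half-concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def brier_at(horizon, time, event, prob):
    """Simplified Brier score at a fixed horizon (assumption: no censoring weights).

    Patients censored before the horizon are dropped because their
    status at the horizon is unknown.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    prob = np.asarray(prob, dtype=float)
    progressed = event & (time <= horizon)
    known = progressed | (time >= horizon)
    return float(np.mean((progressed[known].astype(float) - prob[known]) ** 2))
```

A higher Harrell's C indicates better discrimination, whereas a lower Brier score indicates better accuracy, so the two measures move in opposite directions when a model improves.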
Step 2: refine prognosis using a stepwise approach
We reasoned that patients with high or low progression probabilities based on clinical data only are unlikely to benefit from additional biomarker testing, in terms of improving the prognostic accuracy for dementia conversion. On the other hand, in patients who have an initial progression probability in the center of all patients' prognostic probabilities, additional biomarker testing could improve the prognosis. Therefore, in our MCI group, we defined the median 3-year progression probability according to the demographic and/or MRI information as the most uncertain prognosis, since it reflects the predicted prevalence of 3-year progression.
Subsequently, we used a stepwise approach and added CSF biomarker data (Abeta and p-tau concentrations in CSF; further referred to as additional CSF) to refine the prognosis in the 10% of patients (between percentiles 45 and 55) surrounding the median 3-year progression probability. After this first 10% of patients, the prognosis was refined with biomarker data in 20% of patients (between percentiles 40 and 60), then 30% (between percentiles 35 and 65), and so on. Supplemental Table 1 provides an overview of the 3-year prognostic probabilities (i.e., probability thresholds) that correspond with these percentiles. Patients with 3-year progression probabilities outside these percentile ranges received a prognosis from the simpler demographic or MRI model. Of note, due to the high correlation of p-tau and total tau (t-tau), t-tau concentrations are not included in the models. Details on the selection of variables in the models are described elsewhere [9,10]. We performed this stepwise approach with fivefold cross-validation and added CSF biomarkers to (1) the demographic information only and (2) demographics and MRI. Overall cross-validated performance of this stepwise model was evaluated on the combined predictions: probabilities based on clinical information only for patients outside the percentile range, and probabilities refined with additional CSF biomarker testing for patients within it.
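The percentile-band selection described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the study's code: the function name `stepwise_probabilities` and its arguments are hypothetical, the percentiles are computed directly on the supplied clinical probabilities, and the fivefold cross-validation used in the study is omitted.

```python
import numpy as np

def stepwise_probabilities(p_clinical, p_csf, proportion):
    """Combine clinical-only and CSF-refined probabilities.

    p_clinical : probabilities from the clinical (demographic or MRI) model
    p_csf      : probabilities from the model with additional CSF
    proportion : fraction of patients to test, centred on the median
                 (0.1 -> percentiles 45-55, 0.2 -> percentiles 40-60, ...)
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    p_clinical = np.asarray(p_clinical, dtype=float)
    p_csf = np.asarray(p_csf, dtype=float)
    half = 100.0 * proportion / 2.0
    # Probability thresholds corresponding to the percentile band
    lower, upper = np.percentile(p_clinical, [50.0 - half, 50.0 + half])
    tested = (p_clinical >= lower) & (p_clinical <= upper)
    # Patients inside the band get the CSF-refined probability,
    # all others keep the clinical-only probability
    combined = np.where(tested, p_csf, p_clinical)
    return combined, tested, (float(lower), float(upper))
```

Returning the thresholds alongside the combined probabilities mirrors the idea that each proportion corresponds to a pair of probability cut-offs that could be reused in another cohort.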

Step 3: classification performance comparison
We plotted cross-validated Harrell's C and 3-year Brier scores of stepwise models with an increasing proportion of patients receiving biomarker testing against the models with clinical data only (demographics only/demographics and MRI) and the model with additional CSF testing for all patients. This allowed us to identify the optimal proportion of patients at which the stepwise approach performed better than the model with clinical data only and as well as the additional CSF biomarker model in terms of prognostic discrimination (Harrell's C) and prognostic accuracy (3-year Brier scores).
As we used percentiles of the calculated prognostic probabilities with demographic and/or MRI data, the optimal proportion that is selected corresponds with certain demographic or MRI-model derived probabilities (supplemental Table 1). As a result, the optimal proportion also provided us with an algorithm that defines the threshold of demographic or MRI-model derived probabilities where additional biomarker testing would be indicated, further referred to as probability thresholds.
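One way to formalize the comparison in Step 3 is to pick the smallest proportion whose performance lies within a tolerance of the model that tests all patients. The Python sketch below illustrates that idea only. The function name `optimal_proportion` and the tolerances `tol_c` and `tol_brier` are assumptions introduced here; the study identified the optimal proportion from the plotted curves and does not state numeric tolerances.

```python
def optimal_proportion(proportions, c_stepwise, brier_stepwise,
                       c_full, brier_full, *, tol_c, tol_brier):
    """Smallest tested proportion that performs similarly to testing everyone.

    "Similar" is defined here (as an assumption) as Harrell's C no more than
    tol_c below the full model and Brier score no more than tol_brier above it.
    """
    candidates = sorted(zip(proportions, c_stepwise, brier_stepwise))
    for proportion, c_index, brier in candidates:
        similar_discrimination = c_index >= c_full - tol_c
        similar_accuracy = brier <= brier_full + tol_brier
        if similar_discrimination and similar_accuracy:
            return proportion
    # No smaller proportion qualifies: fall back to testing all patients
    return 1.0
```

The choice of tolerances determines the result, so any such rule should be reported together with the tolerances used.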

Evaluation of stepwise approach
Lastly, we applied the probability thresholds identified by the stepwise approach in the BioFINDER cohort [33]. From the BioFINDER study, we included n = 221 patients with a baseline diagnosis of MCI, available MRI and CSF data, and at least 6 months of follow-up. Prognostic probabilities were calculated based on demographic information only and on demographic and MRI information. Based on the identified probability thresholds, the prognosis was refined with additional CSF for only a proportion of patients. Discriminative performance and prediction accuracy in this independent cohort were evaluated on the combined predictions of patients with probabilities based on clinical information only and patients with additional CSF biomarker testing.
We illustrate the practical use of the developed algorithm with two cases: one in whom additional CSF testing added prognostic information, and one in whom it did not. For the reader to appreciate the clinical characteristics of MCI patients who were or were not selected for additional CSF testing, we report the clinical and demographic data for selected patients, patients below the lower probability threshold (not selected), and patients above the upper probability threshold (not selected).

Results
Table 1 presents the patient characteristics. Mean age of the MCI patients was 66 ± 8 years, 164 (41%) were female, and mean MMSE score was 27 ± 2 points. Overall, 189 (47%) patients progressed to dementia during 3 ± 2 years of follow-up.
Figure 1 shows the stepwise approach from demographic information only to additional CSF testing. It compares the prognostic discrimination and prognostic accuracy of the stepwise model with those of the model based on demographic information only (Harrell's C = 0.60, 3-year Brier score = 0.198) and of the demographics with additional CSF model in which CSF results were included for all patients (Harrell's C = 0.70, 3-year Brier score = 0.186). The discriminative performance of the stepwise model started to increase when the 10% of patients surrounding the median received CSF testing. It then increased gradually until it performed similarly to the CSF model (Fig. 1a) when 50% of the patients underwent CSF testing (Harrell's C = 0.67). Brier scores showed a similar pattern and were comparable with those of the CSF model when 50% of the patients received CSF testing (3-year Brier score = 0.190, Fig. 1b).
Figure 2 shows the stepwise approach from demographic and MRI information (Harrell's C = 0.61, 3-year Brier score = 0.195) to additional CSF testing (Harrell's C = 0.70, 3-year Brier score = 0.187). The performance of the stepwise model again started to increase when 10% of the patients received CSF testing and was similar to that of the model with CSF testing in all patients (Fig. 2a) when 50% of the patients received CSF testing (Harrell's C = 0.67). Brier scores showed a similar, although more erratic, pattern and were comparable with those of the full CSF model when 50% of the patients received CSF testing (3-year Brier score = 0.190, Fig. 2b).
We also report the characteristics of patients who were and were not selected based on demographic and/or MRI information. Subsequently, we evaluated the identified probability thresholds in the BioFINDER study (supplemental Table 1). Patient characteristics of the BioFINDER study are reported in supplemental Table 2.
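Applying the probability thresholds identified by the stepwise approach in the BioFINDER study would select 51% of patients for CSF biomarker testing based on demographic information. Prognostic discrimination and prognostic accuracy in the BioFINDER study are reported in supplemental Table 3.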
To illustrate the practical implementation of our algorithm for additional CSF biomarker testing, we present two clinical cases. For patient A, based on age (70 years), sex (female), and MMSE score (28), the 3-year progression probability was estimated to be 49.7%. This probability falls within the probability range identified for the 50% of patients surrounding the median, and therefore additional CSF testing would be recommended. Adding CSF information (Abeta = 1188, p-tau = 47) resulted in a substantially lower progression probability of 17.8%.
For patient B, both demographic and imaging information were available. Based on age (54 years), sex (male), MMSE (29), and HCV (sum; 7 cm³), the 3-year progression probability was estimated to be 14.0%. This probability falls outside the probability range identified for the 50% of patients surrounding the median based on demographic information and MRI. As the progression probability was already low, the algorithm does not recommend adding CSF testing. The progression probability of the ATN model (additional CSF testing; Abeta = 1349, p-tau = 44) for this patient was 9.2%, which shows that CSF indeed did not meaningfully alter the estimated prognosis of this patient.
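The decision rule applied in these two cases can be written as a simple range check. In the Python sketch below, the function name `recommend_csf` and the constants `LOWER` and `UPPER` are hypothetical: the threshold values are placeholders chosen only for illustration, and the actual probability thresholds are those listed in supplemental Table 1.

```python
def recommend_csf(p_clinical, lower, upper):
    """Recommend CSF testing when the clinical 3-year progression
    probability lies within the identified probability thresholds."""
    return lower <= p_clinical <= upper

# Hypothetical placeholder thresholds, NOT the values from supplemental Table 1
LOWER, UPPER = 0.35, 0.60

print(recommend_csf(0.497, LOWER, UPPER))  # patient A (49.7%): True
print(recommend_csf(0.140, LOWER, UPPER))  # patient B (14.0%): False
```

In practice, the demographic model and the demographic and MRI model would each have their own pair of thresholds, so the rule would be applied with the pair that matches the information available for the patient.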

Discussion
We developed an algorithm to identify those MCI patients most likely to benefit from additional biomarker testing. We showed that CSF biomarker testing adds prognostic value to clinical information in half of the MCI patients. The findings were replicated in an independent cohort. As such, we achieved a CSF saving recommendation without reducing prognostic accuracy.
In the decision to perform additional diagnostic testing, it is important to specify to what end a diagnostic test is performed, e.g., to identify or exclude Alzheimer's disease (AD) pathology, predict clinical progression, change disease management, and/or improve well-being. The BIOMARKAPD project, a multidisciplinary working group, ranked these clinical questions by importance and showed that CSF biomarkers are particularly useful to identify AD pathology and to predict progression to AD dementia in MCI patients [34]. Their recommendations are similar to those of the appropriate use criteria for CSF [4]. Both advise on CSF testing in all MCI patients. However, these recommendations are based on studies that investigated the additional value of CSF in terms of diagnostic or prognostic accuracy on a group level. In such studies, CSF is tested in all (MCI) patients, which provides no information on its usefulness in specific patients. Moreover, the appropriate use criteria rightly state that a comprehensive clinical evaluation should precede the use of CSF biomarkers [4]. Clinicians should then determine, based on the available information, in which patients CSF biomarkers contribute to the diagnosis and clinical decision making. Such statements in the appropriate use criteria, however, are hard to operationalize for clinicians, especially in pre-dementia stages.
The current study provides clinicians with an easy-to-use algorithm that uses readily available information (i.e., age, sex, MMSE, and hippocampal volume if available) to identify MCI patients for CSF biomarker measurement. We took as a starting point progression probabilities based on basic clinical information only. By identifying the range of progression probabilities close to the progression prevalence in the population, where CSF is likely to add prognostic value, we allow the clinician to make an informed decision on performing biomarker testing. The clinician could also use this information to inform the patient before embarking on biomarker testing and to manage expectations about potential outcomes. The communication of considerations to perform or not perform a diagnostic test was given high priority in a recent Delphi consensus study among clinicians, patients, and caregivers [35]. The BIOMARKAPD workgroup also acknowledges the importance of these considerations; they recommend that "in the case of positive biomarkers a personal follow-up plan should be offered and appropriate support should be initiated in the case of symptom progression" and that "in the case of negative AD biomarkers, an intensive follow-up plan may not be necessary". Although this recommendation mentions implications for both possible outcomes, it remains general and does not take available clinical information into account.
In the search for practical guidelines on which patient to test, several previous studies developed prediction models for amyloid positivity. Although these studies differ in their methodological details, they all focus on only one of the pathological hallmarks of Alzheimer's disease, as tauopathy and neurodegeneration are not considered [36,37]. Moreover, most of these studies compare patients with AD dementia with controls and cannot be generalized to the MCI population. Finally, these algorithms identify individuals most likely to benefit from additional testing to identify amyloid positivity, which is most relevant in a trial setting, while in clinical settings the clinical outcome, i.e., progression to (any type of) dementia, is more relevant. One previous study used a computer algorithm to select patients in whom CSF testing was likely to contribute to a more accurate differential diagnosis for different types of dementia [38]. In that study, CSF testing was recommended in 26% of the cases. However, MCI patients were not included. In the current study, we extended the available literature with a keen eye for the needs of clinical practice by providing an algorithm to select MCI patients in whom CSF testing is most likely to contribute to a more accurate prognosis.
One of the strengths of the current study is that our algorithms make use of validated prognostic models to estimate the prognosis of each patient using available clinical information (patient characteristics and/or hippocampal atrophy). Moreover, we used measures that are easily available to the clinician, i.e., patient characteristics, the widely used MMSE score, and hippocampal volume. Although we described this stepwise approach for the decision to perform CSF testing in MCI patients, our approach has general applicability to investigate a stepwise approach between any two prognostic models. The novelty of our study is that we used a data-driven approach to define the proportion of patients that would benefit from additional biomarker testing, i.e., the performance of the stepwise approach should be significantly better than that of the clinical model and similar to that of the full (demographics, MRI and CSF) model. Similar approaches have been proposed for the classification of cancer samples by means of high-dimensional genomic markers [39]. With our full model we have a measure for amyloid (A), tauopathy (T), and neurodegeneration (N) and thus align with the ATN criteria reported by the NIA-AA [1]. Lastly, we validated our stepwise approach in an independent cohort. The success of this validation may have resulted from the fact that BioFINDER patients had a similar risk profile compared to the MCI patients from Amsterdam, as similar diagnostic guidelines for MCI were used in both cohorts. The usefulness of this stepwise approach in a population with a different composition of risk profiles should be a topic for further research.

Limitations
One limitation is that we were unable to construct a stepwise approach from the demographic model to the MRI model, as the demographic and MRI models performed similarly in our sample (data not shown). Although the addition of MRI does not result in a more accurate prognosis in MCI patients, performing MRI or CT is still valuable to exclude other (reversible) causes of cognitive impairment. Other diagnostic tests, like amyloid-PET, were not part of the current study. In a future study, we aim to apply the same approach to amyloid-PET. Based on previous research on prognostic models in amyloid-PET, we expect results similar to those reported here [40]. Another direction for further research is the definition of the prognostic accuracy measure. In this paper, we have chosen two well-established measures of model performance, i.e., Harrell's C and the Brier score. Harrell's C has the limitation that it is only a discriminative measure, which may select models that are poorly calibrated to the actual data. The Brier score is also a measure of calibration, but the quadratic distance used for measuring accuracy may not be the most appropriate measure in clinical decision making.

Conclusion
In conclusion, we showed that performing CSF testing in 50% of the MCI patients reaches the same prognostic accuracy as testing all patients. Our algorithm uses prognostic models without CSF data to identify those patients most likely to benefit from CSF testing. This has important implications with respect to cost-efficient use of CSF testing. Furthermore, this approach helps clinicians set appropriate expectations before diagnostic testing.
Additional file 1: Supplemental Table 1. Probability thresholds for proportion of patients receiving additional CSF testing. Supplemental Table 2. BioFINDER patient characteristics. Supplemental Table 3. Prognostic discrimination and prognostic accuracy in the BioFINDER study.