Reference tissue normalization in longitudinal 18F-florbetapir positron emission tomography of late mild cognitive impairment

Background Semiquantitative methods such as the standardized uptake value ratio (SUVR) require normalization of the radiotracer activity to a reference tissue to monitor changes in the accumulation of amyloid-β (Aβ) plaques measured with positron emission tomography (PET). The objective of this study was to evaluate the effect of reference tissue normalization in a test–retest 18F-florbetapir SUVR study using cerebellar gray matter, white matter (two different segmentation masks), brainstem, and corpus callosum as reference regions. Methods We calculated the correlation between 18F-florbetapir PET and concurrent cerebrospinal fluid (CSF) Aβ1–42 levels in a late mild cognitive impairment cohort with longitudinal PET and CSF data over the course of 2 years. In addition to conventional SUVR analysis using mean and median values of normalized brain radiotracer activity, we investigated a new image analysis technique—the weighted two-point correlation function (wS2)—to capture potentially more subtle changes in Aβ-PET data. Results Compared with the SUVRs normalized to cerebellar gray matter, all cerebral-to-white matter normalization schemes resulted in a higher inverse correlation between PET and CSF Aβ1–42, while the brainstem normalization gave the best results (high and most stable correlation). Compared with the SUVR mean and median values, the wS2 values were associated with the lowest coefficient of variation and highest inverse correlation to CSF Aβ1–42 levels across all time points and reference regions, including the cerebellar gray matter. Conclusions The selection of reference tissue for normalization and the choice of image analysis method can affect changes in cortical 18F-florbetapir uptake in longitudinal studies.


Background
Amyloid-β (Aβ) plaques and neurofibrillary tau tangles are known pathological features of Alzheimer's disease (AD) [1,2] that manifest years before the onset of clinical symptoms [3][4][5][6][7][8]. Aβ plaques are identified in vivo using brain positron emission tomography (PET) with several radiotracers, including 11 C-Pittsburgh Compound B ( 11 C-PiB) [9], 18 F-florbetapir [10], 18 F-FDDNP [11], 18 Fflorbetaben [12], and 18 F-flutemetamol [13]. The standardized uptake value ratio (SUVR) is a semiquantitative method frequently used in clinical trials of antiamyloid drugs to monitor the accumulation and progression of Aβ plaques and to assess the effects of antiamyloid drug therapy. The SUVR method is used in most large studies because it is easily calculated and does not require long dynamic scans or measurement of the arterial input function. Nevertheless, it requires normalization of regional PET activity to a reference tissue to account for nonspecific radiotracer binding. Because 11 C-PiB and 18 F-florbetapir target predominately the classic core and neuritic Aβ plaques, which are not evidenced in the cerebellum [14][15][16][17], whole cerebellum (or the cerebellar gray matter) is commonly used as a reference region [18,19]. However, recent research raises new concerns about the accuracy of the SUVR measures using cerebellar normalization. In particular, the variability observed in the longitudinal progression of SUVR values seems to be discrepant with the expected values on the basis of pathological and biological grounds.
In recent studies [20][21][22][23], researchers have examined the feasibility of alternative reference regions for amyloid-PET. Brendel and colleagues [20] used the discriminatory power between AD, mild cognitive impairment (MCI), and healthy control (HC) subject groups, as well as the magnitude and variability of temporal changes in 18 Fflorbetapir PET, to evaluate different reference tissue. Chen and colleagues [21] examined the strength of associations between 18 F-florbetapir PET increase and clinical decline in addition to means of tracking the magnitude and variability of longitudinal Aβ-PET changes in different subject groups. Landau and colleagues [22] stratified the following subject groups on the basis of their cerebrospinal fluid (CSF) Aβ 1-42 levels at baseline: (1) a control group that included healthy subjects with normal and stable CSF Aβ 1-42 levels and (2) a second group that included both cognitively healthy subjects and those with early amnestic MCI with abnormal CSF Aβ 1-42 levels at baseline. The study was designed to test if the cortical Aβ-PET levels in the HC group remained stable while they increased in the second group. All three of these studies incorporated static 18 F-florbetapir PET scans (summarized in Table 1). In another study, by Wong and colleagues [23], the distribution volume ratio in a dynamic 18 F-FDDNP PET scan was used to determine the discriminatory power between an HC group and the AD group. In all of these studies, researchers found that use of white matter normalization improved the accuracy of longitudinal Aβ-PET data more strongly than use of gray matter normalization.
The objective of our present work was to complement the previous research by the use of a new PET image analytical method as well as longitudinal data of both CSF Aβ 1-42 levels and 18 F-florbetapir images to identify which reference region normalization results in the optimal visit-to-visit correlation between these two biomarkers of AD pathology. The subjects in this study were those diagnosed with late mild cognitive impairment (LMCI) from the ADNI 2 phase of the Alzheimer's Disease Neuroimaging Initiative (ADNI) with stable CSF Aβ 1-42 levels at baseline and at 24-month follow-up; thus, longitudinal changes in 18 F-florbetapir PET were not expected to occur, which allows use of their PET images as a test-retest dataset to evaluate the effect of reference region normalization. All PET images are analyzed with the conventional SUVR mean and median measures and with a new PET image cluster analysis tool based on a weighted two-point correlation (wS 2 ). The wS 2 method is a statistical tool adopted from astronomy and materials science and can be used to detect specific changes in spatial patterns within Aβ-PET images that we refer to as increased clustering or flocculence. Our preliminary data [24] indicate the potential utility of this method for detecting longitudinal changes that are difficult to assess with conventional regional mean image values, which typically have large standard deviations.

Subject selection
Data from 21 ADNI subjects with LMCI were used in our study. We included all subjects with LMCI who had 18 F-florbetapir PET and T1-weighted MRI images at baseline and 24-month follow-up scans following the PET technical procedures of the ADNI 2 phase. We further limited our subject selection to patients with LMCI who had longitudinal CSF data obtained at time points close to their PET baseline and follow-up scans. The specific selection of the LMCI subject group from the ADNI 2 phase was based on their stable levels of CSF Aβ 1-42 , which allowed use of their corresponding longitudinal PET images as a test-retest dataset. While our selection criteria limited the number of available subjects, one of the main advantages of using the ADNI 2 data was the commonality of the image acquisition protocols, which ensured consistency of data within and between sites and thus reduced heterogeneity that would have otherwise added to the variability of both longitudinal and cross-sectional data. The biomarker datasheet containing the CSF Aβ 1-42 levels was downloaded from the ADNI archive. The dataset is named UPENN-CSF Biomarkers [ADNI GO/2] version 2013-10-31. Table 2 summarizes the demographic information of the subjects enrolled in this study. Both the baseline and follow-up Aβ 1-42 CSF values (measured as picograms per milliliter) matched the average ADNI values of the MCI cohort (baseline 165 ± 45 pg/ml, 24 months 161 ± 46 pg/ml). There was no significant change in CSF Aβ 1-42 levels between baseline and follow-up among these subjects. This was determined on the basis of the coefficient of variation of CSF values between the two time points, which was on average 3.34 % across our cohort. For comparison, the longitudinal within-laboratory coefficient of variation for CSF measures is typically 5-19 % [25]. In addition to the CSF values, our subjects' cognitive test scores, measured using the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-cog) [26], were 18 ± 7 at baseline and 19 ± 10 at follow-up. The Clinical Dementia Rating scores at both baseline and 24 months were 0.5 for almost all subjects. The Mini Mental State Examination (MMSE) [27] scores were 28 ± 2 at baseline and 26 ± 3 at follow-up. To summarize the subjects' clinical status, we included the box plots of their ADNI composite memory score [28], which combines the Rey Auditory Verbal Learning Test, the Logical Memory Test of the Wechsler Memory Scale, the MMSE, and the ADAS-cog (Fig. 1).

Data acquisition, image reconstruction, and preprocessing
All patient data were acquired at participating ADNI sites. 18 F-florbetapir PET, together with concurrent T1-weighted MRI volumes at baseline and 24-months follow-up, were downloaded from the ADNI database. The detailed description of the acquisition protocol can also be found on the ADNI website (http://adni.loni.usc.edu/). According to the ADNI protocols, a 370-MBq bolus injection of radiotracer was administered. This was followed by a 20-minute continuous brain PET imaging session that began approximately 50 minutes after the injection. The images were reconstructed immediately after the 20-minute scan according to scanner-specific reconstruction protocols, each using different versions of a maximum likelihood algorithm, to assess the scan quality and potential presence of motion artifacts. All images were corrected for attenuation and scatter according to the scanner-specific protocols. Upon completion, the imaging data were uploaded to the data archive of the Laboratory of Neuro Imaging at the University of Southern California, where they were coregistered and averaged. These are the datasets used in this study.

Image analysis
18 F-florbetapir images of each subject were aligned to their concurrent T1-weighted MRI volume. Gray matter and white matter masks of the T1-weighted MRI volumes were segmented in each subject's native space using SPM12 software (Wellcome Trust Centre for Neuroimaging, London, UK). Two different thresholds were applied on the segmented white matter to generate two types of white matter masks. The 10 % white matter mask included white matter voxels that were adjacent to gray matter. These border voxels were removed in the 100 % white matter mask. The template-based regional masks from the cerebellar gray matter and brainstem were obtained from the SPM12 atlas (labels_Neuromorphometrics.nii) and deformed into the subject' s native space. Regional masks for the corpus callosum were drawn manually. This was done by importing the MRI volumes into Amide, a medical image display and data analysis tool [29], where the center slice of the sagittal view was used to draw a region of interest around the splenium of the corpus callosum. Figure 2 represents candidate reference regions overlaid on a subject's T1-weighted MRI scan. The cerebral brain gray matter PET signal was normalized with respect to each mask, and the SUVR mean and median values were calculated. In addition to the SUVR mean and median values, we also calculated the wS 2 of the florbetapir PET images. The wS 2 method is a statistical image analytical method commonly used in astronomy [30] and materials science [31]. With this method, we derived a quantitative parameter from PET images to characterize the heterogeneity of the Aβ-PET activity distribution, which we refer to as the clustering or flocculence. The wS 2 analysis was also implemented with normalized Aβ-PET images. However, unlike the regional mean and median values, changes in wS 2 more specifically reflect changes in the spatial patterns of activity. Thus, these changes are potentially less sensitive to minor temporal variations in the reference tissue activity (variations in normalization threshold). PET analysis using the wS 2 method also results in smaller standard errors and thus may be more suitable for detecting subtle changes due to the larger effect size. The theoretical framework of wS 2 is described in our previous work where this method was validated with 11 C-PiB PET data [24].
The calculation of wS 2 started with sampling 50,000 random voxel pairs located within the gray matter of the 18 F-florbetapir PET image volume. For each voxel pair (each sampling instance), a weighting factor was calculated as the product of two terms. The first term was the average value of the two voxels, and the second term incorporated the absolute difference between the two voxel values into an exponential term. The weighting factor of an instance is higher when the values of both voxels are high and these values are close to each other. All sampling instances were then binned by the intervoxel distances, and for a given distance r the sum of the weighting factors was divided by the total number of instances with distance r and plotted versus r to obtain a wS 2 between 0 and 10 mm. Both the slope and the wS 2 area under the curve (AUC) change with the increased activity and increased heterogeneity of the activity distribution within the brain. Figure 3 shows the wS 2 AUCs from two florbetapir PET images. The wS 2 AUC was used as the quantitative outcome of this analysis and was calculated together with the mean and median of the SUVR for all baseline and follow-up images. The coefficients of variation of SUVR mean and median, as well as wS 2 across different time points and normalization schemes, were calculated over all 21 participants. Spearman's rank correlation coefficient was calculated between the 18 F-florbetapir PET outcomes (SUVR mean and median and ws 2 ) and the CSF Aβ 1-42 at baseline and follow-up.

Results
We used five different normalization regions (Fig. 2) to evaluate the correlation between amyloid-PET and CSF measures in a test-retest study. This association is graphically illustrated for all subjects at baseline and follow-up in Fig. 4 (cerebellar gray matter), Fig. 5 (10 % white matter), Fig. 6 (100 % white matter), Fig. 7 (brainstem), and Fig. 8 (corpus callosum). The medium and mean SUVR values and the wS 2 AUC were plotted (x-axis) versus the CSF Aβ 1-42 (y-axis). For each subject, the baseline marker (black) was connected via a line to the follow-up marker (red) to show each subject's individual change. Qualitatively, the scatterplot of SUVR mean and median values versus CSF Aβ 1-42 showed the lowest linear association between the two biomarkers when the cerebellar gray matter was selected as the reference region ( Fig. 4a and  b). With cerebellar normalization, the global mean and median SUVR values were between 1.1 and 2.0. The CSF Aβ 1-42 of brains with mean and median SUVR less than 1.5 seemed to remain clustered around 200 pg/ml, whereas SUVR mean and median values greater than 1.5 were associated with CSF Aβ 1-42 values around 125 pg/ml. The scatterplots of the wS 2 outcomes showed a more linear association with CSF Aβ 1-42 for all normalization schemes including the cerebellar gray matter (Fig. 4.C). This association was quantitatively evaluated by using Spearman's rank correlation coefficient (Fig. 9, Table 3) between the two biomarkers at both baseline (black bar) and follow-up (red bar). While the correlation was statistically significant for all normalization schemes, time points and methods of analysis, it was modest (~0.5) when cerebellar gray matter was selected as reference tissue and the SUVR mean and median values were calculated for PET analysis. The brainstem normalization resulted in the highest and most stable (lowest variability) Spearman's rank correlation values (~0.8) across both time points and all three methods of analysis. The coefficient of variation across all time points and normalization schemes was 0.10 for wS 2 method, 0.14 for SUVR mean and 0.13 for SUVR median.

Discussion
The results of this study show that analysis of 18 F-florbetapir PET data normalized to white matter reference regions results in a higher inverse correlation to CSF Aβ  and that this correlation exhibits less variability over time compared with 18 F-florbetapir PET data that are normalized to cerebellar gray matter ( Table 3, Fig. 9). These findings are in agreement with recent studies [20][21][22][23] in which researchers investigated the effect of reference tissue normalization using a  significantly larger number of ADNI subjects. This good agreement despite a smaller cohort in our study could be partially attributed to our subject selection, which consisted of ADNI2 patients with LMCI. As described in the Methods section, the image acquisition protocols of ADNI 2 were designed to ensure consistency of data within and between sites. All our subjects had 18 F-florbetapir PET scans at baseline and 24-month follow-up using the same (within-subject) scanner and the same image reconstruction and correction methods. These factors may have helped to reduce potential heterogeneities within this cohort that would otherwise have added to variability in both longitudinal and cross-sectional data.
Another advantage of ADNI 2 is the availability of concurrent CSF Aβ 1-42 at both baseline and 24-month follow-up time points, which allowed us to use them as a reference method to correlate with PET data at two different time points. On the basis of their stable CSF Aβ 1-42 , the brain amyloid levels of these subjects were not expected to change between baseline and the 24-month follow-up PET scans, thus making the 18 F-florbetapir PET images from this cohort an appropriate dataset for test-retest variability assessment of reference region normalization. The observed stable CSF Aβ 1-42 was not unexpected for subjects with LMCI, because it is known that the biomarkers of amyloid deposition approach a plateau by the onset time of LMCI and clinical AD [30].
Using a cohort with stable CSF Aβ 1-42 , our objective was to find a reference tissue that would give the highest and most stability (lowest variability) in Spearman's rank correlation between these two biomarkers calculated at two time points. While all white matter-normalized SUVRs indicated higher correlation to CSF measures than the cerebellar normalization, the brainstem normalization gave the best results among the white matter regions despite its location at the edge of the PET scanner field of view (FOV). The location of the cerebellum was suspected Fig. 9 Spearman's rank correlation between cerebrospinal fluid (CSF) amyloid-β 1-42 (Aβ 1-42 ) and 18 F-florbetapir standardized uptake value ratio (SUVR) median and mean and weighted two-point correlation function (wS 2 ) measures at baseline (black bars) and 24-month follow-up (red bars) for five different reference tissue normalization schemes to be the main reason for variability observed in the previous studies [21,22]. Due to their location, both brainstem and cerebellum are subject to increased scatter and decreased geometric sensitivity. However, PET data undergo rigorous attenuation, scatter, and normalization corrections to ensure uniformity within the FOV. Also, given that in our study the correlation values for cerebellar normalization were at their lowest levels for both baseline and follow-up time points, other factors, such as biological effects, could be more relevant than scanner-related physical effects. The connecting lines in Fig. 4 show that the within-subject differences between baseline and follow-up PET data (mainly intermediate SUVR mean and median values) were larger than all other normalization schemes (Figs. 5, 6, 7 and 8). In these figures, it is also apparent that the association between all three PET analytical methods and CSF measures become increasingly nonlinear as the PET values increase. This nonlinearity effect was most prominent when cerebellar gray matter was used as reference tissue (Fig. 4), where the CSF data of SUVR mean and median values below 1.5 were clustered around 200 pg/ml and the CSF data of SUVR mean and median values above 1.5 corresponded to CSF measures that remained around 125 pg/ml. All other white matter normalization schemes resulted in slightly more linear associations with CSF measures, in particular for intermediate PET values.
We included CSF because Aβ accumulation has been hypothesized to result from an imbalance between Aβ production and clearance [2,[32][33][34][35]. In particular, the impairment of clearance mechanisms seems to be the main cause of Aβ accumulation in sporadic or late-onset forms of AD [35], which account for the majority of patients with AD. In several previous studies, researchers have observed a relationship between cortical amyloid tracer binding and levels of CSF Aβ 1-42 using 11 C-PiB [36] and 18 F-florbetapir [37]. These studies, which were based on cerebellar normalization, showed that the CSF levels decreased with increased radiotracer uptake but reached a plateau at higher SUVR values. We made a similar observation with cerebellum normalization (Fig. 4a  and b). Other reference region normalizations, the brainstem in particular, resulted in more linear relationships across a wide range of cortical radiotracer uptake values at both baseline and follow-up. We emphasize on the importance of this observation because the axial location of the cerebellum (increased scatter and attenuation) accounted for the observed longitudinal variabilities in previous studies. However, scanner-related effects would affect the PET-CSF association within the whole spectrum of SUVR values. Also, both the brainstem and the cerebellum are equally subject to increased scatter and decreased geometric sensitivity. Our approach might indicate that the variability associated with the reference region normalization may more likely be related to biological factors than to scanner-related effects.
Four different white matter masks (white matter 10 %, white matter 100 %, brainstem, and splenium of corpus callosum) were applied. While the white matter 10 % included the white matter regions that shared borders with gray matter, these regions were removed in the 100 % white matter mask. Correlation values from these two white matter masks and the corpus callosum were similar.
The wS 2 technique was used as an additional method complementary to the conventional SUVR analysis that is performed by calculating regional mean and median SUVR values. Compared with the SUVR mean and median values, the wS 2 metric was associated with the highest average Spearman's rank correlation across all time points and reference regions, including the cerebellar gray matter. Given that the wS 2 metric is based on changes in image spatial patterns, we expected that this method would be slightly less sensitive to minor temporal variations in reference region radiotracer activity, which would cause variations in normalization thresholds. The wS 2 method evaluates associations between voxel values at different distances. These associations remain preserved, to some extent, even when the normalization threshold varies.
To date, we have applied the wS 2 analysis with two different radiotracers ( 11 C-PiB and 18 F-florbetapir) and have been able to show consistent results. Using a statistical analysis, we evaluated the effect of injected dose (as a surrogate for image noise) and the region size on the wS 2 outcomes and made a comparison with SUVR mean and median values. We obtained high and stable correlations between CSF Aβ levels and wS 2 outcomes with both radiotracers. Further validations would require a full quantitative analysis using kinetic modeling and dynamic acquisitions. Our main future objective is to test the wS 2 methodology with dynamic PET scans and list-mode data acquisition to investigate how different image acquisition (starting time point and duration) and reconstruction parameters (number of iterations and noise regularization) can change the image spatial patterns and subsequently the wS 2 outcomes. Image preprocessing is another important factor. Spatial resolutions of human PET scanners range from greater than 2.5-mm full-width half-maximum (FWHM) in some research scanners to greater than 7-mm FWHM in many commonly used clinical PET systems [38][39][40]. Additional preprocessing steps, such as image smoothing, further reduce the image resolution from 7-to 12-mm FWHM. For example, most reported ADNI analyses use level 4 preprocessed imaging data, which are smoothed to a uniform isotropic resolution of 8-mm FWHM [39]. The smoothing process is beneficial for cross-sectional comparisons and for qualitative visual reads by clinicians, due to the improved uniformity. However, it has a disadvantage in that potentially important high-resolution spatial patterns are smoothed away [40]. The spatial smoothing of within-subject longitudinal can reduce the effect size [41]. We are the first group, to our knowledge, to propose a method designed to improve understanding of the nature of nonuniform spatial activity patterns that explain the impact of spatial smoothing on longitudinal changes.

Conclusions
The selection of reference tissue for normalization of 18 F-florbetapir PET images as well as the image analysis method can modify the quantitative outcomes in longitudinal studies. Understanding factors that contribute to temporal variations of reference region radiotracer uptake merits further investigation.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions SS is the study's principal investigator and was the main contributor to the study design and analysis as well as drafting of the manuscript. JWM carried out the SUVR calculations, helped with ADNI subject search and CSF data collection, and revised the manuscript. SLB contributed to the ideas and objectives of this study and revised the manuscript. HK helped with the statistical analysis and revised the manuscript. ABB contributed to the design of the study and revised the manuscript. HEG and DOC addressed the clinical objectives and revised the manuscript. WRR helped with the MRI analysis and revised the manuscript. BPR participated in the study design and its coordination and helped to draft and revise the manuscript. All authors read and approved the final manuscript.