Strategies to reduce sample sizes in Alzheimer’s disease primary and secondary prevention trials using longitudinal amyloid PET imaging

Background: Detecting subtle-to-moderate biomarker changes such as those in amyloid PET imaging becomes increasingly relevant in the context of primary and secondary prevention of Alzheimer’s disease (AD). This work aimed to determine if and when distribution volume ratio (DVR; derived from dynamic imaging) and regional quantitative values could improve statistical power in AD prevention trials.
Methods: Baseline and annualized % change in [11C]PIB SUVR and DVR were computed for a global (cortical) and regional (early) composite from scans of 237 cognitively unimpaired subjects from the OASIS-3 database (www.oasis-brains.org). Bland-Altman and correlation analyses were used to assess the relationship between SUVR and DVR. General linear models and linear mixed effects models were used to determine effects of age, sex, and APOE-ε4 carriership on baseline and longitudinal amyloid burden. Finally, differences in statistical power of SUVR and DVR (cortical or early composite) were assessed considering three anti-amyloid trial scenarios: secondary prevention trials including subjects with (1) intermediate-to-high (Centiloid > 20.1), or (2) intermediate (20.1 < Centiloid ≤ 49.4) amyloid burden, and (3) a primary prevention trial focusing on subjects with low amyloid burden (Centiloid ≤ 20.1). Trial scenarios were set to detect a 20% reduction in accumulation rates across the whole population and in APOE-ε4 carriers only.
Results: Although highly correlated to DVR (ρ = .96), cortical SUVR overestimated DVR cross-sectionally and in annual % change. In secondary prevention trials, DVR required 143 subjects per arm, compared with 176 for SUVR. Restricting inclusion to individuals with intermediate amyloid burden levels or to APOE-ε4 carriers alone further reduced sample sizes. For primary prevention, SUVR required fewer subjects per arm (n = 855) than DVR (n = 1508), and the early composite also provided considerable sample size reductions (n = 855 to n = 509 for SUVR, n = 1508 to n = 734 for DVR).
Conclusion: Sample sizes in AD secondary prevention trials can be reduced by the acquisition of dynamic PET scans and/or by restricting inclusion to subjects with intermediate amyloid burden or to APOE-ε4 carriers only. Using a targeted early composite leads to reductions of sample size requirements only in primary prevention trials. These findings support strategies to enable smaller Proof-of-Concept Phase II clinical trials to better streamline drug development.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13195-021-00819-2.



Background
With the recently established biological definition of Alzheimer's disease (AD) [1] and the increased availability of (imaging) biomarkers, the research community is now well-equipped to study this disease from its earliest pathological changes to later-stage clinical presentations of cognitive impairment [2]. Especially in the context of much needed treatment and prevention strategies, this research framework can be extremely valuable in accurately identifying individuals in the AD continuum, who might benefit from disease-modifying therapies.
With varying degrees of pathological confirmation, recent years have seen many disease-modifying therapies that failed to meet primary endpoints and impact cognitive functioning [3]. In fact, despite promising signals observed in a number of anti-amyloid clinical trials [4][5][6][7][8], the lack of downstream effects on cognition posed important questions on the validity of the widely accepted amyloid cascade hypothesis and highlighted our (still) limited understanding of the mechanisms involved in this disease. Nonetheless, recent results such as those from the aducanumab [7,9,10] or BAN2401 [11] trials have shown promising signals for anti-amyloid therapies and have encouraged the development of earlier preventive Phase 3 trials focusing on subjects with preclinical AD, such as the AHEAD 3-45 Study [12]. As a result, this shift to prevention in earlier stages of the disease and the (possible) future need for pathological confirmation pre-treatment may increase the use of biomarkers such as amyloid positron emission tomography (PET) imaging for both screening and measurement of treatment effects. However, a marked discrepancy in duration between most short-term studies and long-term pathological processes such as Aβ plaque accumulation [13,14] may result in the need to detect subtle-to-moderate biomarker changes [15].
When focusing on the early stages of AD with amyloid PET, observed changes in Aβ burden are mostly focal [16][17][18], and it may be difficult to detect these changes with sufficient statistical power, challenging standard analytical approaches and the traditional use of a global measure of amyloid burden [19,20]. Recent work suggests that regional amyloid PET assessments can improve early detection of pathology [16,21,22] and achieve increased power in clinical trials [23]. In addition, several PET studies have investigated potential methodological improvements to increase statistical power in longitudinal settings and better discriminate sub-populations cross-sectionally [24]. These studies generally focused on improving technical factors affecting image quality such as partial volume effects [25] or on modeling and pre-processing choices impacting measurement stability, such as the choice of reference region [26][27][28]. However, since the vast majority of PET studies perform static acquisitions, these improvements remain mostly limited to the use of the standard uptake value ratio (SUVR) metric. Although easily available from short static scans, SUVR is a semi-quantitative and biased proxy of the specific amyloid burden as measured by binding potential (BP_ND) or distribution volume ratio (DVR) [29,30], which are available only from dynamic scans. Specifically, SUVR is known to suffer from technical and physiological sources of bias such as inconsistent scanning windows and changes in cerebral blood flow [30,31]. However, traditional dynamic acquisitions can significantly increase the duration and cost of studies; therefore, compromises have been proposed, such as the collection of early frames in addition to the standard late-uptake image acquisition [32]. In fact, this early frame collection not only allows for the determination of DVR, but also provides an additional parameter (R1) that can serve as a proxy for cerebral blood flow, another important marker of disease in AD [33,34].
Considering current and future research needs, this study aims to determine if and when dynamic imaging and targeted regional quantification could improve statistical power in primary and secondary prevention trials using longitudinal amyloid PET imaging. For that purpose, we estimated the number of participants per arm needed in three hypothetical trial scenarios aiming to reduce amyloid accumulation rates by at least 20%: (1) one in subjects with low amyloid burden for primary prevention, and two for secondary prevention, either (2) including all subjects with abnormal amyloid levels (intermediate-to-high) or (3) focusing on those at the earliest stages of pathology (intermediate levels). We compared the sample sizes required when using SUVR and DVR as the amyloid load metric, both in the whole population and in trial scenarios recruiting only APOE-ε4 carriers.

Data sets
This work included two separate datasets: the first was used for main analyses, and the second for calculating test-retest variability for [11C]PIB SUVR and DVR.
For the first dataset, tabulated PET data were obtained from the Open Access Series of Imaging Studies (OASIS-3) dataset, which is a longitudinal neuroimaging, clinical, cognitive, and biomarker dataset for normal aging and Alzheimer's disease (www.oasis-brains.org). This dataset is a retrospective compilation of data collected across several ongoing projects through the Washington University in St. Louis Knight Alzheimer's Disease Research Center (ADRC) over the course of 30 years [35]. A total of 237 subjects were selected based on (1) being classified as cognitively unimpaired and (2) having at least two dynamic [11C]PIB PET scans available, with a minimum of 1 year between sessions.
For the second dataset, eleven subjects (4 cognitively unimpaired, 1 with mild cognitive impairment, and 6 with AD dementia) were selected from a previously reported test-retest (TRT) study at the Amsterdam University Medical Center, location VUmc [36]. Test and retest scans were performed within a one-week interval.

Image acquisition and processing
A brief description of data collection and standard imaging processing pipelines for each dataset can be found below.
OASIS-3: 60 min dynamic [11C]PIB PET images were acquired starting at the intravenous administration of approximately 12 mCi of radiotracer. Data were collected in 3D mode on a Siemens/CTI EXACT HR+ scanner or a Biograph 40 PET/CT scanner. Accompanying anatomical T1-weighted MPRAGE MR scans were acquired using either a Siemens 1.5 T or 3 T scanner. Image processing was performed with a local processing pipeline (PUP; https://github.com/ysu001/PUP), described in detail previously [37]. In short, the standard FreeSurfer (v5.3; Martinos Center for Biomedical Imaging, Charlestown, Massachusetts, USA; https://surfer.nmr.mgh.harvard.edu/fswiki) based PUP processing includes a scanner resolution harmonization filter [38], inter-frame motion correction, PET-MR registration, and regional time-activity curve extraction for all regions from the Desikan-Killiany (DK) atlas [39]. Using the cerebellar cortex as the reference region, reference Logan graphical analysis (RLogan) [40] was used to determine DVR with t* set to 30 min post-injection (p.i.). In parallel, SUVR was extracted for the same time window of 30-60 min p.i.
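To make the two metrics concrete, the following is a minimal sketch of how a DVR and an SUVR could be derived from the same regional time-activity curves. It is not the PUP implementation: it assumes the simplified reference Logan form without the k2' term, frame mid-times in minutes, equal-length frames inside the 30-60 min window, and hypothetical function names chosen only for illustration.

```python
from typing import List, Sequence


def cumulative_trapezoid(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Running trapezoidal integral of y over t, starting at 0 for t[0]."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (x, y)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def reference_logan_dvr(t_mid, target, reference, t_star=30.0) -> float:
    """DVR as the slope of int(C_T)/C_T versus int(C_R)/C_T for t >= t_star.

    Assumption: simplified reference Logan plot (k2' term omitted).
    """
    int_target = cumulative_trapezoid(t_mid, target)
    int_ref = cumulative_trapezoid(t_mid, reference)
    x, y = [], []
    for ti, ct, it, ir in zip(t_mid, target, int_target, int_ref):
        if ti >= t_star and ct > 0:
            x.append(ir / ct)
            y.append(it / ct)
    return ols_slope(x, y)


def suvr(t_mid, target, reference, start=30.0, end=60.0) -> float:
    """Ratio of summed target to summed reference activity in the window.

    Assumption: frames in the window have equal duration, so no weighting.
    """
    idx = [i for i, ti in enumerate(t_mid) if start <= ti <= end]
    return sum(target[i] for i in idx) / sum(reference[i] for i in idx)
```

In real data the two values differ because the target and reference curves are not proportional within the 30-60 min window; that difference is the SUVR bias examined in this work.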
For the TRT study, 90 min dynamic [11C]PIB PET scans were performed on a Siemens ECAT EXACT HR+ scanner and a structural T1-weighted MR scan on a 1.5 T Siemens Sonata scanner. First, structural T1-weighted MR images were co-registered to the PET scan using Vinci software (Max Planck Institute for Neurological Research, Cologne, Germany) and PVE-lab software was used to extract the cerebellar cortex time-activity curve based on the Hammers atlas [41,42]. Next, both DVR (RLogan) and SUVR were calculated from 30 to 60 min p.i. in order to compare results with those from the OASIS-3 dataset and finally normalized to the cerebellar cortex using PPET software [43]. These parametric images were then warped into MNI space using SPM12 and the DK atlas was used to extract regional SUVR and DVR values.
Both global and regional analyses were performed on the SUVR and DVR data. A global measure of amyloid burden was determined based on a "cortical composite" created from grey-matter FreeSurfer-defined frontal, parietal, temporal, and precuneus regions [37]. In addition, an "early composite" was defined from three grey-matter DK regions, namely the isthmus cingulate, precuneus, and lateral orbitofrontal cortices. These regions were chosen based on literature for consistently displaying increased amyloid burden in early disease stages, as well as higher rates of accumulation compared with cortical composites [16][17][18][19][44]. Finally, corresponding and previously validated Centiloid (CL) values were also available for comparison in the OASIS-3 dataset [26].

Levels of β-amyloid burden
Three different levels of amyloid burden were defined based on CL cutoffs available from literature and validated against pathology [45]. Low amyloid burden was defined as CL values below 20.1, a threshold showing the highest accuracy in detecting moderate or frequent plaque density. In contrast, high amyloid burden was defined as CL values above 49.4, the threshold found to identify intermediate or high likelihood of Alzheimer's disease according to NIA-AA 2012 criteria [46]. Finally, intermediate levels were those with 20.1 < CL ≤ 49.4.
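The grouping rule can be written as a small helper. This is an illustrative sketch with a hypothetical function name. It assumes the boundary handling stated for the trial scenarios (CL ≤ 20.1 is low, 20.1 < CL ≤ 49.4 is intermediate).

```python
LOW_MAX_CL = 20.1           # upper bound of the low amyloid burden group
INTERMEDIATE_MAX_CL = 49.4  # upper bound of the intermediate group


def amyloid_level(centiloid: float) -> str:
    """Map a Centiloid value to 'low', 'intermediate', or 'high'."""
    if centiloid <= LOW_MAX_CL:
        return "low"
    if centiloid <= INTERMEDIATE_MAX_CL:
        return "intermediate"
    return "high"


def secondary_prevention_eligible(centiloid: float) -> bool:
    """Intermediate-to-high burden (CL > 20.1)."""
    return amyloid_level(centiloid) != "low"
```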

Amyloid accumulation
In order to account for differences in number of scans and interval between visits, a linear mixed effects model (LME) with random intercepts and random slopes was used to determine annualized rates of Aβ accumulation for every metric (SUVR and DVR) in the OASIS-3 dataset. To facilitate interpretability when reporting results, these were also normalized to baseline Aβ levels and will be reported as annualized % change.
Next, the TRT variability of each quantitative metric derived from the TRT dataset was used as a cutoff to determine the proportion of subjects to be considered as "accumulators," i.e., those with annualized % change above TRT variability. Relative TRT variability was calculated for all subjects from the TRT dataset (n = 11) and for cognitively unimpaired subjects only (n = 4), according to Eq. 1, where the estimate of amyloid burden (DVR or SUVR) of the test scan is denoted as T and for the retest scan as R.
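Eq. 1 is not reproduced in this text. As a hedged illustration only, the sketch below uses a commonly used definition of relative TRT variability, the absolute test-retest difference divided by the mean of the two measurements, expressed as a percentage. Whether this matches Eq. 1 exactly is an assumption, as are the use of the group mean as the cutoff and the function names.

```python
from statistics import mean
from typing import Sequence


def relative_trt(test: float, retest: float) -> float:
    """Assumed form: |R - T| / ((T + R) / 2) * 100, in percent."""
    return abs(retest - test) / ((test + retest) / 2.0) * 100.0


def trt_cutoff(tests: Sequence[float], retests: Sequence[float]) -> float:
    """Group-level cutoff, assumed here to be the mean relative TRT."""
    return mean(relative_trt(t, r) for t, r in zip(tests, retests))


def is_accumulator(annualized_pct_change: float, cutoff_pct: float) -> bool:
    """A subject counts as an accumulator if the rate exceeds the TRT cutoff."""
    return annualized_pct_change > cutoff_pct
```

For example, a test value of 1.00 and a retest value of 1.02 give a relative TRT variability of about 1.98% under this definition.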

Statistical analysis
All statistical analyses were performed using R Statistical Software (version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria). Results are reported as mean ± standard deviation (μ ± SD) or median (M) and interquartile range (IQR), as appropriate. In all analyses, DVR was considered the reference metric.
To assess the relationship between cortical SUVR and DVR at baseline and longitudinally, Bland-Altman plots, correlation analyses, and paired t tests (or Wilcoxon signed-rank test) were used. In addition, paired t tests (or Wilcoxon signed-rank test) were also used to assess differences between a cortical composite and an early composite in the estimation of amyloid burden and accumulation rates.
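The Bland-Altman quantities used for this comparison can be sketched as follows. This is an illustration rather than the authors' R code. It assumes that differences are taken as SUVR minus DVR and that proportional bias is assessed by regressing the difference on DVR (the reference metric) instead of on the mean of both metrics.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(suvr: Sequence[float], dvr: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) with 95% limits of agreement."""
    diffs = [s - d for s, d in zip(suvr, dvr)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def proportional_bias_slope(suvr: Sequence[float], dvr: Sequence[float]) -> float:
    """Least-squares slope of (SUVR - DVR) against DVR.

    A positive slope means the overestimation grows with amyloid burden.
    """
    diffs = [s - d for s, d in zip(suvr, dvr)]
    mx, md = mean(dvr), mean(diffs)
    sxx = sum((x - mx) ** 2 for x in dvr)
    sxy = sum((x - mx) * (y - md) for x, y in zip(dvr, diffs))
    return sxy / sxx
```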
To assess the relationship between baseline amyloid burden and longitudinal amyloid accumulation, a linear model, a quadratic model, and a natural cubic spline model with 1 knot were tested, and the optimal model was determined based on the Akaike information criterion (AIC).
Finally, effects of age, APOE-ε4 carriership (presence of at least 1 ε4 allele), and sex on baseline amyloid burden were assessed by a general linear model (GLM). Similarly, a linear mixed effects model (LME) was used to determine the effect of the same variables on amyloid accumulation, accounting for baseline amyloid burden.
The analyses above were performed in order to determine the generalizability of the OASIS-3 dataset with respect to other cohorts, such that the results of the sample size calculations can be contextualized appropriately.

Sample size calculations
Using the LME estimates for annualized accumulation rates and respective standard deviations, the sampsizepwr function in Matlab (1-β = 80% power and a two-tailed t test type-I error of α = 0.05) was used to determine sample sizes required to detect differences in accumulation rates in three hypothetical 12-month placebo-controlled randomized anti-amyloid clinical trials. The trial designs assumed participants undergo a PET scan at baseline and another at the completion of the trial. These were computed separately for SUVR and DVR, using the cortical composite and the early composite, both across the whole population and restricted to APOE-ε4 carriers only.
The tested trial scenarios were the following:
1) A secondary prevention trial aiming to detect a 20% reduction in β-amyloid accumulation rates in individuals with intermediate-to-high amyloid burden (CL > 20.1) at baseline;
2) An earlier secondary prevention trial aiming to detect a 20% reduction in β-amyloid accumulation rates, focusing on individuals with intermediate amyloid burden (20.1 < CL ≤ 49.4) at baseline;
3) A primary prevention trial aiming to detect a 20% reduction in β-amyloid accumulation rates in individuals with low amyloid burden (CL ≤ 20.1) at baseline.
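The logic of these calculations can be sketched with the textbook normal-approximation formula for comparing two group means, n per arm = 2 (z_{1-α/2} + z_{1-β})² σ² / δ², where δ is 20% of the mean accumulation rate. This is not a reimplementation of Matlab's sampsizepwr, which works with the t distribution, and the exact test type passed to that function is not stated here. The function name and inputs are hypothetical, and the sketch is not expected to reproduce the values in Table 2.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(mean_rate: float, sd_rate: float, reduction: float = 0.20,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm to detect a relative reduction in rate.

    mean_rate and sd_rate are the placebo-arm mean and standard deviation of
    the annualized accumulation rate (same units). Equal variance in both
    arms and a two-sided test are assumed.
    """
    delta = reduction * mean_rate                  # absolute effect to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd_rate / delta) ** 2)
```

Because n scales with (σ / mean rate)², a metric with a lower mean rate can still require fewer subjects if its standard deviation is proportionally smaller. This is the trade-off between SUVR and DVR examined in the Results.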

Results
On average, OASIS-3 subjects underwent 2.5 ± 0.6 scans [range 2-5], with an average of 4.8 ± 2.1 years between the first and the last scan [range 1-9.6]. The majority of subjects were female (65.0%), 32.9% of them were APOE-ε4 carriers, and the mean age at the time of the first PET session was 65.3 ± 9.4 years. Complete OASIS-3 cohort demographics are shown in Table 1.
As expected, both baseline and accumulation rates with SUVR and DVR were significantly higher when using the early composite compared to the cortical composite (Table 1).
Fig. 1 Relationship between SUVR and DVR. On the top panel, a scatterplot between baseline cortical SUVR and DVR across all subjects, with a solid identity line as reference (a), and a Bland-Altman plot displaying a linear relationship between SUVR bias and underlying amyloid burden (b). On the bottom panel, a scatterplot between annualized % cortical SUVR and DVR across all subjects, with a solid identity line as reference (c), and a Bland-Altman plot displaying a linear relationship between bias in annualized % cortical SUVR and underlying accumulation rates, with a dotted line representing a linear regression through the data points (d)

TRT and longitudinal amyloid accumulation
In order to assess the proportion of OASIS-3 participants with accumulation rates beyond TRT variability, we determined cutoffs for accumulation based on a separate local TRT dataset.
Using TRT from cognitively unimpaired subjects as our main cutoff for accumulation (due to cohort comparability) and a cortical composite for quantification, 81 (34.2%) individuals were classified as accumulators using DVR compared with 56 (23.6%) using SUVR (Fig. 2b). A total of 25 subjects were accumulators with DVR but not SUVR; 17 of them belonged to the low, 3 to the intermediate, and 5 to the high amyloid burden group (Table 1). Similarly, using the early composite for quantification and TRT cutoff, SUVR analyses classified 8 (3.4%) subjects as accumulators compared to 39 (16.5%) when using DVR. In this case, 31 subjects were accumulators with DVR but not with SUVR, 10 of which were from the low, 15 from the intermediate, and 6 from the high amyloid burden group.
Table 2 summarizes the required sample sizes for three hypothetical trial scenarios, considering different choices with respect to acquisition protocol (static/SUVR or dynamic/DVR), methodology (cortical composite or early composite), and inclusion criteria (whole population or APOE-ε4 carriers only).

Sample sizes in longitudinal studies
For secondary prevention trials aiming to detect a 20% reduction in β-amyloid accumulation rates, the sample sizes required are consistently lower when using DVR compared to SUVR (Table 2), likely because the smaller standard deviation and better TRT observed with DVR outweigh its lower average rate of accumulation (Table 1). In addition, including only APOE-ε4 carriers provided a considerable reduction in the required sample sizes (whole population: N_SUVR = 176, N_DVR = 143; APOE-ε4 carriers only: N_SUVR = 116, N_DVR = 83), for either region of interest chosen for analysis. Further, if this secondary prevention trial included only individuals at an earlier stage of the disease (i.e., those with intermediate amyloid burden and thus more likely to have higher accumulation rates), a 4-fold reduction in required sample sizes (N_SUVR = 44, N_DVR = 39) can be achieved compared to including all subjects with intermediate-to-high amyloid burden (N_SUVR = 176, N_DVR = 143). In both secondary prevention scenarios, the use of an early composite did not reduce the required sample sizes.
Finally, a primary prevention trial required the largest sample sizes overall, as expected, and the use of an early composite reduced the number of subjects needed to detect the desired effect by ~40-50%, for both SUVR (N_CORTICAL = 855, N_EARLY = 509) and DVR (N_CORTICAL = 1508, N_EARLY = 734). Similarly, restricting the trial to APOE-ε4 carriers provided approximately 20% reductions in sample size requirements with either acquisition protocol. However, in this scenario, the use of SUVR provided smaller sample size requirements than DVR (Fig. 3), which relates to its higher accumulation rates and similar standard deviation (Table 1).

Discussion
In this work, we observed that the smaller variability of DVR compared with SUVR results in smaller sample size requirements for anti-amyloid secondary prevention trials when using dynamic amyloid PET scans. In addition, focusing on individuals with intermediate levels of amyloid burden who are at the peak of accumulation provides a 4-fold reduction in sample sizes compared to traditional secondary prevention trials (where inclusion criteria include amyloid-positive individuals regardless of the extent of pathology). As expected, primary prevention trials require larger sample sizes to achieve similar statistical power, but this can be mitigated by targeting inclusion criteria to APOE-ε4 carriers and/or by using an early composite region of interest.
First, the direct comparison between dynamic and static parameters in this work confirmed that SUVR largely overestimates DVR and that this bias is strongly dependent on the underlying levels of amyloid burden (Fig. 1a, b). In addition, this overestimation relates to the underlying radiotracer kinetics and can be further influenced by scan time, as well as known confounding effects such as changes in blood flow and tracer clearance [29,30]. Especially in the case of disease-modifying therapies, an intervention could affect cerebral blood flow and therefore falsely inflate treatment effects when measured by SUVR [31], challenging the interpretation of SUVR-based rates of amyloid accumulation. As a consequence, the results of our primary prevention trial scenario should be interpreted with caution, where the increased accumulation rates observed with SUVR seem to facilitate the detection of treatment effects compared to DVR, despite the increased variability (Tables 1 and 2). Especially in these early stages of disease where the underlying amyloid PET signal is low, the relatively large contribution of physiological and methodologically driven fluctuations in the PET signal can lead to misinterpreted results. This is of particular relevance when the tested intervention may impact cerebral blood flow.
Table 2 Sample size requirements per trial arm, for three hypothetical trial scenarios, comparing differences between using DVR/SUVR, a cortical/early composite ROI, and restricting the inclusion to APOE-ε4 carriers or not
In contrast, secondary prevention trials seem to benefit from the acquisition of dynamic scans, where consistent reductions in sample sizes are observed (Table 2). There, the overestimation of SUVR accumulation rates is less pronounced with respect to its increased variability, resulting in a direct improvement in statistical power when using DVR, a metric with overall lower TRT variability [36]. This finding is in line with a recent publication on the tau tracer [18F]flortaucipir, where the differences in TRT variability between SUVR and BP_ND also led to smaller sample size requirements when using the latter as quantitative metric [47]. Naturally, obtaining DVR estimates would imply the acquisition of dynamic scans, which can result in a non-negligible increase in patient discomfort, use of scanner time, and overall study cost. To our knowledge, the only available report on the willingness of participants to undergo a second dynamic scan indicates that, at least when using a dual time-window protocol, only 5% of them would consider dropping the study due to discomfort [48]. Further, considering average rates of €750 for a static scan and €1050 for a dynamic scan (available from the AMYPAD Consortium, data not shown), our results indicate that performing dynamic scans may not significantly impact study costs (DVR: N = 143, €150 k; SUVR: N = 176, €132 k). Therefore, while maintaining similar cost, the acquisition of dynamic scans can increase statistical power, provide additional biomarker information on cerebral blood flow [33,34], and expose fewer participants to radiation, an ethical consideration that should not be disregarded [29].
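The cost comparison above is simple arithmetic and can be reproduced as follows. The quoted totals appear to correspond to one scan per participant in a single arm; that reading is an assumption, as are the constant and function names. A two-arm design with two scans per participant would scale both totals by the same factor.

```python
COST_STATIC_EUR = 750    # average cost of one static scan, as quoted above
COST_DYNAMIC_EUR = 1050  # average cost of one dynamic scan, as quoted above


def scan_cost(n_subjects: int, cost_per_scan: int, scans_per_subject: int = 1) -> int:
    """Total imaging cost in euros for one trial arm."""
    return n_subjects * cost_per_scan * scans_per_subject


dvr_cost = scan_cost(143, COST_DYNAMIC_EUR)  # 150150, quoted as EUR 150 k
suvr_cost = scan_cost(176, COST_STATIC_EUR)  # 132000, quoted as EUR 132 k
relative_increase = dvr_cost / suvr_cost - 1.0  # 0.1375
```

Under these assumptions the dynamic protocol costs about 14% more in imaging while enrolling 33 fewer participants per arm.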
In addition to the increased statistical power of DVR, focusing subject selection in secondary prevention trials on individuals at the peak of amyloid accumulation (20.1 < CL ≤ 49.4) provided a 4-fold reduction in required sample sizes (Table 2). In fact, similar results have been reported by Guo and colleagues, who demonstrated that prevention trials must account for the differences in amyloid accumulation phases (Fig. 2a) by narrowing the range of amyloid burden in inclusion criteria; otherwise, estimates of treatment effect can be significantly biased [49]. Importantly, the interval of amyloid burden used in our work captures the typical range of amyloid positivity cutoffs derived from visual assessment [45,50,51,52], while the upper values around 49.4 CL mostly correspond to levels found in subjects with a clinical presentation of AD [45,53]. In addition, the ranges of amyloid burden used in this work for each of the secondary prevention trials are in line with both the A3 (20-40 CL) and the A45 (CL > 40) trials, both of which target a similar population to the OASIS-3 dataset [11]. Together, our findings further stress the advantages of refining the range of amyloid burden in entry criteria and support the current and future design of smaller, Phase-II, Proof-of-Concept prevention trials in at-risk populations [54]. Of note, these considerations should be weighed against possible higher screening failure rates.
Interestingly, the secondary prevention trial designs tested in this work did not seem to benefit from the use of an early composite ROI. At this stage, the amyloid accumulation in a (global) cortical composite has reached similar rates as those observed in the early regions and has the advantage of larger volume and better count statistics (Table 1). This suggests that, already at the intermediate amyloid burden level, accumulation rates of other regions start to increase and contribute to the global signal. In line with our findings, a previous report described that at higher levels of amyloid burden, the set of regions with increased accumulation rates falls outside of the typical-AD topography [49]. In contrast, primary prevention trials seem to greatly benefit from the use of an early composite ROI, where we observed a ~40-50% reduction in expected sample sizes using a ROI composed of precuneus, isthmus cingulate, and lateral orbitofrontal regions (Table 2). These findings are corroborated by a recent report from Insel and colleagues using the Alzheimer's Disease Neuroimaging Initiative dataset [23]. There, the authors showed a reduction of ~62% in required sample sizes when using an early ROI composed of precuneus and posterior cingulate. Both the early regions proposed by Insel's and our work, as well as the late ones described by Guo and colleagues, are in excellent agreement with recently proposed amyloid burden staging systems [16,19,44]. Thus, these findings indicate that in order to significantly impact statistical power, the choice of regions for quantification must be informed by the disease stage of the target population.
Finally, we demonstrated that screening for risk factors such as age and APOE-ε4 carriership could further reduce sample size requirements. As expected, age was associated with higher baseline levels of amyloid burden. However, it was not predictive of accumulation rates, which reiterates this is a risk factor for amyloid pathology but does not directly influence the overall accumulation process, as previously suggested in a meta-analysis [13]. Similarly, APOE-ε4 carriership was more frequent in subjects with intermediate-to-high amyloid burden, and carriers were younger than their non-carrier counterparts (Table 1). In addition, carriership was only marginally associated with increased accumulation rates, similar to previous work [55,56], an effect which only reached significance for SUVR (likely due to the proportional bias of this metric which increases for higher levels of amyloid and accumulation rates, see Fig. 1b, d). Together, this suggests APOE-ε4 mainly impacts the onset of amyloid pathology rather than the speed of the subsequent accumulation process [57]. These results are in line with several previous reports, which indicate that even in cognitively unimpaired individuals, APOE genotype has a substantial effect on the age-related prevalence of AD pathology [13,58]. In our work, we find that both primary and secondary prevention trials can still significantly reduce required sample sizes when enrolling APOE-ε4 carriers alone, despite their younger age. Therefore, enrichment strategies in a general population could focus on older individuals, while specifically targeting APOE-ε4 carriers may allow for the inclusion of younger subjects, as these would already have an increased probability of being in the AD continuum. However, such a strategy may impact both screen failure and future labeling of the drug, restricting its prescription from the general population.
It is important to note that all results in this work relate to a fixed effect (20%) of reducing the accumulation rates in amyloid PET scans, which may seem disconnected from the level of amyloid removal observed in recent anti-amyloid immunotherapies [7,59]. Indeed, most anti-amyloid trials demonstrate such large reductions in amyloid burden that the effects can be appreciated even visually. Nonetheless, other interventions may have more subtle effects on amyloid burden, either directly or indirectly. Some examples would be BACE1 inhibitors [60], drugs with other targets which have downstream amyloid effects [61], or nonpharmacological therapies and multi-domain preventive trials such as those being tested in World-Wide FINGERS [62,63]. As such, a 20% reduction of amyloid accumulation may be a relevant target to detect, especially in a short 1-year Proof-of-Concept study. Nonetheless, the overall sample size impacts of using SUVR/DVR, early/cortical composites, or restricting inclusion criteria can also be observed for larger treatment effects (Supplementary Figure 1). Naturally, these differences become less relevant as the expected reductions become larger.
Methodological issues need to be considered when interpreting the findings of this study. First, while DVR is used as the standard of truth in this work, the chosen imaging window for analysis (30-60 min p.i.) and the use of RLogan could both have affected the results of the comparison between SUVR and DVR. Previous studies have indicated that, prior to the 40-50 min interval, [11C]PIB SUV may still be rapidly changing and equilibrium has not yet been reached. Therefore, this earlier imaging window does not correspond to secular equilibrium conditions, which could have inflated possible flow effects in SUVR and affected RLogan estimates [64]. In addition, RLogan is known to underestimate true binding potential and suffer from noise-induced bias, while other methods such as SRTM2 and MRTM2 have been proposed as optimal for [11C]PIB and might have produced higher accumulation rates with DVR [65]. It should also be noted that TRT values from a small single-center study may not translate to the data collected in OASIS-3. However, the differences between SUVR and DVR TRT reported in this work are in line with previous findings with the same tracer [66], as well as with other tracers [47]. Moreover, the TRT dataset analyzed in this work was used as supporting evidence for the superior statistical properties of dynamic PET scans, and the use of literature values would have resulted in equivalent results.

Limitations
Limitations include the single-tracer character of the study and the relatively limited availability of follow-up data with more than two time points. In addition, one must consider whether the population of OASIS-3 is representative of the primary/secondary prevention trial populations. First, the age range in this work might be too large, but the vast majority of subjects (71%) were between 60 and 85 years of age [11]. Of note, these results may not be comparable to other tracers, as the kinetics of [11C]PIB are markedly faster than what is observed with, e.g., the commercially available F-18 tracers such as [18F]flutemetamol and [18F]florbetaben, which may display even larger biases between SUVR and DVR and therefore also larger differences in sample size requirement between the metrics. This remains to be confirmed and will be explored within the Amyloid Imaging to Prevent Alzheimer's Disease (AMYPAD) Consortium [67]. Finally, future work in a larger dataset may consider estimating the uncertainty around sample size estimates to better understand the generalizability of these results and relate them to changes in cognitive functioning, which remains the main outcome measure in most preventive trials to date.

Conclusion
Strategies to improve statistical power differ between secondary and primary AD prevention trials. First, the acquisition of dynamic PET scans can provide a reduction in sample sizes only in secondary prevention trials, representing a reasonable alternative to static imaging while reducing the need for exposing healthy participants to ionizing radiation. In contrast, the use of an early composite seems to benefit only primary prevention trials, suggesting that regional analyses must be informed by disease stage in order to provide improved statistical power to trials. Overall, refining inclusion criteria can result in considerable reductions in sample size requirements by identifying individuals at the peak of amyloid accumulation and/or restricting trials to APOE-ε4 carriers. These results may provide guidance on how to design smaller Phase II Proof-of-Concept trials without penalizing statistical power to detect treatment-related changes in amyloid accumulation.