Alzheimer’s disease progression by geographical region in a clinical trial setting

Introduction: To facilitate enrollment and meet local registration requirements, sponsors have increasingly implemented multi-national Alzheimer’s disease (AD) studies. Geographic regions vary on many dimensions that may affect disease progression or its measurement. To aid researchers designing and implementing Phase 3 AD trials, we assessed disease progression across geographic regions using placebo data from four large, multi-national clinical trials of investigational compounds developed to target AD pathophysiology.
Methods: Four similarly designed, 76- to 80-week, randomized, double-blind, placebo-controlled trials with nearly identical entry criteria enrolled patients aged ≥55 years with mild or moderate NINCDS/ADRDA probable AD. Descriptive analyses were performed for observed mean score and observed mean change in score from baseline at each scheduled visit. Data included in the analyses were pooled from the intent-to-treat placebo-assigned overall (mild and moderate) AD dementia populations from all four studies. Disease progression was assessed as change from baseline for each of five scales: the AD Assessment Scale-cognitive subscale (ADAS-cog11), the AD Cooperative Study-Activities of Daily Living Scale (ADCS-ADL), the Mini-Mental State Examination (MMSE), the Clinical Dementia Rating scored by the sum of boxes method (CDR-SB), and the Neuropsychiatric Inventory (NPI).
Results: Regions were heterogeneous at baseline. At baseline, disease severity as measured by ADAS-cog11, ADCS-ADL, and CDR-SB was numerically worse for Eastern Europe/Russia compared with other regions. Of all regional populations, Eastern Europe/Russia showed the greatest cognitive and functional decline from baseline; Japan, Asia and/or S. America/Mexico showed the least cognitive and functional decline.
Conclusions: These data suggest that in multi-national clinical trials, AD progression or its measurement may differ across geographic regions; this may be in part due to heterogeneity across populations at baseline. The observed differences in AD progression and in correlations between outcome measures across geographic regions may generalize to 'real-world' clinic populations, where heterogeneity is the norm.
Trial registrations: ClinicalTrials.gov NCT00594568 – IDENTITY. Registered 11 January 2008. ClinicalTrials.gov NCT00762411 – IDENTITY2. Registered 26 September 2008. ClinicalTrials.gov NCT00905372 – EXPEDITION. Registered 18 May 2009. ClinicalTrials.gov NCT00904683 – EXPEDITION2. Registered 18 May 2009.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder generally first manifesting as cognitive impairment, progressing to impairment of daily function and, ultimately, loss of independence, debility, and death from complicating medical comorbidities. In the past decade, researchers have studied treatments that target the underlying pathophysiology of AD, as yet without approval of any disease-modifying drug entity. Development of such disease-modifying therapies is made challenging by the length and size of studies required to demonstrate positive effects on co-primary outcome measures of cognition and function. Most trials have enrolled several hundred to more than a thousand patients, with studies lasting 12 to 18 months [1]. In order to enroll patients in a reasonable time period and meet regulatory requirements for local registration, sponsors have increasingly implemented more multinational AD studies [2] that cover multiple geographic regions and encompass many cultures, languages, and healthcare delivery systems. Multinational AD studies may be helpful in furthering our understanding of the effects of a therapy across various standards of care, family structures, and societal views on outcomes [3].
Despite this trend for large, multinational AD trials, relatively little is known about the implications of conducting them [3]. There are multiple reasons to expect heterogeneity in these trials. Making a clinical diagnosis of AD is challenging and, historically, has occurred by elimination of other potential etiologies. Even in the clinical trial setting, 18 to 22% of patients clinically diagnosed with AD were found to be lacking evidence of pathophysiology of AD using amyloid positron emission tomography tracers [4,5]. Moreover, AD is a complex disease with multiple risk factors including advancing age, lower education level, and carrying the ε4 allele of the apolipoprotein E (APOE) gene as well as other specific genetic loci [6-10]. Differences in prevalence of risk factors, variability in clinical diagnosis and differences related to culture, access to healthcare, and clinical trial conduct across geographic regions [2] may result in heterogeneity in patient populations recruited for global AD trials. Additional factors that could lead to heterogeneity in multinational AD trials were discussed at a meeting of representatives from the Alzheimer's Association, sponsors, regulatory bodies, and vendors, and were published by Doody and colleagues [3]. This heterogeneity has the potential to result in differences in rates of disease progression across regions. While differences in dementia prevalence across geographic regions have been documented [11], to date there is a paucity of published data on disease progression across geographic regions.
To aid researchers in designing, implementing, and analyzing data from multinational phase 3 AD trials, we assessed disease progression across geographic regions using placebo data from four large, multinational clinical trials of compounds developed to target the underlying pathophysiology of AD [12,13]. An additional perspective on AD across geographic regions is provided by Grill and colleagues [14], who assessed recruitment, retention, and safety reporting across regions using data from these four AD trials.

Methods
Placebo data from four randomized double-blind placebo-controlled AD trials (IDENTITY, IDENTITY2, EXPEDITION, EXPEDITION2) were used in this exploratory analysis. The study designs have been published previously [12,13,15,16]. Study protocols were reviewed and approved by the relevant ethical review boards (see Acknowledgements). Briefly, IDENTITY and IDENTITY2 were 76-week trials designed to study the effect of semagacestat, a γ-secretase inhibitor no longer in development, on the progression of AD; EXPEDITION and EXPEDITION2 were 80-week trials designed to study the effect of solanezumab, a humanized anti-Aβ peptide antibody currently in development, on the progression of AD. For each trial, the research protocol was approved by the ethical review board at each study site participating in that trial. Written informed consent for study participation was provided by the study subject or a legally authorized representative, in accordance with the Declaration of Helsinki. The current study analyzing data collected across these clinical trials was reviewed by the University of California, Los Angeles (UCLA) Medical Institutional Review Board and deemed as not meeting the definition of human subjects research.
The entry criteria were nearly identical for the four studies and included patients 55 years and older with moderate or mild AD dementia, documented on the basis of a score of 16 to 19 and of 20 to 26, respectively, on the Mini-Mental State Examination (MMSE), and meeting criteria of the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association for probable AD. Patients with other etiologies for dementia were to be excluded.
Subjects were required to be medically stable with a reliable study partner who spent >10 hours per week with the patient. Subjects were permitted to receive cholinesterase inhibitors and/or memantine during the studies but had to be stable in dose prior to entry and remain stable during the studies.
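The MMSE-based severity bands in the entry criteria can be written as a small lookup. The following Python sketch is illustrative only; the function name `mmse_severity` is hypothetical and is not taken from the study protocols, and it encodes only the two score bands stated above.

```python
from typing import Optional


def mmse_severity(mmse_score: int) -> Optional[str]:
    """Map a screening MMSE score to the severity band used for entry.

    Returns "moderate" for 16 to 19, "mild" for 20 to 26, and None for
    scores outside the stated entry range.
    """
    if 16 <= mmse_score <= 19:
        return "moderate"
    if 20 <= mmse_score <= 26:
        return "mild"
    return None
```

A score outside 16 to 26 returns `None` here, which only signals that the MMSE criterion is not met; the other entry criteria (age, probable AD diagnosis, study partner, stable medication) are not modeled.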
Efficacy measures in the four studies included the 11-item Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-cog11; range 0 to 70, higher scores worse) [17], the Alzheimer's Disease Cooperative Study-Activities of Daily Living Scale (ADCS-ADL; range 0 to 78, lower scores worse) [18], the Clinical Dementia Rating scored by the sum of boxes method (CDR-SB; range 0 to 18, higher scores worse) [19,20], the MMSE (range 0 to 30, lower scores worse) [21], and the Neuropsychiatric Inventory (NPI; range 0 to 144, higher scores worse) [22,23]. Scales were translated into the native language(s) of the region and the ADAS-cog11 and ADCS-ADL were administered by raters trained and qualified in their administration and scoring. Training without qualification was provided for all other scales administered in the trials. The protocols specified that the same rater was to rate the ADAS-cog11 throughout the study and this rater should not rate the ADCS-ADL. If a rater left a site, both training and qualification of the new rater were required. Raters falling below minimal pretrial experience levels in administering the ADAS-cog were required to complete additional (enrichment) training and pass a prequalification examination before undergoing the formal qualification training and examination at the startup meeting. In addition, if raters were incorrectly scoring the ADAS-cog11 or MMSE as determined during in-study rating reviews (performed at baseline and 52 weeks for the IDENTITY program, and at baseline and 12 weeks for the EXPEDITION program), they were contacted and reminded of the correct scoring algorithm and asked to correct their error(s).
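Because the five scales run in different directions, a change from baseline of the same sign can mean decline on one scale and improvement on another. The Python sketch below records the ranges and directions stated above and re-expresses a raw change so that a positive value always indicates worsening. The names `Scale`, `SCALES`, and `worsening` are hypothetical helpers introduced for illustration, not part of the published analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    low: int               # minimum possible score
    high: int              # maximum possible score
    higher_is_worse: bool  # True if higher scores indicate greater impairment


# Ranges and directions as stated in the Methods text.
SCALES = {
    "ADAS-cog11": Scale(0, 70, True),
    "ADCS-ADL": Scale(0, 78, False),
    "CDR-SB": Scale(0, 18, True),
    "MMSE": Scale(0, 30, False),
    "NPI": Scale(0, 144, True),
}


def worsening(scale_name: str, change_from_baseline: float) -> float:
    """Return the change re-signed so that positive always means decline."""
    scale = SCALES[scale_name]
    if scale.higher_is_worse:
        return change_from_baseline
    return -change_from_baseline
```

For example, `worsening("ADAS-cog11", 11.0)` and `worsening("MMSE", -3.0)` both return positive values, since an ADAS-cog11 increase and an MMSE decrease each indicate decline.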
IDENTITY and IDENTITY2 were implemented at 300 sites in 31 countries, with enrollment from April 2008 to May 2010. EXPEDITION and EXPEDITION2 were implemented at 211 sites in 16 countries, with enrollment from May 2009 to June 2012. As the result of identifying an unfavorable benefit/risk ratio with semagacestat in an interim safety analysis, the IDENTITY studies were amended to discontinue the study drug and follow study subjects for an additional 7 months. Only placebo data from the initial, randomized study period of up to 76 weeks were considered in the present analyses. At the time at which the IDENTITY protocols were amended, both studies were fully enrolled with 37.7% and 6.1% of the IDENTITY and IDENTITY2 study subjects, respectively, having been followed for the full 76-week initial study period [15]. The EXPEDITION and EXPEDITION2 studies were completed in April 2012 and June 2012 with 73% and 78% of the study populations, respectively, observed for the full 80 weeks.

Statistical analysis
The small sample sizes in most countries (Table 1) necessitated regional instead of by-country analyses. Geographic regions were defined based on a modified version of criteria used by Glickman and colleagues [24]. Countries were combined into regions based on ethnicity and healthcare delivery systems, to increase sample size. Regions were as follows: North America (United States, Canada); South America/Mexico (Argentina, Brazil, Chile, Mexico); Western Europe/Israel (Belgium, Denmark, Finland, France, Germany, Israel, Italy, Spain, Sweden, United Kingdom); Eastern Europe/Russia (Bulgaria, Hungary, Poland, Romania, Russia, Serbia, Turkey, Ukraine); Australia/South Africa; Asia (China, India, Korea, Taiwan); and Japan.
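The country-to-region grouping described above can be expressed as a lookup table. This Python sketch is illustrative; the names `REGIONS`, `COUNTRY_TO_REGION`, and `region_of` are hypothetical. The member countries of Australia/South Africa and Japan are not enumerated in the text and are inferred here from the region names.

```python
# Regions and member countries as listed in the text.
REGIONS = {
    "North America": ["United States", "Canada"],
    "South America/Mexico": ["Argentina", "Brazil", "Chile", "Mexico"],
    "Western Europe/Israel": [
        "Belgium", "Denmark", "Finland", "France", "Germany",
        "Israel", "Italy", "Spain", "Sweden", "United Kingdom",
    ],
    "Eastern Europe/Russia": [
        "Bulgaria", "Hungary", "Poland", "Romania",
        "Russia", "Serbia", "Turkey", "Ukraine",
    ],
    # Assumption: members inferred from the region name.
    "Australia/South Africa": ["Australia", "South Africa"],
    "Asia": ["China", "India", "Korea", "Taiwan"],
    # Assumption: single-country region inferred from the region name.
    "Japan": ["Japan"],
}

# Invert the table so each country maps to exactly one region.
COUNTRY_TO_REGION = {
    country: region
    for region, countries in REGIONS.items()
    for country in countries
}


def region_of(country: str) -> str:
    """Return the analysis region for a country; raises KeyError if unknown."""
    return COUNTRY_TO_REGION[country]
```

Keeping the grouping in one table makes it straightforward to rerun the descriptive summaries under an alternative grouping, which matters because the regional definitions are acknowledged to be somewhat arbitrary.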
Analyses were performed using SAS version 9.2 (SAS Institute Inc, Cary, NC). Descriptive analyses were performed for the observed mean score and the observed mean change in score from baseline at each scheduled visit. Spearman's rank correlations were used to assess the relationship among AD scales at baseline and among baseline-to-endpoint (18 months) changes in scores by region. Data included in the analyses were pooled from the intent-to-treat placebo-assigned overall (mild and moderate) AD populations from all four studies. Disease progression was assessed as the change from baseline for each of the four scales (ADAS-cog11, ADCS-ADL, MMSE, CDR-SB). Measurements considered in the analyses were those performed at baseline and 76/80 weeks (depending on study), as well as at 12, 28, 40, 52, and 64 weeks for ADAS-cog11 and ADCS-ADL, 52 weeks for MMSE, and 28 and 52 weeks for CDR-SB.
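The descriptive summary of observed mean change from baseline by region and visit can be sketched as follows. The original analyses were run in SAS; this Python version is only an illustration of the calculation, and the function name, record layout, and the convention that week 0 denotes baseline are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_region(records):
    """Observed mean change from baseline per (region, week).

    records: list of (subject_id, region, week, score) tuples, where
    week 0 is the baseline visit (an assumption of this sketch).
    Only observed post-baseline scores contribute; nothing is imputed.
    """
    # First pass: collect each subject's baseline score.
    baseline = {}
    for subject, _region, week, score in records:
        if week == 0:
            baseline[subject] = score

    # Second pass: accumulate change from baseline at each later visit.
    changes = defaultdict(list)
    for subject, region, week, score in records:
        if week > 0 and subject in baseline:
            changes[(region, week)].append(score - baseline[subject])

    return {key: mean(values) for key, values in changes.items()}


# Hypothetical records for illustration only (not study data).
example_records = [
    ("s1", "Region A", 0, 20.0),
    ("s1", "Region A", 80, 22.0),
    ("s2", "Region A", 0, 25.0),
    ("s2", "Region A", 80, 28.0),
    ("s3", "Region B", 0, 22.0),  # no follow-up visit, so not summarized
]
example_summary = mean_change_by_region(example_records)
```

Subjects without a post-baseline observation at a given visit simply do not contribute to that visit's mean, matching an observed-cases summary rather than an imputed one.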
For the EXPEDITION program, a completer was defined as a subject who had completed the 80-week double-blind study period. For the IDENTITY program, a completer was defined as a subject who had completed the 76-week initial treatment period; the denominator in this case was the number of subjects who had an opportunity to complete 76 weeks of treatment before the study drug was stopped at the request of the sponsor and the study was amended.
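The two completer definitions differ mainly in their denominators. A minimal Python sketch of that distinction is given below. The function name, the dictionary keys `completed` and `had_opportunity`, and the use of all placebo-assigned subjects as the EXPEDITION denominator are assumptions for illustration; the text states the denominator explicitly only for IDENTITY.

```python
def completer_proportion(subjects, program):
    """Proportion of completers under the program-specific denominator.

    subjects: list of dicts with boolean keys "completed" and
    "had_opportunity" (hypothetical field names).
    program: "IDENTITY" or "EXPEDITION".
    Returns None when the denominator is empty.
    """
    if program == "IDENTITY":
        # Only subjects who could have reached week 76 before the
        # study drug was stopped count toward the denominator.
        eligible = [s for s in subjects if s["had_opportunity"]]
    else:
        # Assumption: all enrolled subjects form the denominator.
        eligible = list(subjects)

    if not eligible:
        return None
    return sum(1 for s in eligible if s["completed"]) / len(eligible)
```

Restricting the IDENTITY denominator in this way avoids counting subjects as non-completers when early termination of dosing, rather than dropout, prevented them from reaching the endpoint visit.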

Enrollment and study completion by region and country
Subject enrollment and completion are shown by country and region in Table 1. Overall, data from 2,079 subjects assigned placebo (EXPEDITION, n = 506; EXPEDITION2, n = 519; IDENTITY, n = 501; IDENTITY2, n = 553) were included in the analyses. Since the study drug was stopped before intended study termination in the IDENTITY program and the studies were amended, many study subjects did not have the opportunity to participate until the endpoint visit at 76 weeks. As a result, the IDENTITY program had a numerically smaller proportion of completers than the EXPEDITION program.

Baseline characteristics by region
There were numerical differences in baseline characteristics among regions for these placebo-assigned subjects (Table 2). The Asia population had the lowest proportion of subjects with mild disease, defined as MMSE 20 to 26 (42%). Subjects were generally oldest in North America and South America/Mexico, and youngest in Western Europe/Israel and Eastern Europe/Russia. Subjects had received the most years of education in North America and the least in South America/Mexico and Asia. Fewer males than females were enrolled overall, but Western Europe/Israel enrolled the highest and South America/Mexico and Eastern Europe/Russia the lowest proportions of male subjects. While 74 to 94% of subjects received concomitant AD treatment, it was most common in Western Europe/Israel and least common in Eastern Europe/Russia and Australia/South Africa. APOE ε4 carriers were most common in Western Europe/Israel and North America and least common in Asia. Baseline disease severity, as measured by ADAS-cog11, ADCS-ADL, and CDR-SB, was worse for Eastern Europe/Russia compared with populations in other regions (Table 3, Figure 1), but this was not the case for the MMSE and NPI.

Experience with ADAS-cog11 rating by region
Across study programs, enriched training was required most frequently in Asia and Japan. In IDENTITY, North America, Western Europe/Israel, Eastern Europe/Russia, and Australia/South Africa had less need for remedial training (Table 4); in EXPEDITION, less need for remedial training was evident in Western Europe/Israel, followed by North America and Eastern Europe/Russia.

Disease progression by region
Of all regional populations, Eastern Europe/Russia showed the greatest cognitive and functional decline from baseline; Japan, Asia, and/or South America/Mexico showed the least cognitive and functional decline (Table 3, Figure 1). For ADAS-cog11 specifically, Eastern Europe/Russia showed the most cognitive decline over the course of the study (mean change from baseline to 18 months was 11.0) while Asia and Japan showed the least decline (mean change from baseline to 18 months was 3.5 and 4.4, respectively); North America, Australia/South Africa, and Western Europe/Israel showed a similar decline (mean change from baseline to 18 months was 6.0 to 7.5). In the case of the NPI, the 18-month decline was greatest for Australia/South Africa, while there was some improvement in score at 18 months for South America/Mexico.

Correlations between outcome measures by region
Correlations among outcome measures at baseline and changes in outcome measures from baseline to endpoint for each region are shown in Figure 2. The range of correlations across regions was generally greater for change from baseline than at baseline. Scales that include cognitive assessment (ADAS-cog11 and MMSE) were consistently well correlated across regions (−0.5 to −0.8); functional (ADCS-ADL) and global (CDR-SB) assessment scale scores were also well correlated across regions. For change from baseline to endpoint, these correlations were generally lowest for Asia and/or Japan and highest for Eastern Europe and/or Australia/South Africa. Measures of cognition (ADAS-cog11 or MMSE) were less correlated with functional (ADCS-ADL) or global scales (CDR-SB). Correlations between NPI scores and other scale scores were generally less than 0.5.
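For readers who wish to reproduce this kind of correlation on their own data, Spearman's rank correlation can be computed as the Pearson correlation of the ranks, with tied values given their average rank. The Python sketch below uses only the standard library; it is not the SAS procedure used in the study, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because ADAS-cog11 scores rise and MMSE scores fall with greater impairment, a strong relationship between the two appears as a negative coefficient, which is why the reported range for these scales is −0.5 to −0.8.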

Discussion
The objective of this analysis was to better understand disease progression among geographic regions in the setting of multinational AD clinical trials, based on analysis of placebo data from the IDENTITY and EXPEDITION study programs. Although some regions had relatively small sample sizes (Australia/South Africa, n = 84), differences in AD progression or its measurement over the trial periods were evident across regions. Eastern Europe/Russia showed the greatest cognitive and functional decline from baseline amongst the regions on the ADAS-cog11 and ADCS-ADL scales (see Table 3), while Asia, Japan, and South America/Mexico showed the least. The two regions with the largest study populations, North America and Western Europe/Israel, showed a similar decline. For the 18-month change from baseline on the ADAS-cog11, the 11.0 point increase for Eastern Europe/Russia was appreciably different from the regions with the least change from baseline (Asia 3.5 points, Japan 4.4 points). Change from baseline for North America and Western Europe/Israel, as well as Australia/South Africa, ranged from 6.0 to 7.5 points. The differences among geographic regions observed here may have been the result of heterogeneity in the study populations across regions. Younger age, female gender, greater baseline disease severity, absence of treatment with acetylcholinesterase inhibitors and/or memantine, and an APOE ε4 genotype have been associated with more rapid clinical disease progression [6,25,26]. The study population of Eastern Europe/Russia was younger, had a higher proportion of females, and had a lower proportion treated with AD medications at baseline; baseline outcome measure scores were also generally more severe for Eastern Europe/Russia, compared with the other populations.
The proportion of the Eastern European/Russian population who were APOE ε4 carriers (51%) as well as the proportion with mild AD at baseline (54%), however, lay in the middle of the range across regions. More detailed findings on differences/similarities in baseline characteristics across these geographic regions are presented by Grill and colleagues, who concluded that populations recruited into clinical trials are likely to differ across regions due to multiple factors: differences in lifestyle factors, overall health, access to medical care, standard of AD diagnosis and treatment, reimbursement for AD services and treatment, family attitudes toward AD recognition, reporting of symptoms and research participation, diagnosis and treatment, and ethnogenetic differences including those resulting in different prevalence of APOE ε4 carrier status [14]. Another factor which could contribute to variability in measurement of disease progression in these multinational clinical trials is language differences; although a centralized translation service was used to minimize the effect of translation on outcomes, this still does not guarantee equivalence among cultural groups or regions. Local differences may require slight adjustment of particular items. For example, in the IDENTITY and EXPEDITION study programs, orientation to county on the MMSE had to be adjusted to accept a response of 'region' or 'burro' where the concept of counties was not applicable. This may have contributed to some variability in outcomes across regions because patients may recall region more readily than county. Differences in levels of rater experience could also have contributed to the observed differences in measurement of AD progression across regions. Yet in these study programs, extensive rater training (including enriched training where necessary and qualification at investigator meetings) was implemented before raters were permitted to administer the ADAS-cog11.
Miller and colleagues demonstrated previously that raters who require and receive enriched training for ADAS-cog11 perform similarly to their more experienced colleagues [27]. Therefore, it is unlikely that rater experience alone could account for the differences in measurement of AD progression seen in our analyses. Based on the findings from these analyses, it is prudent to be mindful of potential regional differences when designing trials, performing analyses, and interpreting findings, so that information collected is of maximum benefit to all populations who will ultimately have access to the drug entity, once approved. Differences in AD progression or its measurement across geographic regions in the clinical trial setting are probably reflective of the real-world situation where heterogeneity of populations and their treatments is expected and common. Importantly, in the IDENTITY and EXPEDITION programs, the regional differences did not preclude detection of active drug effects [12,13]. If a drug effect can be detected within a study population showing some heterogeneity in disease progression, the effects are more probably generalizable to a heterogeneous clinic population.
Table footnote: Data presented as raters in the study who had required enrichment training/total number of raters in the study (%). ADAS-cog11, 11-item Alzheimer's Disease Assessment Scale-cognitive subscale.
For these analyses, we also assessed correlations between scales across regions. Scales that measure cognition (ADAS-cog11 and MMSE) were consistently well correlated with each other across regions but there was more variability among regions in correlations between cognitive (ADAS-cog11 or MMSE) and functional (ADCS-ADL) or global scales (CDR-SB), with some regions showing higher correlations than others. Better understanding of why this variability occurred will require further study, but potential differences among cultures in the relevance of functional measures could have contributed [28]. In addition, the cognitive assessments are performance-based tests administered to the patient, whereas the functional scales are proxy reports by the caregiver. Since functional scales are more subjective in nature, they may be more susceptible to cultural influences, and this may contribute to regional variability in correlations between scales.
To our knowledge, this is the first published analysis assessing AD progression in a clinical trial setting across geographic regions. Schneider and Sano reviewed data from 11 AD clinical trials of patients with mild-to-moderate dementia, conducted both in the United States and outside the United States, but did not perform regional analyses [1]. Overall, for these 11 studies the 18-month mean change from baseline on ADAS-cog11 ranged from 4.34 to 9.10 (standard deviation 8.2 to 9.4); analysis methodology did differ across the studies. In the present analyses, overall findings were similar, with an 18-month mean change in ADAS-cog11 of 6.23 (standard deviation 9.48).
There are limitations to these analyses. Geographic groupings, while based on those of Glickman and colleagues [24] and expected similarities in environmental factors (for example, healthcare, culture) across countries within regions, may be somewhat arbitrary, and heterogeneity within regions is likely. Analysis by country would have reduced this effect to some degree, but sample sizes were generally small at the country level, limiting the interpretability of findings. In some countries, patients were enrolled only in IDENTITY, a program in which the study drug was stopped due to an unfavorable benefit/risk ratio for active drug (semagacestat). As a result, the proportion of study completers in these IDENTITY-only countries was small. Despite grouping countries into regions, the sample size limitation remained to some extent and we performed descriptive analyses rather than formal comparisons across regions.

Conclusion
These data suggest that AD progression or its measurement may differ across geographic regions in multinational clinical trials; this may be in part due to heterogeneity across populations at baseline. The observed differences in AD progression and correlations between outcome measures across the geographic regions may be reflective of the real-world situation, where heterogeneity of populations and their treatments is expected and common. Trial sponsors will need to continue to implement multinational studies due to required study sizes, enrollment rates, and regulatory requirements; these data will be helpful in study planning.

Competing interests
JDG has been a trial investigator for Alzheimer's Disease Cooperative Study, Eli Lilly, Merck, Biogen Idec, Janssen Alzheimer's Immunotherapy, Genentech, and Avanir (in the past 2 years). DBH, SAD, Y-FC, HL-S, and AMH are full-time employees and minor stock holders at Eli Lilly and Company. RSD has served as principal investigator for clinical trials for which her institution received payment from Accera, Avanir, Genentech, Janssen AD Immunotherapy, Merck, Pfizer, and Takeda; has provided consultations to Abbvie, Accera, AC Immune, Avanir, AZ Therapies, Baxter, Biotie, Cerespir, Chiesi, GlaxoSmithKline,

Figure 2 Correlation between scales by region. (a) Baseline measures. (b) Mean change from baseline to 18 months for study completers. ADAS-cog11, 11-item Alzheimer's Disease Assessment Scale-cognitive subscale; ADCS-ADL, Alzheimer's Disease Cooperative Study-Activities of Daily Living; AS, Asia; AU, Australia/South Africa; CDR-SB, Clinical Dementia Rating Scale sum of boxes; EE, Eastern Europe/Russia; JP, Japan; MMSE, Mini-Mental State Examination; NA, North America; NPI, Neuropsychiatric Inventory; SA, South America/Mexico; WE, Western Europe/Israel.