A novel age-informed approach for genetic association analysis in Alzheimer’s disease

Background Many Alzheimer’s disease (AD) genetic association studies disregard age or incorrectly account for it, hampering variant discovery. Methods Using simulated data, we compared the statistical power of several models: logistic regression on AD diagnosis adjusted and not adjusted for age; linear regression on a score integrating case-control status and age; and multivariate Cox regression on age-at-onset. We applied these models to real exome-wide data of 11,127 sequenced individuals (54% cases) and replicated suggestive associations in 21,631 genotype-imputed individuals (51% cases). Results Modeling variable AD risk across age results in 5–10% statistical power gain compared to logistic regression without age adjustment, while incorrect age adjustment leads to critical power loss. Applying our novel AD-age score and/or Cox regression, we discovered and replicated novel variants associated with AD on KIF21B, USH2A, RAB10, RIN3, and TAOK2 genes. Conclusion Our AD-age score provides a simple means for statistical power gain and is recommended for future AD studies. Supplementary Information The online version contains supplementary material available at 10.1186/s13195-021-00808-5.


Background
Genetics plays an important role in the onset of Alzheimer's disease (AD) with an estimated heritability ranging from 58 to 79% [1]. Over the last decade, genome-wide association studies (GWAs) of AD have identified over 40 susceptibility loci [2][3][4][5], by meta-analyzing genotypeimputed data from numerous cohorts genotyped on various single nucleotide polymorphism (SNP) arrays. With each updated GWA, the increasing sample sizes and improved imputation quality of low frequency variants have enabled additional discoveries. A complementary approach is to use next generation sequencing to directly genotype every variant, alleviating the need for imputation and enabling rare variant discoveries. To this aim, the Alzheimer's Disease Sequencing Project (ADSP) undertook whole-exome sequencing (WES) of 10,836 individuals (53% cases) which led to the discovery of novel AD risk genes [6,7]. The ADSP individuals were part of existing AD cohorts and were selected based on a risk score accounting for APOE ε2 and APOE ε4 alleles, sex, and age at onset (AAO) for cases and age at last exam or death for controls [6]. This design promoted the inclusion of controls least likely to develop AD by age 85 years and was shown to maximize statistical power compared to other approaches such as using age-matched cases/controls [6].
Across prior AD GWAs, the common approach to association testing was to perform case-control logistic regression analyses adjusted for age. Theoretically, this adjustment should account for increasing AD prevalence with age in the population, independently of genetic factors [8,9]. However, most AD cohorts include the AAO for cases and last known age without cognitive impairment for controls. This common design leads to the average age of cases being lower than the average age of controls. If one performs a case-control logistic regression with a traditional age adjustment, the model will infer that age has a negative effect on AD risk, meaning that younger individuals are more likely to develop AD. Since advanced age is the greatest risk factor for AD [9], it appears essential to correctly account for age. The latter conundrum is particularly relevant to the ADSP where, by design, the average age of controls is 10 years greater than that of cases.
In this work, we aimed to improve on prior AD GWA studies by evaluating and implementing models that inherently, correctly account for age effects on AD. To this aim, we estimated the statistical power of different models on simulated data, reflecting various age differences between cases and controls as found in AD cohorts. These models included logistic regression on AD case-control status adjusted and not adjusted for age, linear regression on a newly designed score which weights case-control status by age, and multivariate Cox regression on AAO, which models cumulative conversion risk across the life span. We then applied these models to exome-wide AD data with a next generation sequenced discovery sample (5075 controls and 6052 cases) and replicated suggestive associations in an independent sample of genotype-imputed individuals (10,539 controls and 11,092 cases).

Power simulations
We performed power simulation studies to evaluate the performance of different AD genetic association models (R.v3.5.1). We first simulated population level data that mimics population AD prevalence estimates at ages 60-100 across a range of age-related risk effect estimates (OR 1.01-1.25) [10,11]. The age effect estimate on AD status (OR 1.16) served as a reference to evaluate power for AD GWA studies [12]. We then simulated AD case-control datasets by random sampling of cases and controls from the population level data. To simulate realistic AD case-control datasets [13][14][15], subjects' mean age was centered on 75 years following a binomial distribution with a standard deviation of 8 years. Simulated subjects were restricted to the age range of 60-100, after which cases and controls were randomly drawn abiding by model conditions. To evaluate how age differences between cases and controls affect power for variant discovery, subjects were further sampled to three conditions: (1) no mean age difference between cases and controls, (2) cases' mean age is 5 years younger than in controls, and (3) cases' mean age is 10 years younger than in controls. These conditions, particularly condition 2, are similar to those observed for common AD GWAS cohorts [13][14][15], while condition 3 mimics the design of the ADSP WES study. The power was calculated based on 1000 simulation replicates, and the linear regression on the AD-age score was estimated with bootstrap-based inference (100 resamplings). Each replicate included either 1000 cases and 1000 controls, or 5000 cases and 5000 controls, respectively testing for a significance level of α = 0.05 or α = 5 × 10 −7 (i.e., exome-wide significance). These parameters respectively mimic common AD GWA cohorts and the ADSP WES study [7]. We evaluated power for a range of realistic effect sizes (OR 1.05, 1.10, 1.20, 1.50) and common minor allele frequency (MAF) 0.01, 0.05-0.45 (at 0.05 increments) Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD).
The European individuals in ADSP WES [6,7], ADSP whole-genome sequencing (WGS) [21,25], and the Accelerating Medicine Partnership in AD (AMP-AD) WGS [22,24,26] cohorts comprise our discovery sample and were mega-analyzed (Table 1 and Table S1). The ADSP WES selection criteria have already been introduced; the selection scheme led to a 10 years' average age difference between cases and controls [6,7]. For AMP-AD, the reported age for cases was not always AAO; thus, the average age of controls was only 2 years greater than that of cases ( Fig. 1).
As a replication sample, we mega-analyzed 34 cohorts, each corresponding to a specific SNP array applied to an AD case/control dataset [3,[16][17][18][19][20][21][22][23][24]. Some of these cohorts correspond to the same AD study but individuals were genotyped on different platforms. These cohorts are heterogeneous in terms of age reported and are extensively described elsewhere [3,13] (Table 1 and Table S2). When multiple ages were available for a given subject, the order of priority for which age to use was AAO then age at examination then age at death in affected individuals, and age at death then age at last examination in control participants [13]. We removed any duplicated individuals across these cohorts and the discovery sample.

Genetic quality control
For each cohort included in our analysis, we first determined the ancestry of each individual with SNPWeights v2.1 [27] using reference populations from the 1000 Genomes Consortium [28]. Prior to ancestry determination, variants were filtered based on genotyping rate (< 95%), MAF < 1% and Hardy-Weinberg equilibrium (HWE) in controls (p < 10 −6 ). By applying an ancestry percentage cut-off > 75%, the samples were stratified into five super populations: South-Asians, East-Asians, Americans, Africans and Europeans, and an admixed group composed of individuals not passing cut-off in any single ancestry. Since most individuals were Europeans and to avoid spurious associations, we focused on European ancestry individuals.
Carriers of known pathogenic mutations on APP, PSEN1, PSEN2, and MAPT were excluded from our analysis. Discordant pathology cases, defined as any clinically diagnosed AD individual with Braak stage below III or neuritic plaques level below moderate, were excluded from our analysis.
The joint called set of exome variants in the ADSP WES is composed of 1,524,414 SNPs [6,7]. We restricted downstream analysis to these variants, meaning that variants called only in ADSP WGS or AMP-AD were not included. To remove potential sequencing artifacts, we applied several quality control (QC) steps to each dataset. First, SNPs were checked for consistency with the Haplotype reference consortium (HRC) panel [29]. This check included flipping SNPs reported on incorrect strand and excluding SNPs with more than 10% MAF difference with the HRC panel. Second, we removed SNPs that deviated from HWE in controls (p < 10 −6 ) or that had a genotyping rate below 95%. Third, we removed any variants which had a flag different than PASS in gnomADv3 [30].. Following these QC steps, 905,341 variants remained. For analysis, we considered 124,679 variants with minor allele count above 10, to ensure a minimum number of carriers.
In each cohort of the replication sample, SNPs with less than 95% genotyping rate or deviating from HWE in controls (p < 10 −6 ) were excluded. Then, we used the gnomAD database [30] to filter out SNPs that met one of the following criteria: (i) located in low complexity region, (ii) located within common structural variants (MAF > 1%), (iii) multiallelic SNPs with MAF > 1% for at least two alternate alleles, (iv) located within a common Ins/Del (insertion/deletion), (v) having any flag different than PASS in gnomAD, and (vi) having potential probe polymorphisms [31]. The latter are defined as SNPs for which the probe may have variable affinity due to the presence of other SNP(s) within 20 bp and with MAF > 1%. Individuals with more than 5% genotype missingness were excluded. Imputation was performed on the Michigan imputation server using the TOPMed reference panel [32,33]. Per cohort, only variants with sufficient imputation quality (r 2 > 0.3) were included in the replication analysis (Table S3).
Identity-by-descent was run to determine the relatedness between all individuals using PLINKv1.9 [34]. In the discovery sample, we kept only one version of duplicated individuals and removed first degree relatives keeping AD relatives over controls, and when both had a concordant diagnosis, we kept the younger case or older control. In the replication sample, we removed any individuals already present in the discovery, and for duplicate subjects, we kept the copy from the SNP array with the highest genome coverage.
On the subset of remaining individuals, we computed genetic principal components to account for population Table 1 Detailed demographics for discovery and replication sample. Details per cohort included in the discovery and replication can be found respectively in Tables S1 and S2. HC healthy controls, AD Alzheimer's disease stratification [35] in both the discovery and replication samples, separately.
Statistics, association models, and AD-age score We considered four main models: logistic regression on AD diagnosis adjusted for age, logistic regression on AD diagnosis, linear regression on a score integrating casecontrol status and age, and multivariate Cox regression on AAO. When AAO was not available, the first known age with AD diagnosis was used. Our analyses removed individuals younger than 60 and censored maximum age at 100. We considered controls below 60 as uninformative and cases below 60 as early onset AD potentially due to a causal mutation. For the third model, we defined the AD-age score as follows: log(1-weight (age)) − 0.5 for controls −log (weight (age)) + 0.5 for cases The score was designed to abide by the following rules: cases and controls should be clearly separated (maximum value for controls − 0.5 and minimum value for cases + 0.5, ensuring that the minimum difference between cases/controls is greater than 1), younger cases should have higher scores compared to older cases, and older controls should have lower scores than younger ones. This ensured that younger cases and older controls were at opposite extremes of the score spectrum and assumed these individuals influenced genetic associations the most.
For the analysis of exome-wide data, all models had two subversions: (1) adjusted for sex and 10 first principal components of population structure and (2) additionally adjusted for APOE ε2 and APOE ε4 alleles.
The associations for the first three models were estimated with PLINKv2.0 [36] using the -glm flag, which performs a logistic regression for case/control phenotype and a linear regression for quantitative phenotype. The Cox regression associations were estimated with gwasurvir [37].
We calculated the number of independent variants with PLINKv1.9 [34] (option -indep-pairwise 1000 50 0.1), which identified 87,034 linkage disequilibrium blocks covering the 124,679 considered variants. Thus, the exome-wide threshold was set at p < 5 × 10 −7 (0.05/ 87034, Bonferroni correction) and the suggestive threshold at p < 1 × 10 −5 (1/87304). A 1 Mb region around the APOE locus was excluded from the reported results due to its well-established association with AD. We did not correct for the number of tested models due to their high correlation (cf. Results), nor for the two versions of adjustment (APOE ε2 and APOE ε4 alleles adjusted or not), as in Bis et al. [7], since these were similarly highly correlated.
Thirty-one variants passing the suggestive threshold in the discovery were evaluated in the replication sample. We disentangled spurious and true associations based on their associations in the replication dataset. SNVs with discordant direction of effect were considered to be spurious associations. Variants which had a concordant direction of effect and p < 1.6 × 10 −3 (0.05/31, Bonferroni correction) in the same regression model, allowing different covariate adjustment, were considered significant, while those with p < 0.05 were considered to replicate nominally.
For more robust and powerful inference with the ADage score, which is not normally distributed, we performed bootstrapping (100 resamplings) consistent with what was done in power simulations. To limit the computational burden, we only computed the bootstrapbased inference for the set of replicated variants, which allowed us to compare the significance of the linear regression on AD-age score with the Cox regression for true associations.
Last, we performed a fixed-effect meta-analysis using the metafor package in R [38] to estimate the significance of the replicated variants in the combined discovery and replication samples .

Gene and variant annotations
Each variant consequence was annotated with the Ensembl Variant Effect Predictor toolset [39]. Nonsynonymous variants, such as missense or frameshift variants, may lead to loss or gain of function that may affect the enzymatic activity, stability, and/or interaction properties at the protein level. Synonymous variants, by contrast, do not typically directly affect protein function; however, they can influence protein expression both at the transcriptional and translational level [40].
To disentangle the role of the synonymous common variants as potential expression quantitative trait loci (eQTL), we queried the largest brain cis-eQTL metaanalysis which included 1433 post-mortem brain samples from the AMP-AD and CommonMind Consortium [41].
Lastly, for mapped genes harboring significant variants, we queried the AMP-AD fixed-effect meta-analysis of gene differential expression between AD and control individuals across brain tissues [22,24,26].

Age-informed AD risk estimation increases power for genetic association testing
Power outcomes for specific illustrations of simulation analyses, considering a range of age-related risk effect estimates, are presented in Fig. 2 and Figure S1. An overview of power differences between different association models for all simulations' conditions, varying the AD risk associated with age, is provided in Figure S2. In Fig. 1 Proposed AD-age score visualization and its distribution in the discovery and replication samples. a Illustration of the proposed AD-age scores with a linear and a piecewise definition of the weight (age) function (see the "Methods" section). b, c Cases (AD) and controls (CN) age distribution in the discovery, composed of next generation sequencing data, and in the replication, composed of SNP-array imputed data, and d, e their respective AD-age score distributions Fig. 2 Power of different association models for two specific simulation outcomes. a-c A common variant with moderate effect size, evaluated in 1000 cases and 1000 controls at a significance level of α = 0.05, mimicking the condition of common AD cohorts genotyped on SNP arrays. d-f An uncommon variant with large effect size, evaluated in 5000 cases and 5000 controls at a significance level of α = 5 × 10 −7 , mimicking the condition of ADSP WES which allows exploration of uncommon and rare variant associations. Panels show power on the y-axis and age-related effect estimates on the x-axis. Outcomes for four models are shown (cf. legend) and the age-related effect estimate for AD [OR (Age-AD)] is marked by a vertical gray dotted line. From left to right, panels show simulation results for increasing mean age differences between cases and controls (cases being younger than controls where applicable) simulations where the mean age of cases was younger than in controls, adjustment for age in logistic regression analyses compared to not adjusting for age led to critical power loss (Fig. 2), amounting to as much as 90% power loss in some conditions ( Figure S2 A-D). The AD-age score model performed best overall across all four models, displaying power increases regardless of age differences between cases and controls, particularly for the estimated age effect on AD status [12] corresponding to the vertical gray line on Fig. 2, S2. Power gain of the AD-age score with regard to logistic regression not adjusted for age was on average 5%, up to 10% in some scenarios ( Figure S2 C-D). The Cox regression on AAO performed worse than the unadjusted logistic regression when the cases and controls were age matched and better when the age difference increased (Fig. 2e, f). Power gain of the AD-age score with regard to Cox regression was between 5 and 10% in some scenarios, notably when cases controls were age matched ( Figure S2 G-H). When age difference is 10 years, the AD-age score and Cox regression performs similarly with some scenarios showing 1% increased power for AD-age while others showed 0-2% power gain for the Cox regression. Figure S3 shows that all models have the same type I error control under our simulation paradigm.

Exome-wide association
Exome-wide association with AD in the discovery sample for all four models and their subversions are shown in (Figure S4-S7). QQ plots for each exome-wide association show no inflation (λ < 1.1), except for the Cox regression adjusted for APOE ε2 and APOE ε4 allele dosages (λ = 1.19) (Table S4, Figure S8-S11). The logistic regression adjusted for age showed no associations above the suggestive threshold outside of the APOE region ( Figure S4). Across the three other models, a total of 31 variants passed suggestive significance, including 5 known AD risk loci [7]. The parameter estimates of these models: (i) OR (odd ratio) for logistic regression, (ii) exp(β) for the linear regression, and (iii) HR (hazard ratio) for the Cox regression were found to be highly correlated ( Figure S12), with (i-ii) Pearson correlation: r 2 = 0.80 (p = 3 × 10 −12 ), (i-iii) r 2 = 0.84 (p = 4 × 10 −14 ), and (ii-iii) r 2 = 0.97 (p < 2 × 10 −16 ). The known TREM2 missense single nucleotide variant (SNV) (rs75932628) was exome-wide significant in the three models. Other known associations included synonymous SNVs on PILRA (rs2405442), MS4A6A (rs12453), NSF (rs199533, lead SNV of a locus also encompassing MAPT and KANSL1), and a frameshift deletion on ABCA7 (rs547447016) (Fig. 3, Tables 2 and 3). The association on PILRA was exome-wide significant in the AD-age score linear regression and suggestive in the Cox regression but did not reach the suggestive threshold in the logistic regression. Similarly, the association on ABCA7 was suggestive in both AD-age score and Cox regressions, but not in the logistic regression. On the contrary, the association on MS4A6A was suggestive in the logistic regression and in the AD-age score and just below significance in the Cox regression. The association on NSF/ MAPT/KANSL1 was suggestive in all three models. In addition to these 5 known exonic associations, associations on 26 other exonic loci were at least suggestive in one of the three models (Table S5). Logistic regression ( Figure S5) produced one spurious association on ETV3L, the AD-age score linear regression led to three spurious associations on TACR3, PCDHA7, and the one on ETV3L, while the Cox regression ( Figure S6) had 16 spurious associations including the one on TACR3. The logistic regression model showed no novel suggestive association. The AD-age score linear regression, prior to bootstrap ( Figure S7), produced two novel suggestivelevel associations: one USH2A missense SNV (rs111033333) and one RIN3 missense SNV (rs150221413), which replicated nominally. The Cox regression produced several exome-wide significant associations in the discovery with concordant direction of effect in the replication including NAV2 (rs11828836), RAB10 (rs149622307), and the USH2A and RIN3 associations, also found in the AD-age score linear regression. Among suggestive associations in the Cox regression, two significantly replicated: RAB10 synonymous SNV (rs149622307) and TAOK2 synonymous SNV (rs4077410), and three nominally replicated: KIF21B synonymous SNV (rs2297911), and the previous missenses on USH2A and RIN3. NAV2 synonymous SNV (rs11828836) did not reach nominal significance (p = 0.17), but it was imputed with sufficient quality in only 9235 individuals (less than 50% of imputed individuals). CDKL1 intronic SNV (rs61981931) did not reach nominal significance (p = 0.09).
For the set of replicated variants (Table 2), we metaanalyzed the discovery and independent replication results. Seven out of the ten exonic variants were most significant in the linear regression on the AD-age score, while only two performed best in the Cox regression, those on KIF21B and TAOK2, and one in the logistic regression, on MS4A6A ( Figure S13). After meta-analysis, the variants located on RAB10, TREM2, PILRA, MS4A6A, and RIN3 were exome-wide significant (p < 5 × 10 −7 ) (Table S6).

Functional annotation
Among the mapped genes (Table 3), the synonymous variants on PILRA and KANSL1 were significantly associated with the expression of their respective mapped gene (false discovery rate (FDR) corrected). At the nominal significance level, TAOK2 and KIF21B synonymous variants were also associated with the expression of their respective genes. Among nearby genes with FDRsignificant eQTL association, PVRIG was the strongest association at the PILRA locus, KANSL1-AS1 at the NSF/MAPT/KANSL1 locus, and INO80E at the TAOK2 locus (Table S6).
In the meta-analysis of differential gene expression across brain tissues in AMP-AD, TREM2, KANSL1, RAB10, MS4A6A, and RIN3 were found to be significantly upregulated in AD compared to control individuals, while TAOK2 was significantly downregulated (reported associations were FDR-significant, Table S7).

Discussion
In the AD data simulation, we showed that incorrectly adjusting for age led to critical power loss and that weighting the known effect of age on AD risk in the phenotype increased statistical power for variant discovery. Testing these models on real AD data confirmed our simulation observations and enabled the discovery of novel variants modulating AD risk.

Previous literature
The main prior AD WES study aimed to address the age adjustment conundrum in the ADSP WES data by implementing three different logistic regression models: the main one being unadjusted for age, while the other two were age adjusted [7]. However, given that cases were on average younger than controls, the age adjustment was in the opposite direction of the true age effect on AD risk. It is perhaps unsurprising, therefore, that there were no replicated findings from the two ageadjusted models (only associations from the main ageunadjusted model in the ADSP discovery were replicated) [7].
An alternative approach has been to use Cox regression on AAO for improved power compared to logistic regression that only considers case-control status. Cox regression has proven successful in predicting an individual's AD conversion risk by calculating a polygenic hazard score [42,43]. However, it needs to abide by several assumptions, including proportional hazards across age. Several studies have shown that Cox regression performs better than logistic regression on case-control data when AAO is available [44,45], but it has not been (See figure on previous page.) Fig. 3 Manhattan plots of exome wide associations in the four main models excluding the APOE region. The age-adjusted logistic regression has no suggestive association (dashed line, p < 1 × 10 −5 ). The main causal variant on TREM2 is exome-wide significant (solid line, p < 5 × 10 −7 ) in the other three models. Among suggestive associations, (i) known AD associations are in red and (ii) novel associations which replicate (p < 0.05) in an independent dataset are in blue (cf Table 3). Colored dots were bootstrapped in the AD-age score model (see the "Methods" section). The minimum p value from the adjustment models for each main model is displayed as in [7] Table 2 Main association results. Effect corresponds to OR (odds ratio) for logistic regression on AD status not adjusted by age (LogReg), exp(β) for linear regression on AD-age score (LinReg), and HR (hazard ratio) for multivariate Cox regression on age-at-onset (CoxReg). Correlation between these measures is high for suggestive associations as shown on Figure S11. P p value, m model subversion. Subversion codes are (1) adjusted for sex and 10 first principal components of population structure and (2) additionally adjusted for APOE ε2 and APOE ε4 alleles. Two types of weighted AD-age score were used with (A) corresponding to a linear effect of age between 60 and 100 and (B) accounting for the changes in AD prevalence slope in this age range [8]  applied to the ADSP WES data. Cox regression was previously applied to AD GWA, using genotype-imputed data overlapping partially with the ADSP sample used here, and led to the discovery of novel associations [46]. Alternative approaches have been proposed when Cox regression's assumptions are violated as in AD GWA, including age stratification [47] and generalized Cox regression [48]. Our proposed AD-age score offers additional flexibility without these assumptions and it can accommodate age information other than AAO such as age-at-study and age-at-death. Unlike Cox regression models, the AD-age score can be flexibly incorporated as a quantitative outcome into conventional tools (e.g., PLINK) for GWAS and new methods (e.g., BOLT-LMM, SAIGE) for analysis of large/biobank scale genetic data with related samples. Additionally, the linear and logistic regressions are faster than Cox regression and thus more advantageous for larger datasets [44].
Oversampling cases with early AAO and controls with late censoring time for exome sequencing is an efficient design because it directs limited study resources towards subjects that are most useful for discovering the genetic associations of AD in the original cohorts [49,50]. We proposed the AD-age score for improved power in the discovery stage and validated the findings using an independent replication sample. Although the hypothesis testing is appropriate in the discovery stage with extreme sampling, it is worth noting that the estimated genetic effect/odds ratio may not represent that in the whole population [51]. To obtain unbiased genetic effect estimations of AD risk in the whole population, it may be advisable to turn to more advanced methods that can explicitly address the biased sampling design (e.g., [49,52]).

Potential disease mechanisms
The novel variants identified through our exome-wide association, with the exception of the USH2A SNV, are located on genes previously linked to AD, re-enforcing our confidence in these associations (Table S8).
Our main finding is a rare variant on RAB10 passing the exome-wide threshold in discovery and surviving Bonferroni correction in the replication. RAB proteins are key regulators of vesicular trafficking and play a major role in the endolysosomal and retromer pathways known to be linked to AD [53]. Another rare RAB10 SNV was shown to segregate with AD resilience in pedigrees at risk for AD and RAB10 was shown to be upregulated in AD brains [54], a finding corroborated in our study. RAB10 knockdown significantly decreased Aβ 42 and Aβ 42 /Aβ 40 ratio in neuroblastoma cells [54]. Silencing of RAB10 decreased β-amyloid peptides (Aβ) and increased soluble ectodomain of APP β (sAPPβ) [55], supporting a role of RAB10 in either γ-secretase cleavage of APP and the degradation of Aβ. Moreover, phosphorylated Rab10 was prominent in neurofibrillary tangles in the hippocampus of AD individuals but scarce in controls [56]. Mechanistically, the JNK-interactin protein 1 (JIP1) mediates the anterograde transport of Rab10positive cargo to axonal tips which promotes axonal growth and is critical for proper neuronal function [57]. JIP1 also regulates anterograde and retrograde transport of APP along axons [58]. These molecular mechanisms suggest that Rab10 could play a role in APP trafficking along axons.
Additionally, our exome-wide analysis identified a missense variant on Rab interactor 3 (RIN3). Common variants in a locus near RIN3 and SLC24A4 were reported to be associated with AD susceptibility [2]. Increased RIN3 expression in APP/PS1 mouse models was shown to correlate with endosomal dysfunction and altered axonal trafficking and processing of APP [59]. For these reasons, the Rab-related proteins involved in the endolysosomal and retromer pathways have been considered as promising therapeutic targets for AD [53]. Two common exonic variants, on TAOK2 and KIF21B, were identified as suggestive in our discovery analysis and replicated (Bonferroni corrected and nominally, respectively). Previous AD GWAS summary statistics show a concordant direction of effect with our analysis [2,3] with the SNVs p values on TAOK2 and KIF21B in those studies equal to 0.05 and 10 −5 , respectively (Table S8). TAOK2 was shown to be phosphorylated in AD and frontotemporal lobar degeneration brains. Its expression was colocalized with tangles and its inhibition reduced tau phosphorylation [60]. Further, KIF21B is involved in neuronal and synaptic signaling and increased KIF21B expression levels were associated with more severe AD pathology [61].

Limitation
For common synonymous variants, the regulated gene and true causal variant remain uncertain because our study focused on exomes and we cannot perform a genome-wide colocalization analysis. The causal variant may be intergenic and in linkage disequilibrium with a common synonymous variant identified in our analysis. Thus, future genome-wide studies are warranted to help disentangle which nearby genes are regulated, notably for the novel common loci encompassing KIF21B and TAOK2.

Conclusion
Correctly accounting for the risk-increasing effect of age on AD is an efficient means of increasing statistical power. Thus, our AD-age score should prove useful in future AD genetic association studies to enable the discovery of additional novel variants.

Acknowledgements
Not applicable. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ ADNI_Acknowledgement_List.pdf. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery  ARIC research is carried out as a collaborative study supported by NHLBI contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). Neurocognitive data in ARIC is collected by U01 2U01HL096812, 2U01HL096814, 2U01HL096899, 2U01HL096902, 2U01HL096917 from the NIH (NHLBI, NINDS, NIA and NIDCD), and with previous brain MRI examinations funded by R01-HL70825 from the NHLBI. CHS research was supported by contracts HHSN268201200036C, HHSN268200800007C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086, and grants U01HL080295 and U01HL130114 from the NHLBI with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided by R01AG023629, R01AG15928, and R01AG20098 from the NIA. FHS research is supported by NHLBI contracts N01-HC-25195 and HHSN268201500001I. This study was also supported by additional grants from the NIA (R01s AG054076, AG049607, and AG033040) and NIND S (R01 NS017950). The ERF study as a part of EUROSPAN ( Biological samples and associated phenotypic data used in primary data analyses were stored at Study Investigators institutions, and at the National Cell Repository for Alzheimer's Disease (NCRAD, U24AG021886) at Indiana University funded by NIA. Associated Phenotypic Data used in primary and secondary data analyses were provided by Study Investigators, the NIAfunded Alzheimer's Disease Centers (ADCs), and the National Alzheimer's Coordinating Center (NACC, U01AG016976) and the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS, U24AG041689) at the University of Pennsylvania, funded by NIA. This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. Contributors to the Genetic Analysis Data included Study Investigators on projects that were individually funded by NIA, and other NIH institutes, and by private US organizations, or foreign governmental or nongovernmental organizations.