Genome-wide analysis of genetic predisposition to Alzheimer’s disease and related sex disparities

Background Alzheimer’s disease (AD) is the most common cause of dementia in the elderly and the sixth leading cause of death in the United States. AD is mainly considered a complex disorder with polygenic inheritance. Although many susceptibility loci have been discovered, a major proportion of the genetic variance of AD remains unexplained. Methods We investigated the genetic architecture of AD in four publicly available independent datasets through genome-wide association, transcriptome-wide association, and gene-based and pathway-based analyses. To explore differences in the genetic basis of AD between males and females, analyses were performed on three samples in each dataset: males and females combined, only males, and only females. Results Our genome-wide association analyses corroborated the associations of several previously detected AD loci and revealed novel associations of 35 single-nucleotide polymorphisms (SNPs) outside the chromosome 19q13 region at the suggestive significance level of p < 5E–06. These SNPs were mapped to 21 genes in 19 chromosomal regions. Of these, 17 genes had not been associated with AD at genome-wide or suggestive levels by previous genome-wide association studies. Also, the chromosomal regions corresponding to 8 genes did not contain any previously detected AD-associated SNPs with p < 5E–06. Our transcriptome-wide association and gene-based analyses revealed that 26 genes located in 20 chromosomal regions outside chromosome 19q13 had evidence of potential associations with AD at a false discovery rate of 0.05. Of these, 13 genes/regions did not contain any SNPs previously associated with AD at genome-wide or suggestive levels. Most of the newly detected AD-associated SNPs and genes were sex specific, indicating sex disparities in the genetic basis of AD. Also, 7 of the 26 pathways that showed evidence of association with AD in our pathway-based analyses were significant only in females. 
Conclusions Our findings, particularly the newly discovered sex-specific genetic contributors, provide novel insight into the genetic architecture of AD and can advance our understanding of its pathogenesis.
Electronic supplementary material The online version of this article (10.1186/s13195-018-0458-8) contains supplementary material, which is available to authorized users.


Background
Alzheimer's disease (AD) is a slowly progressive neurodegenerative disorder that usually manifests as an insidious decline in cognitive functions such as memory, language, judgment, and reasoning. Visuospatial deficits and neuropsychiatric symptoms such as anxiety, irritability, depression, delusions, and personality changes may occur during the course of the disease and are eventually followed by impairment of most daily activities [1,2]. Median survival after disease manifestation is 3.3-11.7 years [3]. Apart from some uncommon autosomal dominant forms, AD is mainly a complex disorder of polygenic nature [2,4]; this form predominantly affects elderly individuals and is also known as late-onset AD. AD is the most common cause of dementia in the elderly worldwide [5] and the sixth leading cause of death in the United States [6]. Age is the main risk factor for AD: the annual incidence increases from 1% at age 65 years to 6-8% after age 85 years [7], and the prevalence increases from 11% to 32% [5]. In addition, AD is more prevalent in females than in males [7][8][9][10], and the lifetime risk of developing the disease in females is almost twice that in males [7]. This difference might be partly explained by the different life expectancies of males and females. However, Genin et al. [11] suggested that the age-adjusted penetrance of Apolipoprotein E (APOE) is also sex dependent. For instance, they found that the lifetime risks for homozygous APOE-ε4 carriers older than 85 years were 51% in males and 60% in females; the corresponding risks for heterozygous APOE-ε3ε4 carriers were 23% and 30%, respectively [11]. AD is also more severe in females than in males [9]. Henderson and Buckwalter [12] reported that female AD patients had greater impairment in naming, verbal fluency, and delayed recall than male patients. In another study, Barnes et al. [13] suggested that females were more likely than males to develop clinical AD in response to pathological changes in the brain (e.g., amyloid beta (Aβ) and neurofibrillary tangles): each additional unit of brain pathology increased the odds of overt AD 20-fold in females and 3-fold in males [13]. The mechanisms underlying sex disparities in AD are not fully understood [9,14], which raises the possibility that these disparities stem in part from differences in the genetic basis of AD between males and females. Investigating such differences is important, particularly for tailoring more effective medical interventions [14,15].
Given the considerable physical, emotional, and economic burdens that AD imposes on patients, their families, and societies, exploring the genetic and nongenetic mechanisms underlying its pathogenesis has become a public health priority. With increasing life expectancy, the prevalence and global economic costs of AD are forecast to rise considerably by 2050 [5]. Many studies have investigated the genetic basis of AD. APOE was the first gene linked to late-onset AD [16]; in particular, the dosage of its ε4 allele has been implicated in increased disease risk and earlier onset [17]. More susceptibility loci were detected with the advent of the genome-wide association (GWA) methodology, although not all of them were consistently replicated in independent datasets. In addition to APOE, which has been almost universally replicated, the BIN1, CLU, CR1, CD2AP, CD33, MS4A4E, MS4A6A, EPHA1, and PICALM genes have been associated with the polygenic form of AD in different studies [18,19]. Twin studies have estimated the narrow-sense heritability (h²) of AD (i.e., the proportion of its phenotypic variance explained by additive genetic variance) at 58-79% [20]. Furthermore, Ridge et al. [19], using a linear mixed model (LMM) framework, found that ~8 million single-nucleotide polymorphisms (SNPs) can explain 53% of the phenotypic variance of AD. They also noted that SNPs inside known AD-associated genes or within their 50-kb upstream/downstream regions explain only ~31% of AD phenotypic variance (~59% of genetic variance) [19], leaving a sizable portion of its h² unexplained.
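The variance fractions quoted from Ridge et al. [19] can be checked with simple arithmetic. The short Python sketch below uses only the two rounded percentages given above; with these rounded inputs the ratio comes to about 58.5%, which is consistent with the reported ~59% once rounding of the inputs is allowed for.

```python
# Variance fractions reported by Ridge et al. [19], as proportions of the
# total phenotypic variance of AD (rounded values quoted in the text).
snp_h2 = 0.53         # explained by ~8 million SNPs
known_loci_h2 = 0.31  # explained by SNPs in or within 50 kb of known AD genes

# Share of the SNP-based genetic variance captured by known AD loci.
share_known = known_loci_h2 / snp_h2
# Phenotypic variance attributable to SNPs but not to known AD loci.
remaining_h2 = snp_h2 - known_loci_h2

print(f"known loci: {share_known:.1%} of SNP-based genetic variance")
# known loci: 58.5% of SNP-based genetic variance
print(f"unexplained by known loci: {remaining_h2:.2f} of phenotypic variance")
# unexplained by known loci: 0.22 of phenotypic variance
```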
In this study, we investigated the genetic architecture of polygenic AD through GWA, transcriptome-wide association (TWA), gene-based, and pathway-based analyses in four independent datasets (two with family designs and two with population designs), using genetic information on approximately 2 million genotyped and imputed SNPs. Because exploring the genetic basis of sex disparities in AD was of particular interest, in addition to analyzing all males and females in each dataset together, we also performed analyses that included only males or only females.

Study participants
Four independent datasets were used in this study: the Late-Onset Alzheimer's Disease Family Study from the National Institute on Aging (NIA-LOADFS) [21]; the Framingham SNP Health Association Resource (SHARe) project from the Framingham Heart Study (FHS) [22][23][24]; the SNP Typing for Association with Multiple Phenotypes from Existing Epidemiologic Data (STAMPEED) project from the Cardiovascular Health Study (CHS) [25]; and the University of Michigan Health and Retirement Study (HRS) [26]. All four studies were approved by institutional review boards (IRBs) and collected data after obtaining written informed consent from participants or their legal guardians/proxies. Details of the NIA-LOADFS, FHS, CHS, and HRS designs can be found in the original publications. Briefly, the NIA-LOADFS is a family-based study initiated primarily to investigate risk factors for late-onset AD. It recruited families with multiple affected members if the proband's age at AD onset or diagnosis was above 60 years. Controls were selected from unaffected individuals aged at least 50 years who had no history of major neurological/psychiatric disorders or life-threatening conditions. Of 9468 participants with phenotype data, 5220 subjects (2319 affected with AD), predominantly Caucasians, were genotyped using Illumina's Human 610-Quad array. The FHS is an ongoing longitudinal study with a family-based design that provides phenotype and genotype information on individuals from three-generational families of Caucasian ancestry. Its main objective was to investigate risk factors for cardiovascular disease. It was initiated by recruiting 5209 participants aged 30-62 years with no history of cardiac disease or stroke (the original cohort). The cohort was later expanded by adding the offspring of the original cohort and their spouses (5124 subjects; the offspring cohort) and their grandchildren (4095 subjects; the third generation).
Of these, 9274 individuals (1529, 3852, and 3893 individuals from the three generations, respectively) were genotyped using the Affymetrix Human Mapping 500K array in the SHARe project. The CHS is a population-based longitudinal study whose main objective is to investigate risk factors for heart disease. It was initiated by recruiting an original cohort of 5221 mainly Caucasian, noninstitutionalized participants older than 65 years. A new cohort of 687 predominantly African American participants was later added to the study. Of these, 3989 and 803 individuals were genotyped with Illumina's Human CNV370-Duo and Human Omni1-Quad arrays, respectively, in the STAMPEED project. The HRS is a population-based longitudinal study launched to provide age-related health and economic information on more than 20,000 individuals older than 50 years. The HRS uses administrative records such as Social Security and Medicare claims to gather information about participants. In 2006, the study was expanded to include a biomarker and genetic component in which 12,595 predominantly Caucasian individuals were genotyped with Illumina's Human Omni2.5-Quad array.
Our study focused on participants of Caucasian ancestry from the four studies to increase the sample size and power of the analyses. The LOADFS and FHS datasets directly identify AD cases and unaffected controls. For the CHS and HRS datasets, International Classification of Diseases, Ninth Revision (ICD-9) codes were used to define cases and controls. Finally, to make the four datasets comparable in terms of participant age, we included only the original and offspring cohorts from the FHS dataset. Demographic information about the cohorts included in our study is presented in Table 1, and Additional file 1: Table S1 lists the numbers of cases and controls in these cohorts.

Imputation of genotype data
Because the four datasets were genotyped on different platforms, imputation was conducted to generate a common set of 2,928,658 SNPs. Only autosomal SNPs were imputed. Genome coordinates of SNPs in our data (NCBI build 38/UCSC hg38) were lifted over to NCBI build 37/UCSC hg19 using the LiftOver software [27]. After duplicate SNPs were removed, preimputation quality control (QC) was performed with PLINK [28] to remove low-quality SNPs and subjects using the following criteria: minor allele frequency < 0.01, SNP or subject call rate < 95%, and Hardy-Weinberg equilibrium p < 1E-06. For the family-based LOADFS and FHS cohorts, a Mendelian error rate threshold of 2% was applied to remove SNPs and subjects/families with high Mendelian error rates. The SHAPEIT2 (i.e., Segmented Haplotype Estimation and Imputation Tool) package [29] was used to ensure that alleles in our data and in the reference data were aligned to the same DNA strand. Haplotype phasing was then conducted with SHAPEIT2 to estimate the haplotypes of subjects in each dataset. Finally, genotypes were imputed with Minimac3 [30] from the prephased haplotypes. SHAPEIT2 and Minimac3 were run with default input arguments, using European (EUR) haplotypes from the 1000 Genomes Phase 3 data (release October 2014) as the reference panel.
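The preimputation SNP filters above can be illustrated with a minimal Python sketch for a single SNP. It is not PLINK's implementation: the Hardy-Weinberg p value here comes from a 1-degree-of-freedom χ² approximation (PLINK's HWE filter is based on an exact test), the function name snp_passes_qc is hypothetical, and the Mendelian error filter for family data is omitted.

```python
import math

def snp_passes_qc(genotypes, maf_min=0.01, call_rate_min=0.95, hwe_p_min=1e-6):
    """Apply MAF, call-rate, and HWE filters to one SNP (illustrative sketch).

    genotypes: allele dosages (0, 1, 2) per subject, None for a missing call.
    Returns (passes, stats). HWE uses a 1-df chi-square approximation,
    not an exact test.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return False, {}
    call_rate = n / len(genotypes)
    counts = [called.count(k) for k in (0, 1, 2)]
    alt_freq = (counts[1] + 2 * counts[2]) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)
    ref_freq = 1 - alt_freq
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = [n * ref_freq ** 2, 2 * n * ref_freq * alt_freq, n * alt_freq ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    stats = {"maf": maf, "call_rate": call_rate, "hwe_p": hwe_p}
    # SNPs are removed if MAF < 0.01, call rate < 95%, or HWE p < 1E-06.
    passes = maf >= maf_min and call_rate >= call_rate_min and hwe_p >= hwe_p_min
    return passes, stats
```

For example, a SNP with genotype counts 81/18/1 and no missing calls passes, whereas one with 50/0/50 fails the HWE filter. Subject-level call-rate filtering works the same way across SNPs rather than across subjects.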

Postimputation QC
Directly genotyped SNPs, together with imputed SNPs whose estimated squared correlation (r²) between imputed and true genotypes was > 0.7, were selected for preanalysis QC, which used the same criteria as the preimputation QC described above. Additional file 1: Table S2 lists the numbers of genotyped and imputed SNPs that remained in each of the four datasets after QC.

Population structure
The top 20 principal components (PCs) of the genotype data were obtained through principal component analysis (PCA) for inclusion in downstream genetic analyses to address potential population stratification. In each dataset, PCA was performed on a subset of unrelated individuals and a subset of SNPs not in high linkage disequilibrium (LD), as measured by r² [31]. KING (i.e., Kinship-based Inference for Genome-wide association studies) software [32] was used to obtain the unrelated subset by keeping one subject per family or cluster of relatives whose identity-by-descent (IBD) estimate was > 0.0884 (i.e., closer than third-degree relatives). The genotyped autosomal SNPs on each chromosome were then pruned with PLINK [28] in the unrelated subset so that no SNP pair with r² > 0.2 remained within any 100-SNP window. PCA was then conducted on the selected low-LD SNPs with the GENESIS R package [33,34]. Additional file 1: Table S3 lists the genomic inflation factors (λ values) obtained from logistic regression models for the four datasets. The λ values were below 1.1 in all cases, indicating a minor impact of population structure on our analyses [35,36].
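The genomic inflation factor is conventionally the median of the observed 1-df association χ² statistics divided by the median expected under the null (about 0.455). The Python sketch below computes it from a list of p values; it is a generic illustration, not necessarily the exact procedure used to produce Table S3, and the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided p values in (0, 1]."""
    # Convert each p value back to its 1-df chi-square statistic;
    # using p / 2 (lower tail) keeps very small p values finite.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Uniformly distributed p values give λ close to 1; systematic inflation of test statistics, for example from uncorrected stratification, pushes λ above 1.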

Genetic analysis

GWA analysis
The associations between SNPs and AD were investigated by fitting logistic regression models. The genetic analyses of each dataset were performed under three alternative plans: analyzing the entire sample (plan 1), only males (plan 2), and only females (plan 3). The top five PCs and each subject's birth cohort (i.e., birth year) were included in the models as fixed-effects covariates; sex was also included as a fixed-effects covariate under plan 1. Only additive genetic effects were modeled; dominance effects were ignored. Birth cohort is a proxy for age and for the environmental exposures characteristic of a cohort, so this adjustment controls for age and for temporal trends in the incidence of AD. The logistic models were fitted using PLINK (v1.07) [28]. It has been suggested that, for samples with a family-based design, ignoring family relationships does not considerably bias SNP effect sizes but may inflate type I error rates to a degree that depends on pedigree complexity (e.g., nuclear vs extended families) and trait heritability. For instance, the inflation of type I error rates has been suggested to be trivial in datasets with simple pedigrees, whereas type I error rates may increase 2- to 3-fold when family structure is ignored in a dataset with extended pedigrees and trait heritability of 0.6-0.9. Therefore, a two-step screening-validating approach can be used with such datasets to prevent inflation of type I error rates while reducing the computational burden of the analysis [37]. For the LOADFS and FHS cohorts, we adopted a two-step approach in which SNPs with p < 0.05 in the logistic models described above were reanalyzed with generalized linear mixed models (GLMMs) that included all of the aforementioned fixed-effects covariates along with family ID as a random effect. GLMMs were fitted using the lme4 R package [33,38]. All GWA analyses were conducted in a discovery-replication manner.
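To make the additive model concrete, the sketch below fits a logistic regression of case status on SNP dosage plus covariates by Newton-Raphson and returns the Wald test for the SNP term, followed by the p < 0.05 screening step that, in the two-step approach, selects SNPs for GLMM refitting. It is a self-contained stand-in for PLINK's logistic model written for illustration; the function names are hypothetical, and the GLMM step itself (fitted with lme4 in this study) is not reproduced. Centring covariates such as birth year helps numerical stability.

```python
import math

def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def _solve(a, b):
    """Solve the linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def logistic_snp_test(dosage, covariates, case, max_iter=50, tol=1e-8):
    """Additive-model logistic regression of AD status on SNP dosage.

    dosage: 0/1/2 per subject; covariates: per-subject lists (e.g., PCs,
    centred birth year, sex); case: 1 = AD, 0 = control.
    Returns (beta, se, p) for the SNP term (Wald test).
    """
    X = [[1.0, float(d)] + [float(c) for c in cov] for d, cov in zip(dosage, covariates)]
    k = len(X[0])
    beta = [0.0] * k

    def information(b):
        mu = [_sigmoid(sum(bj * xj for bj, xj in zip(b, row))) for row in X]
        info = [[sum(m * (1 - m) * row[i] * row[j] for m, row in zip(mu, X))
                 for j in range(k)] for i in range(k)]
        return mu, info

    for _ in range(max_iter):
        mu, info = information(beta)
        score = [sum((y - m) * row[j] for y, m, row in zip(case, mu, X)) for j in range(k)]
        step = _solve(info, score)  # Newton step: info^-1 @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = information(beta)
    unit = [0.0] * k
    unit[1] = 1.0
    se = math.sqrt(_solve(info, unit)[1])  # sqrt of [info^-1] for the SNP term
    p = math.erfc(abs(beta[1]) / se / math.sqrt(2))
    return beta[1], se, p

def screen_for_glmm(snp_pvalues, alpha=0.05):
    """Step 1 of the two-step approach: SNPs with p < alpha go on to a GLMM."""
    return sorted(snp for snp, p in snp_pvalues.items() if p < alpha)
```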
Each of the LOADFS, FHS, CHS, and HRS datasets served as a discovery set to detect SNPs significantly associated with AD, and results from the discovery stage in a given dataset were then subjected to replication in the remaining three datasets. At the discovery stage, a genome-wide significance level of p < 5E-08 was used to select statistically significant associations, and SNPs with p values between 5E-08 and 5E-06 were considered suggestive AD-associated markers. These significance levels are widely used in genome-wide association studies to limit the type I error rate (i.e., false-positive findings) arising from testing the associations of millions of SNPs [39,40]. At the replication stage, a Bonferroni-corrected significance threshold of 0.0167 (i.e., 0.05/3, where 3 is the number of replication datasets available for validating an association signal from a given discovery dataset) was used.
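The two-stage decision rule can be written as a short Python sketch. It encodes only the thresholds stated above; the function names are hypothetical, and the sketch does not check direction of effect. The exact Bonferroni value is 0.05/3 ≈ 0.016667, which the text rounds to 0.0167.

```python
GENOME_WIDE = 5e-8
SUGGESTIVE = 5e-6
N_REPLICATION_SETS = 3
REPLICATION_ALPHA = 0.05 / N_REPLICATION_SETS  # Bonferroni, ~0.0167

def discovery_level(p):
    """Classify a discovery-stage p value."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return None

def replicating_datasets(replication_p):
    """Datasets (name -> p value) in which a discovery SNP replicates."""
    return sorted(name for name, p in replication_p.items() if p < REPLICATION_ALPHA)
```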
Finally, a conventional fixed-effects meta-analysis using the inverse-variance method was conducted on the results from the four datasets under each plan to obtain combined statistics for the tested SNPs. To avoid missing heterogeneous associations with opposite directions of effect across datasets, we also performed a meta-analysis on the absolute values of the coefficients in addition to the conventional meta-analysis. The results of the meta-analysis on absolute values were used only as supplementary information to assess how heterogeneous effects across cohorts could affect the results of the conventional inverse-variance meta-analysis. The meta-analysis results were interpreted according to the significance levels used at the discovery stage. The meta-analysis was performed using GWAMA (i.e., Genome-Wide Association Meta-Analysis) software [41].
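The conventional and absolute-value meta-analyses can be sketched in a few lines of Python. This is a generic inverse-variance implementation written for illustration, not GWAMA's code, and the function name is hypothetical; it also returns Cochran's Q and the I² inconsistency metric, but omits the p value of Q (which requires the χ² distribution with k - 1 degrees of freedom).

```python
import math

def ivw_meta(betas, ses, use_abs=False):
    """Fixed-effects inverse-variance meta-analysis across datasets.

    betas, ses: per-dataset log odds ratios and standard errors for one SNP.
    use_abs=True pools |beta| instead, as in the supplementary analysis.
    Returns pooled beta, its SE and two-sided p, Cochran's Q, and I^2.
    """
    b = [abs(x) for x in betas] if use_abs else list(betas)
    w = [1.0 / s ** 2 for s in ses]  # inverse-variance weights
    beta = sum(wi * bi for wi, bi in zip(w, b)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    p = math.erfc(abs(beta) / se / math.sqrt(2))
    q = sum(wi * (bi - beta) ** 2 for wi, bi in zip(w, b))
    df = len(b) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "Q": q, "I2": i2}
```

With two datasets whose coefficients are 0.2 and -0.2 (SE 0.1 each), the conventional pooled estimate is 0 with I² = 0.875, whereas pooling absolute values gives 0.2 with p < 0.01, which illustrates why the absolute-value analysis can flag heterogeneous signals.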
Also, for SNPs that had significant p values only in males or only in females (i.e., under plan 2 or plan 3), a Wald χ² statistic with 1 degree of freedom was calculated as follows [42] to test whether their odds ratios differed significantly between the two sexes:
$$\chi^2 = \frac{(b_m - b_f)^2}{\mathrm{se}_m^2 + \mathrm{se}_f^2},$$
where $b_m$ and $b_f$ are the coefficients (i.e., the natural logarithms of the odds ratios) of a given SNP in males and females, respectively, and $\mathrm{se}_m$ and $\mathrm{se}_f$ are their standard errors. The significant findings from the GWA analyses were compared with previous studies using the GRASP (i.e., Genome-Wide Repository of Associations Between SNPs and Phenotypes) search tool (v2.0.0.0) [43]. Also, LD between significant SNPs and previously detected AD-associated loci in their 1-Mb flanking regions (r² ≥ 0.4 or a significant p value from the χ² test for LD) was investigated in the CEU population (i.e., Utah residents with Northern and Western European ancestry) through the HaploR R package [33,44] and the LDlink web tool [45]. The gene coordinate list provided by PLINK [28] was used to find the genes closest to the significant SNPs. The chromosomal regions (i.e., cytogenetic bands) were determined using the annotation database of the UCSC Genome Browser [46].
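The Wald test for a sex difference translates directly into code. A minimal Python sketch, assuming the coefficients and standard errors come from the sex-stratified logistic models (plans 2 and 3); the function name is hypothetical.

```python
import math

def sex_difference_test(b_m, se_m, b_f, se_f):
    """Wald chi-square (1 df) for a difference in SNP log odds ratios between sexes.

    b_m, b_f: natural-log odds ratios in males and females.
    se_m, se_f: their standard errors.
    """
    chi2 = (b_m - b_f) ** 2 / (se_m ** 2 + se_f ** 2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

Odds ratios must be converted to the log scale first, e.g. `sex_difference_test(math.log(1.5), 0.1, math.log(1.1), 0.1)` for ORs of 1.5 in males and 1.1 in females.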

Gene-based analysis
Under each of the three plans, gene-based analysis was performed on the meta-analysis results using the fastBAT (i.e., Fast set-Based Association Test) method [47] implemented in the GCTA (i.e., Genome-wide Complex Trait Analysis) package (v1.26.0) [48]. This method combines the z-statistics of the SNPs assigned to each gene into a quadratic form of a multivariate normal variable (the sum of squared z-statistics), whose null distribution accounts for LD among the SNPs. SNPs located within a gene or its 50-kb upstream/downstream regions were treated as the SNP set for that gene. The HRS dataset was used as the reference panel for LD calculation (i.e., the r² metric) to remove one SNP from each pair with r² > 0.9 within a set. To address multiple testing, the false discovery rate (FDR) method of Benjamini and Hochberg [49] was used to rank and select significant findings. Genes with significant p values at an FDR of 0.05 were considered novel AD-associated genes if no SNPs with p < 5E-08 had been found in their 1-Mb upstream/downstream regions in the current or previous studies.
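The SNP-set definition and the novelty criterion for genes reduce to simple interval checks. The Python sketch below is illustrative only: the Gene record and function names are hypothetical, all positions are assumed to be on the same genome build, strand orientation is ignored (the 50-kb flank is applied symmetrically), and the r² > 0.9 pruning and the fastBAT test itself are left to GCTA.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # bp; positions assumed to share one genome build
    end: int

def gene_snp_set(gene, snps, flank=50_000):
    """IDs of SNPs inside the gene or within `flank` bp up/downstream.

    snps: iterable of (snp_id, chrom, pos) tuples.
    """
    lo, hi = gene.start - flank, gene.end + flank
    return [sid for sid, chrom, pos in snps if chrom == gene.chrom and lo <= pos <= hi]

def is_novel_gene(gene, hits, window=1_000_000, gw_alpha=5e-8):
    """True if no SNP with p < 5E-08 lies within `window` bp of the gene.

    hits: iterable of (chrom, pos, p) from the current or previous studies.
    """
    lo, hi = gene.start - window, gene.end + window
    return not any(chrom == gene.chrom and lo <= pos <= hi and p < gw_alpha
                   for chrom, pos, p in hits)
```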

Pathway-based analysis
A pathway-based analysis was also performed with the fastBAT method, using pathways predefined by the REACTOME pathway knowledgebase [50] and the PID (i.e., Pathway Interaction Database) [51], as provided by the Molecular Signatures Database (MSigDB) on the Broad Institute's gene set enrichment analysis (GSEA) website [52,53]. Here, the SNP set for a particular pathway was defined as the SNPs within 50 kb of the genes in that pathway. As in the gene-based analysis, the HRS cohort was used to prune the SNP sets based on pairwise LD between SNPs. Significant results were interpreted at FDR levels of 0.05 (plans 1 and 2) and 0.025 (plan 3) to ensure that the expected number of false positives was < 1 under each analysis plan.
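The FDR selection and the choice between the 0.05 and 0.025 levels can be sketched as follows. This is a plain Python version of the Benjamini-Hochberg procedure [49]; pick_fdr_level is a hypothetical helper encoding one reading of the criterion above, namely that the expected number of false positives among R significant findings is approximately q × R.

```python
def benjamini_hochberg(p_values, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank  # largest rank meeting the BH bound
    return sorted(order[:k_max])

def pick_fdr_level(p_values, levels=(0.05, 0.025)):
    """Most lenient FDR level whose expected false-positive count (q * R) is < 1."""
    for q in sorted(levels, reverse=True):
        hits = benjamini_hochberg(p_values, q)
        if q * len(hits) < 1:
            return q, hits
    return None, []
```

With 30 strong signals among 40 tests, q = 0.05 would imply about 1.5 expected false positives, so the stricter 0.025 level (about 0.75) is chosen instead.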

TWA analysis
Results from the meta-analyses, together with summary data from a publicly available expression quantitative trait locus (eQTL) study of peripheral blood [54], were used to perform a transcriptome-wide association analysis with the SMR (i.e., Summary-data-based Mendelian Randomization) software (v0.68) [55]. The eQTL summary data were downloaded from the SMR software website. Both cis-eQTLs and trans-eQTLs were of interest; trans-eQTLs were defined as eQTLs located at least 5 Mb from a probe on the same chromosome or located on a different chromosome. Probes for which Lloyd-Jones et al. [54] had detected at least one eQTL with p < 5E-08 were included in our analyses, provided that the corresponding eQTLs were among the genotyped or imputed SNPs in our study. This resulted in sets of up to 8257 probes with cis-eQTLs and 2763 probes with trans-eQTLs.
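The eQTL classification and probe-inclusion rules can be sketched in Python. Treating every eQTL that is not trans as cis is an assumption made for this illustration (the text defines only the trans boundary), and the function names are hypothetical; in practice, the windows actually applied are those configured in the SMR software.

```python
TRANS_MIN_DISTANCE = 5_000_000  # bp

def eqtl_class(snp_chrom, snp_pos, probe_chrom, probe_pos):
    """'trans' if on another chromosome or >= 5 Mb from the probe, else 'cis'."""
    if snp_chrom != probe_chrom or abs(snp_pos - probe_pos) >= TRANS_MIN_DISTANCE:
        return "trans"
    return "cis"

def eligible_probes(eqtls, study_snps, p_max=5e-8):
    """Probes with at least one eQTL at p < 5E-08 that is among the study SNPs.

    eqtls: iterable of (probe_id, snp_id, p) tuples; study_snps: set of SNP IDs.
    """
    return {probe for probe, snp, p in eqtls if p < p_max and snp in study_snps}
```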
P values from the SMR test (P_SMR) were interpreted at FDR levels of 0.025-0.05; the FDR level for each of the three analysis plans was chosen to ensure that the expected number of false-positive findings among significant probes was < 1. To distinguish pleiotropic effects of SNPs on gene expression and AD from linkage, probes with significant P_SMR values were then subjected to the HEIDI (heterogeneity in dependent instruments) test [55,56]. Genes corresponding to probes that passed both the SMR and HEIDI tests (i.e., significant P_SMR and P_HEIDI ≥ 0.05) were deemed significant, as their expression levels might be associated with AD through the pleiotropic effect of a single variant on both probe expression and AD susceptibility. Selected genes were considered potentially novel AD genes if no SNPs with p < 5E-08 had been found within their 1-Mb upstream/downstream regions in the current or previous studies.
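For a single top eQTL, the SMR statistic, as we understand the SMR method [55], combines the eQTL and GWAS z-scores as T_SMR = z_eQTL² z_GWAS² / (z_eQTL² + z_GWAS²), which is approximately χ² with 1 degree of freedom. The Python sketch below computes it and then applies the retention rule described above; the HEIDI test itself (which needs multiple SNPs and their LD) is not reproduced, and the function names are hypothetical.

```python
import math

def smr_statistic(z_eqtl, z_gwas):
    """Approximate SMR test statistic for the top eQTL of a probe.

    z_eqtl: z-score of the SNP's effect on probe expression.
    z_gwas: z-score of the same SNP's effect on AD (meta-analysis).
    Returns (T_SMR, p); T_SMR is approximately chi-square with 1 df.
    """
    zx2, zy2 = z_eqtl ** 2, z_gwas ** 2
    t_smr = zx2 * zy2 / (zx2 + zy2)
    return t_smr, math.erfc(math.sqrt(t_smr / 2))

def keep_probe(smr_significant, p_heidi, heidi_min=0.05):
    """Retain a probe if P_SMR passes FDR and HEIDI does not reject (P_HEIDI >= 0.05)."""
    return smr_significant and p_heidi >= heidi_min
```

Because T_SMR never exceeds the smaller of the two squared z-scores, a very strong eQTL cannot compensate for a weak GWAS signal, or vice versa.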
Finally, we also performed TWA analyses using summary results from a publicly available tissue-specific eQTL study [57] that contains eQTL data for several brain regions, including the amygdala, anterior cingulate cortex (BA24), basal ganglia (e.g., caudate, nucleus accumbens, and putamen), cerebellar hemisphere, cerebellum, cortex, frontal cortex (BA9), hippocampus, hypothalamus, and substantia nigra. Again, probes with significant eQTLs (p < 5E-08) were included in our analyses, yielding sets of 597-3566 probes with cis-eQTLs, depending on the brain region. The results of the brain-specific TWA analyses were interpreted at an FDR level of 0.05.

GWA analysis
GWA analyses were performed in four independent datasets (LOADFS, FHS, CHS, and HRS). Each dataset served as a discovery set to detect SNPs with significant association signals (at either the genome-wide significance level of p < 5E-08 or the suggestive level between 5E-08 and 5E-06), which were then tested for replication (at a significance level of 0.0167) in the other three datasets. These analyses yielded replicated and nonreplicated sets of SNPs. Finally, results from the individual datasets were combined through meta-analysis and interpreted according to the significance levels used at the discovery stage. Additional file 1: Tables S4-S12 provide an overview of the replicated, nonreplicated, and meta-analysis sets of SNPs that were significantly associated with AD in males and females combined (plan 1) or in males and females separately (plans 2 and 3). As these tables show, most of the newly detected AD-associated SNPs, particularly those in the nonreplicated and meta-analysis sets, had significant p values under only one of the three plans. For instance, of the 44 and 72 newly detected SNPs in males and females, 36 and 51, respectively, had sex-specific significant p values. Additional file 1: Figures S1-S6 show the Manhattan and QQ plots of the GWA results in the four datasets and in the meta-analyses under the three plans. In general, SNPs with p values below the genome-wide significance threshold were mostly located on chromosome 19.

Replicated sets of SNPs
The replicated sets of SNPs under plans 1-3 contained 31, 20, and 23 SNPs, respectively (Additional file 1: Tables S4-S6). These SNPs had significant p values at the genome-wide or suggestive level at the discovery stage and were then replicated in another dataset. Additional files 2, 3, and 4 contain detailed information (e.g., allele frequencies, odds ratios (ORs), p values) about the replicated SNPs in the four datasets under the three analysis plans. Notably, 12, 8, and 8 replicated SNPs, respectively, had not been previously associated with AD. The other SNPs had previously reported evidence of direct association with AD [43]. Among the previously detected SNPs, rs9882471 (plan 2) had been only nominally associated with AD in previous studies (5E-06 ≤ p < 5E-02) [58].
Most of the newly detected SNPs were located inside a well-known AD susceptibility region on chromosome 19q13 (i.e., the APOE gene cluster region) and were mostly significant under more than one analysis plan. This subset of newly detected SNPs mostly had p < 5E-08, the same direction of effect in the discovery and replication datasets, and significant p values (at genome-wide or suggestive levels) in the meta-analysis. Table 2 summarizes information about the four newly detected SNPs located outside the chromosome 19q13 region. Among these SNPs, rs62402815 was significant under plan 1 (males and females) and plan 3 (females only), whereas rs9918162 and rs726411 were significant only in males (plan 2). At the discovery stage, their association signals reached only the suggestive level, except for rs62402815, which reached genome-wide significance (p = 1.2E-08) in females. The two SNPs that were significant in males did not reach p < 5E-06 in the conventional fixed-effects meta-analyses, which might be partly due to heterogeneity of their effects across datasets. This heterogeneity was reflected in high I² inconsistency metrics and significant Q-statistics in Cochran's heterogeneity test (P_Q < 0.05). A meta-analysis based on the absolute values of the coefficients confirmed a substantial role of heterogeneity by yielding smaller p values for most of these SNPs.
Also, rs62402815 and rs726411 had the same direction of effect in the discovery and replication datasets, whereas rs9918162 had opposite directions of effect in the discovery and replication sets. While genetic variants with the same direction of effect in multiple independent cohorts are generally of greater interest, those with opposite effects can also be important because they may indicate genetic heterogeneity of the studied trait across cohorts arising, for example, from epistasis or differences in LD patterns [59][60][61].
All SNPs that were significant only in males or only in females also had significant sex-specific effects, except rs62402815, rs12386284, rs2829803, and rs2298369.
Although previous studies provided no evidence of direct association with AD for the newly detected subsets of replicated SNPs, their 1-Mb upstream/downstream regions harbor AD-associated SNPs. We therefore investigated their LD with AD-associated loci in their 1-Mb flanking regions in the CEU population [45]. Newly detected SNPs were considered informative AD markers if their p values were smaller than those of the top AD-associated SNPs in their neighborhood or if they were not in LD with previously reported AD-associated loci that had smaller p values than those detected in this study. Additional file 1: Table S13 contains LD information for the newly detected SNPs for which proxy AD-associated loci have been reported. As Additional file 1: Table S13 shows, all newly detected SNPs on chromosome 19q13 had larger p values than the top AD-associated loci in their neighborhood and were in LD with them. Therefore, they likely carry the same information as their neighboring AD-associated SNPs.
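The informativeness criterion can be reduced to a single check, sketched below in Python. A SNP whose p value is smaller than those of all its neighbors cannot be in LD with a neighbor that has a smaller p value, so the first condition is a special case of the second: a SNP is uninformative only if a stronger, previously reported signal within 1 Mb is in LD with it. The sketch uses only the r² ≥ 0.4 cutoff from the Methods and ignores the alternative χ²-test criterion; the function name is hypothetical.

```python
def is_informative(new_p, neighbours, r2_min=0.4):
    """Decide whether a newly detected SNP is an informative AD marker.

    neighbours: (p_value, r2_with_new_snp) for previously reported
    AD-associated SNPs within 1 Mb. The SNP is uninformative only if some
    neighbour has a smaller p value AND is in LD with it (r2 >= r2_min).
    """
    return not any(p_prev < new_p and r2 >= r2_min for p_prev, r2 in neighbours)
```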
On the other hand, the p values of SNPs located outside the chromosome 19q13 region were mostly smaller than those of previously detected association signals in their flanking regions, and these SNPs were not in LD with such loci. As Table 2 shows, among the genes closest to these SNPs, only ADCY8 (corresponding to rs726411 in the 8q24.22 region) had been associated with AD in a previous GWAS at a suggestive level (rs263238, p = 2.40E-06 [62]). In addition, none of the chromosomal regions (i.e., cytogenetic bands) containing the other SNPs harbored any SNPs previously associated with AD at p < 5E-06 [43]. Detailed information about the genes and chromosomal regions of the newly detected SNPs that do contain previously AD-associated SNPs can be found in Additional files 2, 3, and 4.

Nonreplicated sets of SNPs
Additional file 1: Tables S7-S9 (corresponding to plans 1-3) show that 54, 40, and 46 SNPs, respectively, had significant p values at genome-wide or suggestive levels in only one of the four datasets. Most of them were newly detected (41, 33, and 40 SNPs, respectively), with no evidence of direct association with AD in previous studies [43]. They were also mostly plan specific and showed evidence of sex disparity. Most were located in chromosomal regions other than 19q13 and were significant at the suggestive level. Detailed information about the nonreplicated sets of SNPs (e.g., allele frequencies, ORs, p values) can be found in Additional files 2, 3, and 4. Of the SNPs previously associated with AD, rs11038106, rs9597722, rs723804, rs17697225 [63], rs2065706 [64] (plan 1), rs4679840 [58] (plan 2), and rs1359176 [65] (plan 3) had been only nominally significant (5E-06 ≤ p < 5E-02) in previous studies. Again, SNPs located outside the chromosome 19q13 region either had smaller p values than previously detected AD-associated loci in their proximity or were not in LD with them, except for rs34779859 on chromosome 2 (plan 3), which was significant in females. LD information for the newly detected SNPs for which proxy AD-associated loci have been previously identified can be found in Additional file 1: Table S13.

Meta-analysis sets of SNPs
Additional file 1: Tables S10-S12 show that 17, 4, and 24 SNPs that were not among the replicated or nonreplicated sets under plans 1-3, respectively, passed the significance threshold in the meta-analysis. Additional files 2, 3, and 4 summarize the GWA results for these SNPs. The meta-analysis p values of these SNPs were mostly significant at the suggestive level, except for rs76366838, rs115881343 (plan 1), rs73048293, rs57537848, and rs76366838 (plan 3) on chromosome 19q13, which had p < 5E-08. These SNPs were also mostly located outside chromosome 19q13 and were plan specific (i.e., they were not among the replicated, nonreplicated, or meta-analysis sets of significant SNPs under other plans); for example, SNPs significant in males were not significant in females, and vice versa. In addition, most of these SNPs (14, 3, and 24 SNPs under plans 1-3, respectively) had not been associated with AD in previous studies [43]. Summary information about the newly detected meta-analysis SNPs located outside chromosome 19q13 is presented in Table 2. As with the replicated and nonreplicated sets, most of the newly detected SNPs outside chromosome 19q13 had smaller p values than their nearby AD-associated loci or were not in LD with them; these SNPs were therefore considered novel and informative AD markers. In contrast, proxy AD-associated SNPs were found for all newly detected SNPs located on chromosome 19q13 (Additional file 1: Table S13).

Nominally significant sets of SNPs
Under each of the three analysis plans, several SNPs were associated with AD at a nominal level of significance (5E-06 ≤ p < 5E-02) in every dataset in which they were present. Most were present in only three datasets because the QC procedure had excluded them from the fourth. These SNPs (30, 28, and 28 SNPs under plans 1-3, respectively) are listed in Additional files 2, 3, and 4. Their p values were not highly significant, but they are reported here because their association signals were consistent across multiple tested datasets. Each SNP showed this pattern under only one plan, except rs575088, which had nominally significant p values in all datasets under plans 1 and 3. Also, rs2282079 (detected in females) was among the meta-analysis set of SNPs under plan 1 as well. None of these SNPs had p < 5E-06 in the meta-analyses. Possible reasons for the lack of meta-analysis significance include small sample sizes, weak association signals, the absence of some SNPs from one dataset, or heterogeneous effects of some SNPs across datasets. Heterogeneity was evidenced by high I² values, significant Q tests, and smaller p values in meta-analyses of the absolute values of the coefficients. The SNPs whose association signals are reported here for the first time were not in LD with previously detected AD-associated loci (p < 5E-06) in their 1-Mb flanking regions (Additional file 1: Table S13). Interestingly, 22 of the 28 SNPs detected in males showed the opposite pattern of significance in females (i.e., p > 0.05 in all datasets). Likewise, 26 of the 28 SNPs detected in females showed the opposite pattern of significance in males (Additional file 5). Not all SNPs with opposite patterns of significance in the females-only vs males-only analyses showed the same pattern in the meta-analysis. The closest genes to some of these SNPs were located in chromosomal regions previously associated with AD with p < 5E-06.
Information about these genes/regions can be found in Additional files 2, 3, and 4.
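The heterogeneity diagnostics mentioned above (Cochran's Q and I²) come from combining per-dataset effect estimates. Below is a minimal sketch of an inverse-variance fixed-effect meta-analysis in plain Python, assuming per-dataset log odds ratios and their standard errors as inputs. This section does not say which meta-analysis software or model the authors used, so treat this as an illustration of the statistics rather than their exact procedure. Q is referred to a χ² distribution with k - 1 degrees of freedom, and I² = max(0, (Q - df)/Q).

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus sqrt(2x/pi) e^(-x/2) * sum_r x^(r-1) / (1*3*...*(2r-1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> dict:
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I^2."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need >= 2 studies with matching betas and SEs")
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p value
    q = sum(wi * (bi - beta) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "Q": q, "Q_p": chi2_sf(q, df), "I2": i2}


# Hypothetical log ORs with opposite signs across three datasets.
print(fixed_effect_meta([0.30, -0.25, 0.05], [0.10, 0.12, 0.11]))
```

Opposite-signed effects inflate Q and I² while pulling the pooled estimate toward zero. This is one way a SNP can be nominally significant in every dataset yet fail to reach p < 5E-06 in the meta-analysis.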

Adjustment by APOE SNPs
For the AD-associated SNPs located on chromosome 19, we further investigated whether their association signals changed after adjustment for APOE genotype. For each subject, the APOE genotype was determined from the genotypes at the rs429358 and rs7412 loci using the coding schema provided in Additional file 1: Table S14. None of the tested SNPs had p < 5E-06 once APOE genotype was added as a covariate to the models.
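The APOE coding step can be illustrated with the conventional haplotype definitions of the ε alleles:

- ε2: rs429358-T / rs7412-T
- ε3: rs429358-T / rs7412-C
- ε4: rs429358-C / rs7412-C

The sketch below assumes unphased genotypes reported as forward-strand allele pairs. The authors' exact schema is in Additional file 1: Table S14 and may handle the rare or phase-ambiguous combinations differently. In particular, a double heterozygote is conventionally called ε2/ε4, although the rare ε1/ε3 is also possible.

```python
# Count rs429358 C alleles (c) and rs7412 T alleles (t); under the conventional
# epsilon-allele definitions, the pair (c, t) determines the unphased APOE genotype.
APOE_BY_COUNTS = {
    (0, 0): "e3/e3",
    (0, 1): "e2/e3",
    (0, 2): "e2/e2",
    (1, 0): "e3/e4",
    (1, 1): "e2/e4",  # phase-ambiguous: could also be the rare e1/e3
    (1, 2): "e1/e2",  # rare
    (2, 0): "e4/e4",
    (2, 1): "e1/e4",  # rare
    (2, 2): "e1/e1",  # rare
}


def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Return the APOE genotype from two-letter genotypes such as 'TC' and 'CC'."""
    g1, g2 = rs429358.upper(), rs7412.upper()
    for g in (g1, g2):
        if len(g) != 2 or set(g) - {"C", "T"}:
            raise ValueError(f"unexpected genotype: {g!r}")
    return APOE_BY_COUNTS[(g1.count("C"), g2.count("T"))]


print(apoe_genotype("TC", "CC"))  # e3/e4
```

For use as a regression covariate, the resulting label would still need to be encoded, for example as indicator variables or an ε4 allele count. That encoding choice is not specified in this section.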
Additional file 1: Table S15 summarizes the LD between the SNPs detected in our study and the APOE SNPs. Among the newly AD-associated SNPs, only six were in LD with one or both APOE SNPs. The others were not (r² = 0.001-0.072) [45]. APOE genotypes had a major impact on the associations of the other chromosome 19q13 SNPs with AD. However, this does not imply that the APOE SNPs (i.e., rs429358 and rs7412) are the only contributors to AD pathogenesis, because APOE-adjusted models capture statistical correlation rather than biological (i.e., genetic) linkage. Further analyses, such as those examining the role of haplotypes and epistatic interactions, would help to dissect the genetic heterogeneity of this region more comprehensively and to elucidate the biological relevance of the APOE-adjusted models [70].

Sex-specific effects
We also investigated the sex-specific effects of SNPs that were significantly associated with AD only in males or only in females. For each SNP, we used a Wald χ² test to determine whether its odds ratios differed significantly between males and females. Additional file 1: Tables S16 and S17 summarize the results of this test for the replicated, nonreplicated, and meta-analysis sets of AD-associated SNPs. The differences between the odds ratios in males and females were significant (p < 0.05) in most cases. The exceptions were rs62405605, rs1062851, rs62510850, rs7000333, and rs6572843 (among the nonreplicated set of SNPs in females) and rs12386284 (among the meta-analysis set of SNPs in females). Detailed results of the Wald χ² test can be found in Additional file 6. In addition, we searched the GRASP catalog [43] for the SNPs that had significant p values only in males or only in females. The goal was to determine whether they were among known sex-linked autosomal SNPs or had been associated with any other diseases/traits at the suggestive level. We found no evidence of such associations in previous studies.
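A Wald χ² test for a sex difference in odds ratios is commonly computed from the sex-stratified log odds ratios and their standard errors. The male and female estimates are treated as independent because they come from disjoint sets of individuals:

z = (ln OR_m - ln OR_f) / sqrt(SE_m² + SE_f²)

Under the null hypothesis of equal ORs, z² follows χ² with 1 degree of freedom. The Python sketch below assumes that form. The section does not spell out the authors' exact formula, and the input values are hypothetical.

```python
import math


def sex_difference_wald(log_or_m: float, se_m: float,
                        log_or_f: float, se_f: float) -> tuple:
    """Wald chi-square (1 df) and p value for H0: OR_males == OR_females.

    Assumes the two estimates are independent (males and females are disjoint samples).
    """
    if se_m <= 0 or se_f <= 0:
        raise ValueError("standard errors must be positive")
    z = (log_or_m - log_or_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    chi2 = z * z
    p = math.erfc(abs(z) / math.sqrt(2))  # equals the chi-square(1) upper tail of z^2
    return chi2, p


# Hypothetical SNP: OR = 1.50 in males, OR = 1.00 in females, SE(log OR) = 0.10 in each.
chi2, p = sex_difference_wald(math.log(1.5), 0.10, math.log(1.0), 0.10)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

If only an OR and its 95% confidence interval are available, SE(ln OR) can be approximated as (ln upper - ln lower) / (2 × 1.96).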

Gene-based analysis
The significant findings from the gene-based analyses under plans 1-3 are summarized in Table 3. Under all plans, most genes with significant p values at an FDR of 0.05 were located in the chromosome 19q13 region. This region harbors several SNPs with p < 5E-08 in both current and previous studies. Significant genes in it therefore do not meet the criteria set for detecting novel AD genes and are not discussed here. The only significant genes outside the APOE cluster region were located in the chromosome 21q21.3 region: LINC00158 under plan 1, and LINC00158, MIR155HG, MIR155, LINC00515, MRPL39, and JAM2 under plan 3. None of the SNPs inside these genes or within their 1-Mb flanking regions had genome-wide significant p values in our study. However, several had suggestive-level p values in the meta-analyses under plans 1 and 3. Also, SNPs within 1 Mb of these genes were only nominally associated with AD (8.0E-04 < p < 5E-02) in previous studies [58,65,71-73]. However, a previous GWAS associated the chromosome 21q21.3 region with AD at the suggestive level (rs239713, p = 5.00E-07 [68]). This SNP is located 1.6 Mb away from the significant genes reported in our study.

Pathway-based analysis
We found that 19, 10, and 19 pathways were significantly associated with AD under plans 1-3, respectively (Table 4). We used the FDR level at which the number of possible false positives was less than 1. This level was 0.05 under plans 1 and 2 and 0.025 under plan 3. Twelve pathways were significant under two or three analysis plans (i.e., they were not plan specific). Seven pathways were significant only under plan 1 (males and females), and seven others were significant only in females (i.e., plan 3). No pathway was specifically significant in males (i.e., plan 2).
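One reading of "the FDR level at which the number of possible false positives is less than 1" is as follows. With R pathways declared significant at FDR level q, roughly q × R of them are expected to be false positives. One then picks the least stringent q from a candidate list for which q × R < 1. That reading agrees with the counts above: 19 pathways at q = 0.05 give 0.95 expected false positives, and 19 at q = 0.025 give 0.475.

The sketch below applies the Benjamini-Hochberg procedure under this interpretation. The candidate levels are hypothetical, and the section does not say how the pathway p values or FDR values were actually computed.

```python
from typing import List, Optional, Sequence, Tuple


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[int]:
    """Indices (in input order) of hypotheses rejected by BH at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # largest rank whose p value is under its BH threshold
    return sorted(order[:k_max])


def choose_fdr_level(pvals: Sequence[float],
                     levels: Sequence[float] = (0.05, 0.025, 0.01)
                     ) -> Optional[Tuple[float, List[int]]]:
    """Least stringent level whose expected number of false positives (q * R) is below 1."""
    for q in sorted(levels, reverse=True):
        hits = benjamini_hochberg(pvals, q)
        if q * len(hits) < 1:
            return q, hits
    return None


# Hypothetical pathway p values.
print(choose_fdr_level([0.001, 0.01, 0.02, 0.5]))  # (0.05, [0, 1, 2])
```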

TWA analysis
Analyzing probes with cis-eQTLs
Using eQTLs data from peripheral blood, we found that four, eight, and four probes/genes passed both the SMR (P_SMR < 6.03E-05) and HEIDI (P_HEIDI ≥ 0.05) tests under plans 1-3, respectively. The significant FDR level for interpreting the results from the SMR test was set to  [78] were previously associated with AD at the suggestive level of associations. In addition, ATG10 [77] and LPXN [74] were located in chromosomal regions (i.e., 5q14.1 and 11q12.1) that contained AD-associated SNPs with p < 5E-06. Our TWA analyses on brain-specific eQTLs data revealed associations of two probes/genes with AD in males (i.e., CRIPAK and PRDM10), and two others in females (i.e., AHSA2 and ATG10) at the FDR level of 0.05 (Table 6). No probe/gene passed the SMR and HEIDI tests under analysis plan 1. The probe corresponding to the AHSA2 gene was significantly associated with AD in several brain regions (i.e., caudate basal ganglia, cerebellum, cortex, hypothalamus, nucleus accumbens, putamen basal ganglia, and substantia nigra). Also, ATG10 was significantly associated with AD in the nucleus accumbens and putamen basal ganglia. The corresponding top eQTLs were nominally significant in our GWA analyses in males and females (4.30E-05 ≤ P_GWAS ≤ 9.33E-02). There were no SNPs with significant p values at the genome-wide significance level within 1 Mb of these genes in our study. However, SNPs with suggestive-level p values were found in the flanking regions of ATG10 in the nonreplicated set of SNPs in females (see Additional file 4). In addition, the SNPs within 1 Mb of these four genes were only nominally associated with AD in previous studies [43]. In terms of chromosomal regions, in addition to ATG10 as explained earlier, SNPs in the chromosome 11q24.3 region (PRDM10 gene) were also previously associated with AD at a genome-wide significance level [69].
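For reference, the published SMR test combines two z statistics for the top eQTL of a probe: z_zx, the eQTL effect of that SNP on expression, and z_zy, the same SNP's GWAS effect on AD. The statistic is

T_SMR = z_zx² z_zy² / (z_zx² + z_zy²)

which is approximately χ² with 1 degree of freedom. The SMR effect estimate is the ratio b_xy = b_zy / b_zx. The sketch below reproduces only this formula. It is not the SMR software, it does not implement the HEIDI heterogeneity test, and the inputs in the usage line are hypothetical.

```python
import math


def smr_test(b_zx: float, se_zx: float, b_zy: float, se_zy: float) -> dict:
    """SMR estimate and test from summary statistics of one instrument SNP.

    b_zx/se_zx: eQTL effect of the SNP on probe expression.
    b_zy/se_zy: GWAS effect of the same SNP on AD (log odds ratio).
    """
    if b_zx == 0 or se_zx <= 0 or se_zy <= 0:
        raise ValueError("need a nonzero eQTL effect and positive standard errors")
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    p_smr = math.erfc(math.sqrt(t_smr / 2))  # chi-square(1) upper tail
    return {"b_xy": b_zy / b_zx, "T_SMR": t_smr, "p_SMR": p_smr}


# Hypothetical instrument: strong eQTL (z = 10), modest GWAS signal (z = 4).
res = smr_test(b_zx=0.50, se_zx=0.05, b_zy=0.12, se_zy=0.03)
print(res, res["p_SMR"] < 6.03e-05)
```

T_SMR is always smaller than both z_zx² and z_zy², so the SMR p value is larger than the GWAS p value of the instrument. This helps explain why a top eQTL that is only nominally significant in the GWA analysis rarely passes the SMR threshold without a very strong GWAS signal.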

Analyzing probes with trans-eQTLs
Using eQTLs data from peripheral blood, one probe mapping to the SFN gene on chromosome 1p36 had a significant P_SMR at the FDR level of 0.05 and passed the HEIDI test under plan 2 (Table 5). The corresponding top eQTL was located on chromosome 4p16 in the intronic region of the MAEA gene and was nominally associated with AD in our study (P_GWAS = 4.10E-04). There were no significant association signals at the genome-wide significance level in the SFN gene or its 1-Mb flanking regions in current or previous studies [43].

Discussion
The genetic architecture of AD has been widely studied in recent years, and so far more than 60,000 SNPs have been associated with AD with p < 0.05. Of these, 281 SNPs (mapped to 49 genes) and 593 SNPs (mapped to 165 genes) had significant p values at the genome-wide and suggestive levels of association, respectively [43]. Despite these efforts, a major proportion of the heritability (h²) of AD has remained unexplained. Exploring the genetic risk factors contributing to AD is highly important from a precision medicine perspective, where the goal is to personalize diagnostic and therapeutic interventions.

Table footnote: Genomic coordinates are based on Human Genome version 38 (hg38). a Trans-eQTL. Abbreviations: AD Alzheimer's disease, eQTL expression quantitative trait loci, A1 effect allele, A1/A1 freq effect allele and its frequency, GWAS genome-wide association study, SMR summary-data-based Mendelian randomization, SE standard error, HEIDI Heterogeneity in Dependent Instruments.

Table 6 Significantly AD-associated probe/genes from transcriptome-wide analyses on brain tissue data (abbreviations as above)
Our GWA analyses corroborated the associations of a number of previously detected AD loci and revealed some significant novel association signals. Among previously detected AD-associated SNPs, we found several with p values smaller than those reported before. Also, significant association signals for three SNPs inside the chromosome 19q13 region had previously been reported only in African Americans (p = 2.6E-8, p = 9.9E-7, and p = 5.3E-27, respectively [88]). These were rs2965169 (nonreplicated set, plan 1), rs10426423 (meta-analysis set, plan 1), and rs769450 (replicated set under plan 1 and nonreplicated sets under plans 2 and 3). Most newly detected AD-associated SNPs, particularly those outside the chromosome 19q13 region, can be considered informative AD markers. Their p values in our study were smaller than those of other AD-associated loci in their 1-Mb upstream/downstream regions, or they were not in LD with such loci. For instance, Table 2 summarizes the replicated and meta-analysis sets of SNPs: 11, 4, and 21 novel AD-associated SNPs were detected under plans 1-3, respectively. These SNPs were mapped to 21 genes in 19 chromosomal regions (i.e., cytogenetic regions). Of these, four genes had been associated with AD in previous GWAS with p < 5E-06. Also, nine genes were located in eight chromosomal regions that contained previously AD-associated SNPs more than 1 Mb away from the SNPs detected in our study. The remaining eight genes/regions had not been associated with AD in previous studies at the genome-wide or suggestive level [43].
Our GWA analyses also revealed associations with AD of a number of SNPs (41, 33, and 40 SNPs under plans 1-3, respectively) that were significant in only one of the four investigated cohorts. Successful replication in an independent cohort has become the gold standard in genome-wide association studies for substantiating real genetic effects. However, failure to replicate SNP-disease associations does not necessarily mean that they are false-positive findings. Instead, such SNPs might be real genetic contributors that confer population-specific risks due to the genetic heterogeneity of the disease [2,60,89,90]. Other reasons for nonreproducibility include a lack of statistical power due to insufficient sample sizes, the presence of gene-environment or gene-gene interactions, and missing genotype information for particular loci in different studies. For instance, small between-population allele frequency differences at an interacting locus may result in a lack of power to detect the main effect of a genuine association signal in independent cohorts [60]. These reasons may also explain why not all previously discovered AD-associated SNPs were replicated in our study.
Of particular interest was the sex disparity in the genetic basis of AD. The National Institutes of Health has emphasized addressing sex differences in biomedical research as an approach that can eventually bolster the personalized medicine paradigm [14,15]. Our results revealed a number of new sex-specific genetic contributors to AD at the SNP, gene, and transcriptome levels. For instance, most of the newly detected SNPs, particularly those outside chromosome 19q13, were sex specific. They had significant p values in either males or females, and their odds ratios differed significantly between the two sexes. Interestingly, two additional subsets of SNPs were nominally associated with AD in all datasets in one sex while being nonsignificant in all datasets in the other. Such consistent sex-specific association signals, although weak, might be important for exploring differences in the genetic risk factors of AD between males and females. They may also reach genome-wide significance in larger samples. Another level of sex disparity was observed in the gene-based and TWA analyses, where several genes were significantly associated with AD in either males or females. Also, several pathways were specifically significant in females. These findings are discussed further in the following paragraphs.
In the gene-based analysis, LINC00158, MIR155HG, MIR155, LINC00515, MRPL39, and JAM2 were significantly associated with AD when the entire sample and/or only females were analyzed. These genes lie near each other on chromosome 21q21.3, within a ~332-kb region. The APP gene, implicated in early-onset familial AD and Down syndrome-related AD [4], is located 163-449 kb from these genes. There were no AD-associated SNPs with p < 5E-08 within 1 Mb of them in current or previous studies [43]. However, several SNPs in that chromosomal region reached the suggestive level in the meta-analysis sets of SNPs. These were rs76252969 and rs2298369 under plan 1, and rs12386284, rs1783012, rs1783013, rs926963, rs1893650, rs2226326, rs2829803, rs2298369, rs2829823, and rs2829832 under plan 3. SNPs in the 1-Mb upstream/downstream regions of these genes were previously associated at the genome-wide significance level with potential AD risk factors such as type 2 diabetes, hypertension, coronary artery disease, and lipid profile changes. They have also been associated at the suggestive level with traits such as alcohol and nicotine codependence, age at onset of Parkinson's disease, and pattern recognition memory [43]. Furthermore, functional studies have provided insight into the potential roles of some of these genes in AD pathogenesis. For instance, MIR155HG and MIR155 encode two microRNAs. MIR155 overexpression was previously implicated in the downregulation of complement factor H (CFH) expression in AD and other neurodegenerative diseases. CFH normally prevents spontaneous immune system activation [91]. MRPL39 encodes a mitochondrial ribosomal protein involved in the oxidative-phosphorylation pathway. Impaired mitochondrial function has been reported in neurons of patients with AD [92,93]. Lunnon et al. [92] reported slightly reduced expression of MRPL39 in AD patients compared with controls. The same was true of another nearby gene, ATP5J, which is involved in the oxidative-phosphorylation pathway. JAM2 encodes a membrane protein found at the tight junctions of epithelial and endothelial cells that acts as an adhesive ligand for immune cells. It belongs to the immunoglobulin superfamily of adhesion molecules, which has been implicated in AD pathogenesis [94]. Also, duplication of an ~600-kb region on chromosome 21 containing the JAM2, ATP5J, and APP genes has been reported in autosomal dominant AD [95].
In TWA analyses using brain-specific eQTLs data, four probes/genes were associated with AD (two in males and two in females). Also, using eQTLs data from peripheral blood, the expression level of 17 probes/genes passed both the SMR and the HEIDI tests, indicating that variants influencing the expression of these genes may also have pleiotropic effects on developing AD [55,56]. It should be noted that due to the tissue-specific expression of genes, using data from eQTLs studies on blood is not ideal for capturing associations between the transcriptome levels and AD. However, it increases the power of SMR analysis since such studies take advantage of more samples compared to brain-specific eQTLs studies [55]. Significant SNPs with p < 5E-08 were detected within 1 Mb of MS4A6A and UQCC genes (significant in TWA analyses of blood eQTLs data) in our GWAS or previous reports [43]. SNPs with p < 5E-06 were present only in 1-Mb upstream/downstream regions of ATG10 (significant in brain-specific TWA analyses) in our GWA analyses of females, although several AD-associated SNPs with p < 5E-06 were reported in regions around TRA2A [64], IRAK3 [77], and ESPN [78]. This is likely indicative of the lack of power of conducted GWAS due to insufficient sample sizes [55].
Taken together, all AD-associated genes in our TWA analyses except MS4A6A and UQCC can be considered novel potential AD-associated genes. Further functional analyses are needed to explore their potential roles in AD pathogenesis, because the detected associations do not imply causation. Instead, they provide a list of prioritized candidates for follow-up studies. SNPs in the 1-Mb upstream/downstream regions around these genes have been previously associated with other traits (e.g., autoimmune diseases or serum cholesterol levels) with p < 5E-06. Examples include associations of SNPs corresponding to:
- ABCB9 with college completion and years of education
- ATG10 with vascular dementia
- C9orf72 with amyotrophic lateral sclerosis, frontotemporal lobar degeneration, and the response of rheumatoid arthritis patients to anti-TNF treatment
- GNAI3 with total and low-density lipoprotein cholesterol (LDL-C) and major depression
- LPXN with inflammatory bowel disease
- MED30 with rheumatoid arthritis and fasting blood glucose
- PRDM10 with type 2 diabetes
- SFN with high-density lipoprotein cholesterol (HDL-C) [43]
Notably, none of the novel AD-associated genes detected in males were among the significant genes in females, and vice versa. Among the significant genes detected in females, a pathologic hexanucleotide repeat expansion in C9orf72 has been linked to frontotemporal dementia and may contribute to AD pathogenesis [96-99]. Also, GNAI3 was reported to be overexpressed in AD-intact mice compared with AD-impaired ones [100]. CRIPAK, which was among the significant genes detected in brain-specific TWA analyses in males, encodes an inhibitor of PAK1 [101]. The PAK gene family has been found to play roles in learning and memory, and its dysregulation has been implicated in AD, Huntington disease, and mental retardation [102]. Also, rs1923775, located ~700 kb away from CRIPAK, has shown a relatively strong association (p = 5.60E-6) with AD in African Americans [88].
Of the 26 pathways significantly associated with AD in our pathway-based analyses, 12 were not plan specific, seven were significant only under plan 1 (males and females), and seven were significant only in females (i.e., plan 3). Pathways significant under more than one plan were mostly involved in processes such as mitochondrial function, lipid metabolism, cell junctions, and immune and inflammatory responses, all of which have been implicated in AD [93,103-106]. Several lines of evidence from previous empirical studies support potential roles of some of the detected plan-specific pathways in AD pathogenesis. For instance, it was suggested that deactivation of the epidermal growth factor receptor (EGFR) signaling pathway may attenuate Aβ-induced memory loss in Drosophila and mouse models [107]. Also, fragmentation and dysfunction of the Golgi apparatus, an organelle involved in posttranslational modification and trafficking of proteins, have been implicated in AD pathogenesis [108,109]. The Fas signaling pathway is involved in apoptosis and the modulation of immune responses. Its upregulation was reported to contribute to Aβ-induced cell death and neurodegeneration in AD [110,111]. Also, dysregulation of the platelet-derived growth factor (PDGF) signaling pathway was suggested to increase Aβ production and contribute to neurodegeneration in AD [112,113].
Among the female-specific pathways, G-protein activation is a signal transduction pathway that can modulate the production and action of different intracellular effector proteins. G protein-coupled receptors play important roles in the initiation and regulation of inflammatory responses such as phagocyte chemotaxis and cytokine production [50,114]. Pathologically increased inflammatory responses have been reported in the brains of patients with AD [93]. Gβγ signaling through the PI3Kγ pathway is involved in the regulation of immune system responses and platelet activation [115]. Also, the ADP signaling, signal amplification, and prostacyclin signaling pathways regulate platelet activation both in response to injury and in healthy blood vessels [50]. Platelets are the major sources of amyloid precursor protein (APP) and Aβ in blood. They were reported to be overactivated in AD patients, possibly because of stimulation by injured cerebral endothelial cells or abnormalities of their cell membranes [116,117]. Glucagon-type ligand receptors are found in the gastrointestinal epithelium and in brain neurons. Glucagon-like peptide-1 (GLP-1) has been suggested as a potential treatment to reverse neurodegeneration in AD and Parkinson's disease [118,119].

Conclusions
In summary, our study revealed significant associations of several SNPs at the genome-wide or suggestive level of significance that had not been reported before. Most of the SNPs located outside the APOE gene cluster region were not in LD with previously discovered AD-associated polymorphisms with p < 5E-06 (Table 2). These SNPs were mapped to 21 genes in 19 chromosomal regions. Of these, 8 genes/regions had not been associated with AD in previous GWAS with p < 5E-06. Also, 26 genes located outside the chromosome 19q13 region, and 26 pathways, showed evidence of associations with AD at the FDR level of 0.05 in our TWA, gene-based, and pathway-based analyses. Thirteen of these 26 genes were located in chromosomal regions with no AD-associated SNPs at the genome-wide or suggestive level of significance. Most of the significantly detected SNPs and genes, as well as several AD-associated pathways, were sex specific, indicating sex disparities in the genetic basis of AD. By detecting a number of novel potential AD-associated SNPs and suggestive associations of several genes and transcripts, our study provides new insight into the genetic architecture of AD. In particular, identifying sex-specific genetic contributors can advance our understanding of AD pathogenesis.
Despite the rigor of this study, there are some limitations. The case/control status in the four cohorts used in this study was mainly determined clinically. Routine clinical diagnosis of AD based on symptoms and neurologic examination may not provide optimal case/control classification. The National Institute on Aging and the Alzheimer's Association have suggested integrating additional paraclinical tests into diagnostic protocols to help researchers identify AD patients and healthy controls more accurately [120,121]. Examples include histopathologic findings in brain biopsy, AD-related cerebrospinal fluid (CSF) biomarkers, and imaging evidence of neurodegeneration. Beach et al. [122] investigated the accuracy of clinical diagnosis of AD by comparing such diagnoses with histopathologic findings from brain autopsies in a sample of 1198 subjects. They found that the sensitivity and specificity of clinical diagnostic classification were 70.9-87.3% and 44.3-70.8%, respectively [122]. These values indicate a relatively high possibility of clinically misclassifying subjects, as controls (false negatives) and as cases (false positives). Finally, the power of GWA analyses depends on sample size, and in particular on the number of cases, so the current study with 2741 cases and 14,739 controls may not have optimal power. Further studies with larger sample sizes are needed to clarify the genotype-phenotype relationships in AD.
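Approximate power for a per-allele association test can make the sample-size point concrete. It can be computed from expected allele counts, using a normal approximation to the log odds ratio with Woolf's standard error. The sketch below is illustrative only:

- It pools the 2741 cases and 14,739 controls as if they were one sample. The analyses here were run per dataset, which would give less power per test.
- It ignores covariates and LD.
- The risk-allele frequency (0.20) and odds ratio (1.15) in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases: int, n_controls: int, control_freq: float,
                  odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided per-allele test at significance level alpha."""
    if not 0 < control_freq < 1 or odds_ratio <= 0:
        raise ValueError("need 0 < control_freq < 1 and odds_ratio > 0")
    p0 = control_freq
    odds_case = odds_ratio * p0 / (1 - p0)
    p1 = odds_case / (1 + odds_case)  # risk-allele frequency in cases implied by the OR
    # Expected allele counts (two alleles per person) and Woolf SE of the log OR
    a, b = 2 * n_cases * p1, 2 * n_cases * (1 - p1)
    c, d = 2 * n_controls * p0, 2 * n_controls * (1 - p0)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2)
    shift = abs(math.log(odds_ratio)) / se
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for n_cases in (2741, 10000):
    power = allelic_power(n_cases, 14739, control_freq=0.20, odds_ratio=1.15)
    print(n_cases, round(power, 3))
```

With the control count held fixed, adding cases shrinks the standard error of the log odds ratio and raises power, which is the point made in the text. The specific numbers depend entirely on the assumed allele frequency and effect size.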

Additional files
Additional file 1: Table S1. Cases and controls included. Table S2. QC-passed SNPs analyzed in datasets. Table S3. Genomic inflation factors (λ values) from logistic regression models. Tables S4-S6. Replicated set of SNPs detected under analysis plan 1 (males and females), plan 2 (only males), and plan 3 (only females). Tables S7-S9. Nonreplicated set of SNPs detected under analysis plan 1 (males and females), plan 2 (only males), and plan 3 (only females). Tables S10-S12. Meta-analysis set of SNPs detected under analysis plan 1 (males and females), plan 2 (only males), and plan 3 (only females). Table S13. LD information about newly detected SNPs under plans 1-3 for which proxy AD-associated loci exist in 1-Mb flanking regions [8,9]. Table S14. Coding schema used to determine APOE genotypes. Table S15. Information about LD between APOE SNPs and AD-associated SNPs located on chromosome 19 [8]. Tables S16-S17. Wald χ² test comparing ORs between males and females for SNPs that were specifically significant in males or in females.