Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels

Introduction MAPT encodes tau, the predominant component of the neurofibrillary tangles that are a neuropathological hallmark of Alzheimer’s disease (AD). Reported genetic associations of MAPT variants with late-onset AD (LOAD) risk have been inconsistent, which may reflect insufficient power and incomplete assessment of MAPT haplotypes.
Methods We examined the association of MAPT haplotypes with LOAD risk in more than 20,000 subjects (n-cases = 9,814, n-controls = 11,550) from Mayo Clinic (n-cases = 2,052, n-controls = 3,406) and the Alzheimer’s Disease Genetics Consortium (ADGC, n-cases = 7,762, n-controls = 8,144). We also assessed associations with brain MAPT gene expression levels measured in the cerebellum (n = 197) and temporal cortex (n = 202) of LOAD subjects. Six single nucleotide polymorphisms (SNPs) which tag MAPT haplotypes with frequencies greater than 1% were evaluated.
Results The H2-haplotype tagging rs8070723-G allele was associated with reduced risk of LOAD (odds ratio, OR = 0.90, 95% confidence interval, CI = 0.85-0.95, p = 5.2E-05) with consistent results in the Mayo (OR = 0.81, p = 7.0E-04) and ADGC (OR = 0.89, p = 1.26E-04) cohorts. The rs3785883-A allele was also nominally significantly associated with LOAD risk (OR = 1.06, 95% CI = 1.01-1.13, p = 0.034). Haplotype analysis revealed significant global association with LOAD risk in the combined cohort (p = 0.033), with significant association of the H2 haplotype with reduced risk of LOAD as expected (p = 1.53E-04) and suggestive association with additional haplotypes. MAPT SNPs and haplotypes were also associated with brain MAPT levels in the cerebellum and temporal cortex of AD subjects, with the strongest associations observed between the H2 haplotype and reduced brain MAPT levels (β = -0.16 to -0.20, p = 1.0E-03 to 3.0E-03).
Conclusions These results confirm, in two large series, the previously reported association of the MAPT H2 haplotype with LOAD risk, show that this haplotype has the strongest effect on brain MAPT expression amongst those tested, and identify additional haplotypes with suggestive associations that require replication in independent series. These biologically congruent results provide compelling evidence to screen the MAPT region for regulatory variants which confer LOAD risk by influencing its brain gene expression.


Introduction
Alzheimer's disease (AD), the most prevalent cause of dementia, is defined by two neuropathological hallmarks: senile plaques primarily composed of extracellular amyloid-beta (Aβ) deposits and intracellular neurofibrillary tangles (NFTs) composed of hyperphosphorylated tau protein.
MAPT (microtubule-associated protein tau) encodes tau and resides within a ~900 kilobase (kb) inversion polymorphism (reviewed in [1]) that generates a ~1.3 megabase (Mb) region of linkage disequilibrium (LD) defined by two extended haplotypes, referred to as H1 and H2. Additional variants have arisen only on the H1 haplotype, resulting in multiple H1 sub-haplotypes.
Both common and rare genetic variation in MAPT have been strongly implicated in primary tauopathies. Rare missense and exon 10 splicing mutations, which increase levels of tau isoforms with four microtubule binding domains (known as 4-repeat or 4R tau), cause familial frontotemporal dementia with parkinsonism linked to chromosome 17 (FTDP-17) [2,3], whereas the common MAPT H1 haplotype strongly associates with increased risk of progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD) [4-8]. A recent genome-wide association study (GWAS) of PSP risk identified MAPT as the strongest locus, with risk alleles at rs8070723, which tags the H1 haplotype, and at rs242557, which partially tags the H1c subhaplotype [8].
Although tauopathy is a defining lesion of AD, reports of association between AD and genetic variants at the MAPT locus are inconsistent. While the MAPT H1 haplotype [9] or H1c subhaplotype [10-13] showed association with AD risk in some studies, others failed to detect association with H1 [10,13,14], H1c [15] or other MAPT variants [16]. The sample sizes of most of these studies range from a few hundred to a few thousand, and the largest published study, of ~17,000 subjects, evaluated only the H1/H2 haplotypes and none of the H1 subhaplotypes [9].
In addition to investigating MAPT variants for association with risk of tauopathies, some studies have also assessed their role in gene expression. MAPT exons 2, 3, 4a, 6, 8 and 10 are known to be alternatively spliced [1], there are FTDP-17 splicing mutations which increase 4R tau [2,3], and 4R tau is increased in affected brain regions in PSP and CBD [17,18]. Allele-specific gene expression studies in human brains and neuronal cell lines identified higher levels of exon 10-containing transcripts, but not of total MAPT, associated with the H1 haplotype [19] and higher levels of exon 2- and 3-containing transcripts associated with the H2 haplotype [20]. The MAPT H1c subhaplotype was associated with higher total and 4R MAPT levels in human brains [11]. A study of exon levels in multiple human brain regions identified higher expression levels of exon 3 associated with the H2 haplotype, but no association of MAPT levels with the H1c subhaplotype [21]. We have previously reported association of MAPT H1-tagging and rs242557 SNPs with increased brain MAPT levels in ~400 brains from a combined cohort of subjects with AD and other brain pathologies [22]. Collectively, these findings suggest that the disease risk conferred by some MAPT variants could be due to higher total or 4R tau levels and/or that the protective effect of the MAPT H2 haplotype might be secondary to an increase in N-terminal exon-containing MAPT transcripts. While these studies are informative, there has not been, to date, a systematic and well-powered analysis of MAPT subhaplotypes for association with MAPT brain expression levels.
Herein, we present a comprehensive assessment of MAPT variants that tag all MAPT subhaplotypes of frequency >1% in the largest MAPT association study to date, comprising 9,814 LOAD cases and 11,550 controls. Further, we evaluate association of these MAPT variants with brain MAPT expression levels in two brain regions from ~200 autopsied LOAD subjects: the cerebellum, which is predominantly unaffected in AD, and the typically affected temporal cortex. Our well-powered and complementary investigation of disease risk and gene expression provides compelling evidence for a role of transcriptional regulatory variants of MAPT in conferring LOAD risk.

Mayo Clinic cohort
We evaluated LOAD risk association with MAPT variants in 2,052 LOAD cases vs. 3,406 controls from Mayo Clinic. These elderly European-American subjects were from two clinical case-control series recruited at the Mayo Clinic in Rochester, MN (RS series: 615 LOAD cases, 2,425 controls) and Jacksonville, FL (JS series: 886 LOAD cases, 981 controls), as well as 551 autopsy-confirmed LOAD subjects from the Brain Bank at Mayo Clinic Florida (Additional file 1: Table S1). All clinical subjects were evaluated by a Mayo Clinic neurologist and autopsied subjects were diagnosed by our neuropathologist (DWD). All clinical LOAD cases had probable or possible AD and all pathologic LOAD cases had definite AD according to NINCDS-ADRDA criteria [23]. All controls had a clinical dementia rating score of 0. All LOAD subjects had an age at disease diagnosis (clinical series) or death (autopsied series) of ≥60 years, and all controls were ≥60 years of age at their most recent visit. A subset of the Mayo Clinic cohort was included in the Mayo LOAD GWAS [24] (Additional file 1: Table S1) and in the gene expression GWAS (eGWAS) [22] of the temporal cortex (TCX, n = 202) and cerebellum (CER, n = 197). This study was approved by the Mayo Clinic institutional review board and appropriate informed consent was obtained from all individuals.

ADGC cohort
We utilized genetic data and covariate information on the European-American subjects from the Alzheimer's Disease Genetics Consortium (ADGC) cohort. These subjects were collected from multiple research centers and designated into the following 14 [25,26].
The ADGC cohort included subjects from the Mayo Clinic. To avoid any overlap, all subjects from Mayo Clinic were removed from the ADGC cohort. Standard quality control (QC) measures were applied to the ADGC dataset [27] with the following cutoffs: 95% call rate per person, 95% call rate per SNP, 1% minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE) p > 1E-06 in controls. Additionally, directly observed (not imputed) SNPs from subjects across all series were evaluated for relatedness using KING (Kinship-based INference for Gwas)-Robust [27], and a single representative was chosen for each pair of individuals who were third-degree relatives or closer. Similarly, one representative was chosen for each family in the MIRAGE and NIA-LOAD family-based studies. All cohort genotypes were imputed to a common set of >2 million SNPs (HapMap2) by the ADGC, as described [26]. The 7,762 LOAD cases and 8,144 controls from the ADGC (Additional file 1: Table S1) that remained after QC were utilized for the MAPT variant associations.
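As an illustration of how the SNP-level cutoffs above combine, the following Python sketch applies them to genotype counts. It is only a sketch under stated assumptions: the function names (`hwe_chisq_p`, `snp_passes_qc`) and the example counts are hypothetical, the call-rate and MAF thresholds are treated as inclusive, and HWE is checked with a 1-degree-of-freedom chi-square test, which may differ from the test actually used in the ADGC pipeline.

```python
import math

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used for illustration; the source does
    not state which HWE test was applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(all_counts, n_missing, control_counts,
                  min_call=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the call-rate, MAF and HWE (controls only) cutoffs to one SNP."""
    n_aa, n_ab, n_bb = all_counts
    called = n_aa + n_ab + n_bb
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    hwe_p = hwe_chisq_p(*control_counts)
    return call_rate >= min_call and maf >= min_maf and hwe_p > min_hwe_p

# Hypothetical counts (AA, AB, BB), not taken from the study.
ok_snp = snp_passes_qc((250, 500, 250), 0, (125, 250, 125))
low_call_snp = snp_passes_qc((250, 500, 250), 100, (125, 250, 125))
hwe_fail_snp = snp_passes_qc((0, 1000, 0), 0, (0, 500, 0))
```

In this toy example only the first SNP passes: the second has a call rate of about 91% and the third consists entirely of heterozygotes, which is a strong departure from HWE.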

RNA isolation and gene expression measurements
All samples utilized in the brain gene expression analyses in this study are a subset of the previously published Mayo Clinic expression GWAS (eGWAS) [22]. In the current study brain gene expression levels of autopsied LOAD subjects measured from the cerebellum (n = 197) and temporal cortex (n = 202) were used. RNA extraction and gene expression measurements were previously reported [22]. Briefly, total RNA was isolated from frozen postmortem brain tissue using the Ambion RNAqueous kit according to the manufacturer's instructions. The quantity and quality of the RNA were evaluated using the Agilent 2100 Bioanalyzer and RNA 6000 Nano Chip.
The Whole Genome DASL assay (WG-DASL, Illumina, San Diego, CA) was used to measure transcript levels. This platform is designed for gene expression measurements in partially degraded RNA, such as is typically isolated from frozen human brains. Details of gene expression measurements, data processing and QC have been published [22]. Briefly, 15 replicate samples measured on 5-6 different plates and on 2-3 different days were included in the study for QC and also for intraclass correlation coefficient (ICC) [28] estimations. Raw probe-level mRNA expression data were exported from GenomeStudio software (Illumina Inc.) for preprocessing with background correction, variance stabilizing transformation, quantile normalization and probe filtering using the lumi package of BioConductor [29,30]. Probes with detectable signal in >75% of the samples were used in subsequent analyses. We also annotated all of the probes by comparing their positions according to NCBI RefSeq, Build 36.3, to those of all variants within dbSNP131 and identified the probes which have ≥1 variant within their sequence.
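The probe detectability filter described above can be sketched in a few lines of Python. This is a hypothetical illustration rather than the lumi-based pipeline used in the study: the probe names, the boolean detection calls and the helper names (`detection_rate`, `filter_probes`) are all assumptions, and only the strict ">75% of samples" rule is taken from the text.

```python
def detection_rate(detected_flags):
    """Fraction of samples in which a probe gave a detectable signal."""
    return sum(detected_flags) / len(detected_flags)

def filter_probes(detection, min_rate=0.75):
    """Keep probes detected in strictly more than `min_rate` of the samples."""
    return [probe for probe, flags in detection.items()
            if detection_rate(flags) > min_rate]

# Hypothetical detection calls for three probes across four samples.
detection = {
    "probe_A": [True, True, True, True],     # 100% detected, kept
    "probe_B": [True, True, True, False],    # exactly 75%, dropped (not > 75%)
    "probe_C": [True, False, False, False],  # 25% detected, dropped
}
kept_probes = filter_probes(detection)
```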

Genotyping
Six MAPT locus haplotype tagging (ht) SNPs were selected for genotyping in the Mayo Clinic cohort (Additional file 1: Figure S1, Tables S1 and S2). SNP rs8070723 was used as a proxy for the H1/H2 haplotypes defining del-In9. The remaining 5 SNPs have been previously described to tag the majority of H1 sub-haplotypes [6]. Genotypes for three SNPs (rs1467967, rs242557 and rs8070723) for a subset of the samples were obtained from the Mayo Clinic LOAD GWAS (Additional file 1: Table S1). The remaining genotypes for these SNPs and all genotypes for an additional three SNPs (rs3785883, rs2471738 and rs7521) were obtained using Applied Biosystems® TaqMan genotyping assays. For the ADGC cohort, the genotypes for these six SNPs were extracted from the ADGC GWAS data [27] using PLINK [31].

Statistical analysis

MAPT single SNP association analysis with LOAD risk
All six htSNPs were tested for association with disease risk in the combined Mayo Clinic cohort, as well as individually in the JS and RS series. The same SNPs were also tested in the ADGC cohort, as well as in the ADGC + Mayo combined cohorts. All SNPs were tested for deviations from Hardy-Weinberg equilibrium (HWE) [31] in controls.
Single SNP associations with disease risk were tested assuming an additive model, using multivariable logistic regression implemented in PLINK [31] with the following covariates: age (defined for the Mayo Clinic cohort as age at diagnosis for clinical LOAD cases, age at death for autopsied LOAD cases and age at last diagnosis for controls), sex, APOE ε4 dosage and series. The analyses in the ADGC-only cohort included these covariates and also ten principal components obtained from EIGENSTRAT [32]. Mayo Clinic-only and Mayo + ADGC analyses did not include principal components, as they were not available for many of the Mayo subjects.
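To make the additive model concrete, the sketch below fits a logistic regression of case status on minor-allele dosage (0, 1 or 2 copies) by Newton-Raphson and reports the per-allele odds ratio. It is a minimal pure-Python illustration, not the PLINK implementation used in the study: the helper names (`solve`, `fit_logistic`) and the toy counts are hypothetical, and the covariates (age, sex, APOE ε4 dosage, series) are omitted here; they would simply be further columns of the design matrix.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data (hypothetical): (dosage, n_cases, n_controls) per genotype group.
groups = [(0, 100, 100), (1, 200, 100), (2, 400, 100)]
X, y = [], []
for dosage, n_cases, n_controls in groups:
    X += [[1.0, float(dosage)]] * (n_cases + n_controls)  # intercept, dosage
    y += [1] * n_cases + [0] * n_controls

intercept, beta_dosage = fit_logistic(X, y)
odds_ratio = math.exp(beta_dosage)  # per-allele odds ratio under the additive model
```

In this toy data set the odds of being a case double with each additional allele copy, so the fitted per-allele odds ratio is 2; an OR below 1, as reported for rs8070723-G, would indicate a protective allele.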
MAPT haplotype association analysis with LOAD risk
PLINK was used to estimate haplotype frequencies using the sliding window specification with a window size of six to encompass all six of the htSNPs. Haplotype associations with LOAD risk in the Mayo Clinic series were performed with both PLINK and haplo.score [33], which gave identical results for the single haplotype analyses. In the score statistic approach, all possible haplotypes consistent with the observed marker genotypes are enumerated, and maximum likelihood estimates of the haplotype frequencies, as well as the posterior probabilities of the pairs of haplotypes for each subject, are computed. These posterior probabilities are then used to compute the score statistics for the association of (ambiguous) haplotypes with LOAD risk using multivariable logistic regression analysis with inclusion of the same covariates as discussed above. Only those haplotypes with frequencies >1% in the cohorts in which they were tested were included in the association analyses.
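The first step of the score statistic approach, listing every pair of haplotypes compatible with a subject's unphased genotypes, can be sketched as follows. This is an illustrative Python sketch only: the function name `compatible_haplotype_pairs` and the example alleles are hypothetical, and the subsequent estimation of haplotype frequencies and posterior probabilities performed by PLINK and haplo.score is not reproduced here.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` holds one unordered allele pair per SNP, e.g. ("A", "G").
    A subject heterozygous at k >= 1 SNPs has 2**(k - 1) compatible pairs;
    a subject with no heterozygous SNPs has exactly one.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    # Fix the phase of the first heterozygous site so that (h1, h2) and
    # (h2, h1) are not counted as different pairs.
    free_sites = het_sites[1:]
    pairs = []
    for flips in product([False, True], repeat=len(free_sites)):
        flip_at = dict(zip(free_sites, flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.append(("".join(h1), "".join(h2)))
    return pairs

# Hypothetical six-SNP genotype (alleles are illustrative, not study data),
# heterozygous at the first and fifth SNPs.
example_genotype = [("A", "G"), ("G", "G"), ("G", "G"),
                    ("C", "C"), ("A", "G"), ("G", "G")]
example_pairs = compatible_haplotype_pairs(example_genotype)
```

With two heterozygous sites the phase is ambiguous between two haplotype pairs, which is why posterior probabilities for each pair are needed before testing association.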

MAPT variant association analysis with gene expression levels
Each of the MAPT htSNPs and the estimated haplotypes were also tested for association with gene expression levels of MAPT in the TCX and CER of LOAD subjects, as measured using three probes: ILMN_1710903, ILMN_2310814 and ILMN_2298727. These LOAD subjects were also participants in our previously published eGWAS [22]. Association analysis was carried out in PLINK using a linear regression approach, whereby preprocessed transcript levels for the three probes in each brain region (TCX and CER) were assessed as six individual quantitative phenotypes. Covariates included in the models were age at death, sex, APOE ε4 dosage, PCR plate, RNA integrity number (RIN) and adjusted RIN², as described previously [22,34]. Only those haplotypes with frequencies >1% in the autopsy series in which they were tested were included in the association analyses.

Association of MAPT single SNPs with LOAD risk
Six MAPT htSNPs were tested for association with LOAD risk in the Mayo Clinic and ADGC cohorts both individually and combined (Table 1). All SNPs had genotyping call rates ≥90% in the Mayo Clinic cohort (~90-97%), ~83-100% in the ADGC cohort and ~85-100% in the combined cohort (Additional file 1: Table S2). MAPT rs242557 had the lowest call rate, 83%, in the ADGC cohort, with all other SNPs having call rates of ≥89%. All SNPs passed the HWE cutoff of p > 1E-06 in controls, although rs242557 had HWE p < 0.05 in the Mayo Clinic controls but not in the ADGC controls.
There was highly significant association of the H2-tagging rs8070723-G allele with reduced risk of LOAD in the Mayo Clinic cohort (odds ratio, OR = 0.81, p = 7.0E-4), with remarkably similar OR estimates in the JS and RS series (Additional file 1: Table S3) and in the independent ADGC cohort (OR = 0.89, p = 1.3E-4) (Table 1). The association in the combined Mayo + ADGC cohort for this variant was highly significant (OR = 0.90, p = 5.3E-5) and would withstand Bonferroni correction for the six tested variants, but would not achieve significance at a genome-wide level.
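The multiple-testing statement above can be checked with simple arithmetic, sketched here in Python. The family-wise alpha of 0.05 and the genome-wide threshold of 5E-08 are conventional values assumed for illustration; neither is stated in the text, and the variable names are hypothetical.

```python
N_TESTS = 6                   # six htSNPs tested
ALPHA = 0.05                  # assumed family-wise error rate
GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS threshold (assumption)

# Bonferroni-corrected per-test threshold: 0.05 / 6, roughly 0.0083.
bonferroni_threshold = ALPHA / N_TESTS

p_rs8070723 = 5.3e-5  # combined Mayo + ADGC cohort, from the text
p_rs3785883 = 0.034   # combined cohort, nominal association from the abstract

rs8070723_passes_bonferroni = p_rs8070723 < bonferroni_threshold
rs8070723_genome_wide = p_rs8070723 < GENOME_WIDE_THRESHOLD
rs3785883_passes_bonferroni = p_rs3785883 < bonferroni_threshold
```

Under these assumptions rs8070723 clears the corrected threshold but not the genome-wide one, whereas the nominal rs3785883 association does not survive correction for six tests.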

Association of MAPT haplotypes with LOAD risk
In the Mayo Clinic cohort of ~5,000 subjects, we identified 19 MAPT haplotypes with a frequency >1%. In this cohort, the rs8070723-G allele perfectly tagged the H2 haplotype, which was present in 21.5% of the subjects. Eighteen subhaplotypes were identified on the H1 background. Three MAPT haplotypes were nominally significantly associated with LOAD risk (Table 2) and a global test for haplotypic association was also significant (p = 0.012).
As expected, the MAPT H2 haplotype was significantly associated with decreased risk for LOAD in the Mayo Clinic cohort (OR = 0.80, p = 4.1E-04). Additionally, the most common sub-haplotype on the H1 background, H1b (frequency = 17.3%), was nominally significantly associated with increased risk for LOAD (OR = 1.15, p = 0.046), as was a less frequent H1 sub-haplotype, J (frequency = 1.2%, OR = 1.88, p = 0.031). Three other H1 sub-haplotypes (L, X and Y) were marginally associated, also with increased LOAD risk.
In the ADGC cohort, the MAPT H2 haplotype was present in 22% of the subjects. On the H1 background, 19 sub-haplotypes were identified with a frequency of ≥1%. As with the Mayo Clinic cohort, the H2 haplotype was significantly associated with reduced risk of LOAD in the ADGC cohort (OR = 0.90, p = 6.29E-04). None of the H1 subhaplotypes had significant association with LOAD risk in this cohort.

Table legend: Results of multivariable logistic regression analyses for MAPT haplotypes with frequencies >1% are shown. Haplotype nomenclature is assigned as previously reported [6,35]. Alleles for the SNPs defining the haplotypes are given in the 5' to 3' order as follows: rs1467967, rs242557, rs3785883, rs2471738, rs8070723, rs7521. Haplotypes not previously observed are designated by an asterisk (*). F_All = haplotype frequency in all subjects; F_A = in affected (LOAD) and F_U = unaffected (Control) subjects. OR = Odds Ratio, P = p-value. Boldface values within the tables indicate significant or suggestive associations with a p-value <0.10.
In the combined Mayo + ADGC cohort, there was significant global haplotypic association (p = 0.033). MAPT H2 haplotype had highly significant association with reduced risk of LOAD (OR = 0.90, p = 1.53E-04). MAPT J subhaplotype had nominally significant association with LOAD risk in the combined cohort (OR = 1.32, p = 0.049) with suggestive association observed for H1b and increased LOAD risk (OR = 1.05, p = 0.089) and for H1d and reduced LOAD risk (OR = 0.91, p = 0.074). H1c subhaplotype did not achieve significance in the Mayo Clinic, ADGC or Mayo + ADGC cohorts.

Association of MAPT single SNPs and haplotypes with gene expression levels
In our published eGWAS [22], there were three probes on the WG-DASL platform that were used to measure MAPT levels: ILMN_1710903 and ILMN_2310814, which anneal to different regions of the MAPT 3'UTR, and ILMN_2298727, which anneals to exon 4a (Additional file 1: Figure S1). Given that the inclusion of exon 4a in tau transcripts in the central nervous system was not reported previously, we generated a quantitative PCR assay against this exon, and were able to successfully measure it in the human brain (data not shown). All three probes passed our QC threshold of detectability in >75% of subjects, with ILMN_1710903 and ILMN_2310814 detected in 100% of all AD brains tested in both the cerebellum (CER) and temporal cortex (TCX) and with ILMN_2298727 detectable in 98.0% of AD CER and 83.7% of AD TCX tissue. We previously estimated intraclass correlation coefficients (ICC) [28] for all gene expression probes; these represent the percentage of variance in expression between samples over total variance and reflect the genetic component that contributes to variability in gene expression. We determined that both ILMN_2298727 and ILMN_1710903 had high ICC estimates of 87%, whereas ILMN_2310814 had a low ICC estimate of 18%. The variances of gene expression estimated from all subjects in our eGWAS of cerebellar tissue (n = 374) [22] revealed consistent findings for these three MAPT probes, with both ILMN_2298727 (0.24) and ILMN_1710903 (0.12) having variance estimates that are roughly an order of magnitude greater than that of ILMN_2310814 (0.03). We thus conclude that ILMN_2310814 is unlikely to be an informative probe.
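The idea behind the ICC, the share of total variance that lies between samples rather than between replicate measurements of the same sample, can be illustrated with a one-way ANOVA estimator. The Python sketch below is an assumption-laden example: the function name `icc_oneway` and the replicate values are hypothetical, a balanced design is assumed, and the estimator actually used in the study is the one described in [28], which may differ.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for a balanced design.

    `groups` holds one list of replicate measurements per sample; every
    sample must have the same number of replicates (at least two).
    """
    n = len(groups)     # number of samples
    k = len(groups[0])  # replicates per sample
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    # Between-sample and within-sample mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical expression values: three samples, two replicates each.
perfect_replicates = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
noisy_replicates = [[1.0, 1.2], [2.0, 1.8], [3.1, 2.9]]

icc_perfect = icc_oneway(perfect_replicates)  # replicates agree exactly
icc_noisy = icc_oneway(noisy_replicates)      # small replicate noise
```

A probe whose replicate measurements scatter as widely as the differences between samples would have an ICC near zero, which corresponds to the low estimate reported for ILMN_2310814.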
We previously annotated all our probes for variants in their sequence [22], given the concern that such variants may result in differential binding of probes with artifactual variance in the expression levels, and therefore could result in false positive associations with genetic variants in LD with probe sequence variants [36,37]. Our annotation detected two variants within the probe sequence of ILMN_1710903 (rs67759530, rs66561280) that were also polymorphic in our autopsied AD series. ILMN_2310814 did not have any variants within its probe sequence. ILMN_2298727 annotation identified rs73314997 within its sequence, although this variant was essentially monomorphic in our eGWAS subjects [22]. Thus, of the three MAPT probes assessed in our gene expression analyses, ILMN_2310814 is unlikely to be informative and ILMN_1710903 may be prone to artifactual results. We therefore focused on ILMN_2298727 in our MAPT expression analyses (Tables 3 and 4), although we show results from all 3 MAPT probes for completeness.
Evaluation of the six MAPT SNPs revealed significant associations between ILMN_2298727 and rs1467967, rs242557, rs8070723 and rs7521. The MAPT H2 haplotype tagging rs8070723 was associated with lower MAPT levels in both CER (β = −0.16, p = 0.002) and TCX (β = −0.20, p = 4.9E-04) of LOAD subjects (Table 3), as we previously reported in this cohort [22]. The other significant variants were associated with higher MAPT levels in both brain regions. Interestingly, the same variants showed associations in the same direction with the ILMN_1710903 probe, although with higher levels of significance.

Discussion
In this evaluation of haplotypic variation at the MAPT locus in 9,814 LOAD cases and 11,550 controls, the largest to date, we find robust and replicable association of the MAPT H2 haplotype with reduced risk of LOAD (or, equivalently, of the MAPT H1 haplotype with increased risk of LOAD) in two independent cohorts from Mayo Clinic and ADGC, with similar effect size estimates. Most prior reports of haplotypic association identified LOAD risk conferred by the MAPT H1c subhaplotype [10-12], which we were unable to replicate. One group identified an association between the MAPT H1 haplotype and an increased risk for amnestic mild cognitive impairment [38], which can be a prodrome to clinical AD. The only other study to evaluate MAPT in a large cohort (3,940 cases and 13,373 controls) also identified an association between the H2 haplotype and decreased LOAD risk [9]. In that study by Gerrish et al. [9], the H2-haplotype tagging SNP had an OR estimate of 0.89 (p = 5.20E-04), which is remarkably similar to the estimates for the H2-tagging SNP (OR = 0.90, p = 5.3E-05) and the H2 haplotype (OR = 0.90, p = 1.53E-04) in our study. It should be noted that both the present study and Gerrish et al. included samples from the ADNI and TGen series. We confirmed that the MAPT H2 association retains its significance in the ADGC cohort even after removal of these two datasets (OR = 0.87, p = 6.1E-04). Thus, there is evidence of MAPT H2 association with reduced risk of LOAD in two large and independent studies. Though robust, this LOAD risk association does not achieve genome-wide significance in either study, nor a p value < 1.0E-7 in the recent meta-analysis of 74,046 individuals by the IGAP consortium [39]. It will be important to evaluate the IGAP dataset for availability of MAPT haplotype tagging variants and to pursue an in-depth analysis of haplotypic association at this locus.
Although the MAPT H2 haplotypic association with LOAD was clearly the strongest of the MAPT haplotypes and one that we previously reported [22], we identified additional SNPs and haplotypes with nominal significance in our study. These weaker associations would not withstand multiple testing and could represent false positives and require replication in additional series. It should be noted that some of these variants, such as rs3785883, H1b, H1d and J showed consistent direction of effect in the Mayo Clinic and ADGC cohorts. MAPT rs3785883 minor allele was previously shown to associate with higher levels of CSF tau, phospho-tau and earlier age at onset [16]. Although this prior smaller study did not identify association with LOAD risk, the biological effect of this variant which associates with increased LOAD risk in our study appears to be consistent between these two studies.
We and others previously reported association between MAPT haplotypes and brain MAPT levels [11,19-22]. In this study, we evaluated MAPT subhaplotypes for association with brain MAPT levels in two brain regions from LOAD subjects. The most robust gene expression association occurs with the H2 haplotype, as we had reported [22] (β_CER = −0.16, p_CER = 0.003; β_TCX = −0.20, p_TCX = 0.001), which also has the strongest association with LOAD risk in our study. We find that this haplotype, which has a protective effect on LOAD, associates with lower brain MAPT levels. Given that multiple alternatively spliced MAPT exons lead to multiple transcripts, each with potentially different effects on function [1,20], uncovering the precise regulatory change associated with genotypic variation in this region is critical. In our study, we mainly focus on results from one probe, ILMN_2298727, that is both informative and does not have a variant in its sequence based on annotation and genotyping. This probe is expected to anneal to exon 4a; however, the expression levels obtained from it can be a surrogate for total MAPT levels or for levels of any of the alternatively spliced exon-containing transcripts that reside with exon 4a. Indeed, gene expression associations with this probe are consistent with those from ILMN_1710903, which should recognize all transcripts, although ILMN_1710903 is confounded by a confirmed variant within its sequence. Our findings are also congruous with prior reports of associations of the H1 haplotype or H1c sub-haplotype with higher 4R [19] and/or total MAPT [11] levels, as measured by alternative gene expression measurement methods. We did not identify significant associations between the H1c subhaplotype and brain MAPT levels, though we did observe suggestive associations between both CER and TCX MAPT levels and rs242557, a variant that partially tags H1c.
MAPT rs1467967 was associated with significant MAPT elevations in both brain regions and showed a suggestive association with LOAD risk, which is biologically consistent. These and additional weaker gene expression associations with variants such as H1b, I, L and rs7521 require further replication.
In summary, our study provides evidence of robust LOAD risk and brain MAPT level associations with the MAPT H2 haplotype and nominates additional variants and subhaplotypes for further investigation in LOAD. The overall genetic contribution of MAPT variants to LOAD risk appears to be modest, in contrast to primary tauopathies, where the H1 haplotype, for example, has an estimated OR of 5.5 from the PSP GWAS [8]. This may be due to different sets of functional variants residing on the same haplotypic backbone and leading to different biological outcomes, resulting in either a primary tauopathy or tau pathology in LOAD; to a more complex genetic architecture in LOAD with contribution from multiple functional variants in different pathways; or to a combination of both. Discovering the precise sets of MAPT functional variants and assessing their biologic consequences, especially on transcriptional regulation, may be critical to deciphering the commonalities and distinctions in the etiology of LOAD vs. primary tauopathies. Our study highlights the importance of in-depth association analysis of MAPT haplotypic variation in well-powered cohorts and nominates H2 and additional variants as LOAD risk factors with effects on gene expression. Larger scale MAPT haplotype LOAD risk association studies, variant discovery efforts targeting specific haplotypes and transcriptional studies that jointly evaluate haplotypes and specific transcripts are warranted.

Conclusions
In summary, these findings confirm associations between MAPT H2 haplotype and both reduced risk of LOAD and lower MAPT transcript brain levels. In addition, we describe additional MAPT variants and subhaplotypes that associate with LOAD risk and/or brain MAPT levels, which require confirmation in additional series. These results highlight the importance of joint utilization of gene expression and disease risk phenotypes. Additionally, these biologically consistent findings should encourage screening efforts in the MAPT region for discovery of regulatory variants that confer LOAD risk via influencing brain levels of MAPT transcripts.

Additional file
Additional file 1: This file includes Table S1. (Demographic information of the cohorts); Table S2. (Genotype counts, call rates and Hardy Weinberg results); Table S3. (MAPT single SNP association results with LOAD risk in the individual Mayo Clinic series. Results of multivariable logistic regression analysis); Figure S1. (MAPT Refseq mRNA isoforms and SNP annotation).