Association study between drug prescriptions and Alzheimer’s disease claims in a commercial insurance database
Alzheimer's Research & Therapy volume 15, Article number: 118 (2023)
In the ongoing effort to discover treatments for Alzheimer’s disease (AD), there has been considerable focus on investigating the use of repurposed drug candidates. Mining of electronic health record data has the potential to identify novel correlated effects between commonly used drugs and AD. In this study, claims from members with commercial health insurance coverage were analyzed to determine the correlation between the use of various drugs on AD incidence and claim frequency. We found that, within the insured population, several medications for psychotic and mental illnesses were associated with higher disease incidence and frequency, while, to a lesser extent, antibiotics and anti-inflammatory drugs were associated with lower AD incidence rates. The observations thus provide a general overview of the prescription and claim relationships between various drug types and Alzheimer’s disease, with insights into which drugs have possible implications on resulting AD diagnosis.
Among various forms of dementia, Alzheimer’s disease is considered a particularly debilitating neurodegenerative disease that has had a significant social impact . Development of treatments for AD, drug or otherwise, is coming along but has been slow , in large part due to the limited understanding of the pathophysiology of the disease. While there are clear indicators of AD pathology such as amyloid plaque buildup and neurofibrillary tangles, the functional roles of such disease indicators and their contribution to the pathology are still unclear . As an alternative to de novo drug development, there have more recently been considerations to investigate previously approved drug candidates that can potentially be repurposed for the treatment or prevention of AD [5, 9, 13, 17, 25], taking advantage of the reduced time and cost associated with drug repurposing. Over 50 repurposed drugs are already being tested in clinical trials , and constantly more potential candidates are emerging from ongoing research.
Medical records, such as insurance and health records, can be likened to a treasure trove of clinical data, with the capability to provide statistical insights on drug use, disease incidence, medical costs, patient demographics, admission rates, and more [14, 26]. The plethora and variety of data can be joined together to provide a correlative analysis which, while perhaps not conclusive, can provide some directions on potential avenues of research. One such application is considering drug-disease relationships within patients for the purpose of identifying or supporting drug-repurposing targets [9, 33]. In this study, we investigate a compilation of insurance records from a US commercial insurance group, with coverage of the group’s members dating back to 2012. The database thus far has been used in analyses such as drug cost comparisons between hospitals and physician offices , cost-saving differences between cancer screening methods , trends in drug prices , opioid prescription trends during the opioid crisis , and relationships between COVID and pre-existing conditions . Thus far, the database has yet to be used for observing correlative drug-disease relationships.
In this study, we report the correlations observed between Alzheimer’s disease and drug prescriptions from the health records of the insurance database. Insurance claims data offers the ability to mine for associations in massive databases spanning millions of individuals. Here, we measured drug associations with two disease metrics: (1) AD incidence rate and time to diagnosis and (2) frequency of AD claims as a proxy for disease severity. The analyses conducted here may reveal previously unidentified trends between drug prescriptions and AD diagnosis within a commercially insured population. It may also serve as a reference to those investigating potential repurposing candidates and/or provide additional support for previously identified candidates for the treatment and/or prevention of Alzheimer’s disease.
Methods and Materials
All member and claim data for the study were accessed through Blue Cross Blue Shield Axis ®, the largest collection of secure commercial claims, medical professional, and cost of care information. The limited dataset of claims data is derived from the independent, locally operated Blue Cross Blue Shield companies across the USA. The database comprises over 400 million claims compiled from over 9 years worth of data (approximately from 2012 to 2021). The derived records are considered primary data sources that include member demographics, claims made during hospital or doctor visits, and pharmaceutical claims for drug prescriptions.
For the purposes of this study, we defined the incidence of a disease based on the ICD-9 code  listed as the primary diagnosis within each medical claim. For AD, we used the ICD-9 code 331.0. Prescription drug information within the database was defined according to the National Drug Code (NDC) format. We mapped the NDC IDs to their active ingredients via RxNorm . Thus in this study, a drug was defined as one of the RxNorm (active ingredient) codes mapped in this way. Drug users were defined as members that made at least one drug prescription claim mapped in this way, while non-users were members that never made a claim for the drug in their available coverage history.
Two primary outcomes were considered: (1) AD incidence rate and time to diagnosis and (2) healthcare utilization related to the disease represented by the number of AD claims. Both analyses were done on a per-drug basis. In summary, for each drug evaluated, members were grouped into users and non-users (Supplementary Methods Figure S1, S3); drug users were propensity matched to those in the non-user group based on age, gender, unique drug usage, and presence of common diseases (via ICD9) (Supplementary Methods Figure S2); and statistical associations between drug use and AD outcomes were conducted. The 200 most common drugs taken by members in the AD group were chosen for the two analyses; additionally, repurposing candidates that were in clinical trials as of 2020  were separately analyzed for AD incidence rate. An overview of the incidence and claims count analyses are summarized in Fig. 1. More detailed explanations for each analysis are provided in the Supplementary Methods.
Inflation of test statistics was controlled using a median quantile adjustment similar to genomic control  for the primary analyses. This was achieved by determining the inflation factor (the median observed test statistic relative to theoretical value), scaling test statistics with the inflation factor, and recalculating p-values. Afterwards, multiple testing correction was conducted by using the Benjamini/Hochberg method .
All analyses were conducted within a virtual, computational environment managed by the Blue Cross Blue Shield Association to ensure proper privacy protection of the sensitive patient data. All modeling and statistical measurements were performed using the statsmodels package within python version 0.10.0 . The IRB protocol number for this project is IRB-19–7372.
General statistics and inclusion criteria
The entirety of claims data available spans from 2012 up until the end of 2020 and contains approximately: 113 million distinct members, 751 million inpatient medical claims, 5.2 billion outpatient medical claims, and 3.9 billion pharmaceutical-related claims. There were a total of 143,761 members that had made at least one claim for Alzheimer’s disease (ICD9 code 330.0). Within these members, 92,323 were female and 51,438 were male, and the mean and median age were 86 and 88, respectively. The filters we used in the incidence analysis are described in the following. First is our consideration of age. Alzheimer’s disease disproportionately affects the elderly population that is on average 65 years or older . To improve efficiency in matching and reduce bias from having a large population of younger individuals, an over-encompassing filter was applied where only members that were of age 65 or older at the midpoint of the coverage range (2016) were considered (IE over 60 at 2012, or over 70 at 2021).
Next, we address our decision to use only the individuals with full coverage history. Our concern is the increased matching of pairs with insufficient disease profiles due to a lack of data from a short coverage period. For example, individuals with only a few months of coverage would be matched simply because they made few to no claims during that short time period. It was in our interest to reduce such variability in the matches as much as possible; as a result, we opted to include a filter for the full coverage range of 9 years.
In summary, we only included members that had (1) full coverage history within the database (2012 to 2020), (2) BCBS as their primary provider, and (3) at age 70 + in 2021. These filters limited the number of members with AD to 835. These filters were also applied to members without AD for the analysis and resulted in 101,084 non-AD members.
Within the claim count analysis, the conditions were relaxed since it was already known that all members have made at least one claim for AD. Instead, we opted to find a balance between members having a sufficient amount of coverage information prior to and after AD diagnosis, without reducing the pool of members too much. As a result, we chose members that had at least 9 months of coverage both before and after their AD diagnosis, in which 74,153 members (approximately half of the entire AD cohort) were eligible.
Alzheimer’s disease incidence rate and survival analysis depending on drug use
Survival analysis was conducted on propensity-matched pairs for the top 200 drugs taken by AD patients based on the BCBS database records. The log hazard ratio distribution and the hazard ratio log cumulative graph are shown in Fig. 2. Figure 3 displays the QQ plots for raw and adjusted p-values from the survival analysis on the negative log 10 scale. The median hazard ratio for all drugs was 0.95. The inflation factor for the adjusted p value was approximately 2.46. We identified 22 drugs having an association with AD incidence with adjusted p < 0.05 in our survival analysis, 15 with a decreased risk, and 7 with an increased risk (Table 1).
Out of 58 repurposed drug candidates that are in clinical trials for the treatment of AD as of 2020 , we found 25 drugs where (1) prescriptions for the drug were present with members that had made claims for AD within the BCBS database, providing evidence of use within the member cohort, and (2) drug users were able to be matched to nonusers during propensity matching. The specific candidates were separately compared from the general analyses of the 200 drugs. P-value adjustment and correction were not performed due to the lower number of candidates. Table 2 shows the most significant candidates (p < 0.05) that were present in the drug list, as well as their survival analysis statistics. A list of all the drugs that were tested can be found in the Supplementary tables. Overall, only 8 of the repurposed drug candidates were shown to have any significant association with AD incidence within the BCBS database.
Healthcare utilization related to AD as represented through claim count analysis within the AD-specific population
We next examined the association between drug prescription claims and the number of AD claims. According to a number of studies, insurance claim count data can be considered a proxy for disease severity [16, 20, 28]. Although differences in AD-related claim counts between drug users and non-users may simply reflect differences in health care utilization, we also believe that strong associations may indicate a drug’s potential influence on the progression of AD.
Propensity-matched drug users and non-users were compared for the top 200 drugs taken within members that had made at least one claim for Alzheimer’s disease. The total number of Alzheimer’s disease-related claims made within the 9 months after the first claim was tallied for the member pairs, and the paired t-test statistic and p value significance were calculated. Figure 4 shows the QQ plots of p value significance (raw, adjusted, and corrected) for the 200 drugs in the analysis. Table 3 portrays the drugs considered significant based on adjusted p-value (adjusted p < 0.05), indicating that drug users and non-drug users have a non-trivial difference in the number of AD claims made. Overall, this consists of 9 negative and 4 positive paired t-test statistic results that were deemed significant. All p-values calculated are provided in the table for reference; only one drug (quetiapine) was considered significant when observing the corrected p-value.
The current study is an exploratory analysis that identifies associations between drug treatment and Alzheimer’s disease in a large insurance claims database. We found that antibiotics, antiviral, and anti-inflammatory drugs had low hazard ratios in our study. Notably, very significant drugs with lower hazard ratios consist of common antibiotics and anti-inflammatory drugs, particularly for those where the corrected P is less than 0.05. A number of ongoing studies have suggested that inflammatory response in the brain is one of the key factors that lead to AD [1, 21]. The observation that taking anti-inflammatory medication results in a lower incidence of AD can be supportive of this association. Likewise, microbial and viral infections have also been associated with a higher risk of AD [4, 13]. The lower hazard ratio from the use of antibiotics and antiviral medication, therefore, could be indicative of a possible protective effect leading to reduced incidence of AD overall. Overall, the results may warrant further investigation into these drug categories to more clearly elucidate if they can have a preventive influence for Alzheimer’s disease.
Our analysis also shows drugs that treat mental illness have high hazard ratios and more claims for drug users. The significant drugs seen with high hazard ratios are those used to treat AD itself or other mental illnesses. This is also seen with the repurposed candidates in clinical trials, where the higher HRs trend towards drugs that are typically used to treat neurological illnesses, as well as the claim counts, in which the significant drugs with a positive paired t-test statistic are also related to the treatment of mental illnesses. It is key to note that these trends are unlikely due to the effect of the drug use, but rather that the diseases themselves were not considered in the feature selection during propensity matching. It is particularly notable that the drugs donepezil and memantine, which are primarily used in the treatment for AD, are prescribed prior to the diagnosis of AD within the drug user group. One likely reason for this case is that these drugs are being prescribed for MCI and other forms of cognitive decline prior to the onset and eventual diagnosis of Alzheimer’s disease itself, although studies have shown that the efficacy of such drugs is modest at best [29, 34]. Other medications provide more intriguing insights into the connections between AD and a variety of other mental illnesses such as dementia, schizophrenia, anxiety, and depression. Relationships between these conditions and Alzheimer’s have been seen before, where a number of these conditions result in a higher risk of AD [8, 15, 18]. Furthermore, misdiagnosis for the early stages of AD as a different disorder may be a cause of the associations seen here,misdiagnosis is prevalent for Alzheimer’s disease , which could be reflected in the candidates with higher hazard ratios seen in the study.
There are a number of important limitations to this study. First, the misdiagnosis and underdiagnosis of AD could affect the analysis results. Our analysis relied on a single ICD-9 code (331.0) to define AD-positive individuals. While including ICD-9 codes for other dementias could reduce the effect of underdiagnosis, our analysis conservatively defined AD positivity to avoid reporting biased results.
Second, our propensity score model had a finite limit on the number of patient features that could be used in our matching procedure. The model was primarily based on demographic data and individual disease status (also derived from the insurance claims data), which captured a broad cross-section of features for propensity matching. While the addition of more features could undoubtedly improve matching, our analysis was limited by practical considerations of compute requirements and generalizability.
Next, the filters that were applied for the incidence analysis merit discussion. Most notably, full coverage filtering introduces the possibility of survival bias, where only individuals that have “survived” (in this case, still insured by the current group). However, using partial instead of full coverage filters resulted in high inflation of the test statistic and overall predisposition of drugs towards lower hazard ratios (see all Supplementary Results figures). Hence, for both consistency and reliability, our analysis considers individuals with the full coverage range, which reduces the number of poorly matched pairs with incomplete coverage data. Consequently, the average life expectancy after AD diagnosis is approximately 8.3 years ; therefore, it is reassuring that the range observed would expect to cover many of the patients that end up being diagnosed within the specified time frame and likely reduce the influence of survival bias.
Finally, there are limitations originating directly from the dataset itself. These include, but are not limited to, (1) prevalence of younger and working demographic population for commercially insured members may bias the cohort to a more healthy deposition; (2) healthcare utilization could simply be a result in better use of the healthcare system rather than severity; and (3) drug prescriptions do not guarantee the actual use of the drug by the patient, or vice versa where the patient may use a particular drug without making a prescription claim.
In conclusion, the observational study described by this manuscript can be considered as a compilation of the associations between the use of commonly prescribed drugs and Alzheimer’s disease within an insured population. Most notable is how observations made in this study can be confirmed with other experimental studies. While the mechanisms behind neuroinflammation leading to AD had been established in experimental settings and are still an active field of study, and the comorbidities of AD with other mental illnesses had been observed before in clinical settings, it is interesting to observe that the trends carry over in an observational study of a commercially insured population. In both cases, where antibiotics were associated with a lower incidence of AD and mental illness drugs associated with a higher incidence of AD, it was both unexpected (assuming the null hypothesis) and reassuring that there is mechanistic reasoning behind the results. In future studies, we hope to further explore these drug-disease associations through secondary datasets and mechanistic validation. Overall, the results from the analysis are provided with the hopes of providing direction and furthering progress on the complex task of understanding, treating, and preventing Alzheimer’s disease.
Availability of data and materials
Supporting Data (Results Tables) will be provided as Supplementary Material.
Akiyama H, Barger S, Barnum S, Bradt B, Bauer J, Cole GM, Cooper NR, et al. Inflammation and Alzheimer’s disease. Neurobiol Aging. 2000;21(3):383–421.
Bauzon, Justin, Garam Lee, and Jeffrey Cummings. 2020. “Repurposed agents in the Alzheimer’s disease drug development pipeline.” Alzheimer’s Research & Therapy 12 (1). https://doi.org/10.1186/s13195-020-00662-x.
Brookmeyer R, Corrada MM, Curriero FC, Kawas C. Survival following a diagnosis of Alzheimer disease. Arch Neurol. 2002;59(11):1764–7.
Brothers HM, Gosztyla ML, Robinson SR. The physiological roles of amyloid-β peptide hint at new ways to treat Alzheimer’s disease. Front Aging Neuroscie. 2018;10(April):118.
Cardoso S, Moreira PI. Antidiabetic drugs for Alzheimer’s and Parkinson’s diseases: repurposing insulin, metformin, and thiazolidinediones. Int Rev Neurobiol. 2020;155(August):37–64.
DeTure MA, Dickson DW. The neuropathological diagnosis of Alzheimer’s disease. Mol Neurodegener. 2019;14(1):32.
Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):997–1004.
Dietlin S, Soto M, Kiyasova V, Pueyo M, de Mauleon A, Delrieu J, Ousset PJ, Vellas B. Neuropsychiatric symptoms and risk of progression to Alzheimer’s disease among mild cognitive impairment subjects. J Alzheimer’s Dis. 2019;70(1):25–34.
Fang J, Zhang P, Zhou Y, Chiang C-W, Tan J, Hou Y, Stauffer S, et al. Endophenotype-based in silico network medicine discovery combined with insurance record data mining identifies sildenafil as a candidate drug for Alzheimer’s disease. Nature Aging. 2021;1(12):1175–88.
Gauthier, S., P. Rosa-Neto, J. A. Morais, and C. Webster. 2021. “World Alzheimer Report 2021: Journey through the Diagnosis of Dementia.” Alzheimer’s Disease International.
Hochberg Y, Benjamini Y. More powerful procedures for multiple significance testing. Stat Med. 1990. https://doi.org/10.1002/sim.4780090710.
Hung S-Y, Wen-Mei Fu. Drug candidates in clinical trials for Alzheimer’s disease. J Biomed Sci. 2017. https://doi.org/10.1186/s12929-017-0355-7.
Iqbal UH, Zeng E, Pasinetti GM. The use of antimicrobial and antiviral drugs in Alzheimer’s disease. Int J Mol Sci. 2020. https://doi.org/10.3390/ijms21144920.
Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet. 2012;13(6):395–405.
Johansson M, Stomrud E, Johansson PM, Svenningsson A, Palmqvist S, Janelidze S, van Westen D, Mattsson-Carlgren N, Hansson O. Development of apathy, anxiety, and depression in cognitively unimpaired older adults: effects of Alzheimer’s disease pathology and cognitive decline. Biol Psychiatry. 2022. https://doi.org/10.1016/j.biopsych.2022.01.012.
Joo MJ, Lee TA, Bartle B, van de Graaff WB, Weiss KB. Patterns of healthcare utilization by COPD severity: a pilot study. Lung. 2008;186(5):307–12.
Law CS, Wei, and Keng Yoon Yeong. Repurposing antihypertensive drugs for the management of Alzheimer’s disease. Curr Med Chem. 2021;28(9):1716–30.
Maxwell, Colleen J., Laura C. Maclagan, Daniel A. Harris, Xuesong Wang, Jun Guan, Ruth Ann Marrie, David B. Hogan, et al. 2022. “Incidence of neurological and psychiatric comorbidity over time: a population-based cohort study in Ontario, Canada.” Age and Ageing 51 (2). https://doi.org/10.1093/ageing/afab277.
Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011;18(4):441–8.
Nguyen MH, Burak Ozbay A, Liou I, Meyer N, Gordon SC, Dusheiko G, Lim JK. Healthcare resource utilization and costs by disease severity in an insured national sample of US patients with chronic hepatitis B. J Hepatol. 2019;70(1):24–32.
Ozben T, Ozben S. Neuro-inflammation and anti-inflammatory treatment options for Alzheimer’s disease. Clin Biochem. 2019;72(October):87–9.
Patterson, Christina. 2018. “World Alzheimer Report 2018.” Alzheimer’s Disease International, September. https://apo.org.au/node/260056.
Richman IB, Long JB, Kunst N, Kyanko K, Xiao Xu, Busch S, Gross CP. Trends in breast cancer screening costs among privately insured women aged 40 to 64 years. JAMA Intern Med. 2021;181(12):1665–8.
Robinson JC, Whaley CM, Brown TT. Price differences to insurers for infused cancer drugs in hospital outpatient departments and physician offices. Health Aff. 2021;40(9):1395–401.
Rodriguez S, Hug C, Todorov P, Moret N, Boswell SA, Evans K, Zhou G, et al. Machine learning identifies candidates for drug repurposing in Alzheimer’s disease. Nat Commun. 2021;12(1):1033.
Ross MK, Wei W, Ohno-Machado L. ‘Big Data’ and the Electronic Health Record. Yearb Med Inform. 2014;9(01):97–104.
Seabold, Skipper, and Josef Perktold. 2010. “Statsmodels: econometric and statistical modeling with Python.” In Proceedings of the 9th Python in Science Conference, 57:61. Austin, TX.
Tartof SY, Malden DE, Liu I-L, Sy LS, Lewin BJ, Williams JTB, Hambidge SJ, et al. Health care utilization in the 6 months following SARS-CoV-2 infection. JAMA Netw Open. 2022;5(8): e2225657.
Tricco Andrea C, Soobiah Charlene, Berliner Shirra, Ho Joanne M, Ng Carmen H, Ashoor Huda M, Chen Maggie H, Hemmelgarn Brenda, Straus Sharon E. Efficacy and safety of cognitive enhancers for patients with mild cognitive impairment: a systematic review and meta-analysis. CMAJ. 2013;185(16):1393–401.
Wei W-Q, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, Cox NJ, Roden DM, Denny JC. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE. 2017;12(7): e0175508.
Wineinger, Nathan E., Victoria Li, Jill Waalen, and Eric J. Topol. 2021. “Pre-existing health conditions and severe COVID-19 infection: analysis of commercial health insurance data from 690,000 infected patients.” bioRxiv. medRxiv. https://doi.org/10.1101/2021.03.11.21252708.
Wineinger NE, Zhang Y, Topol EJ. Trends in prices of popular brand-name prescription drugs in the United States. JAMA Netw Open. 2019;2(5): e194791.
Wu Y, Warner JL, Wang L, Jiang M, Jun Xu, Chen Q, Nian H, et al. Discovery of noncancer drug effects on survival in electronic health records of patients with cancer: a new paradigm for drug repurposing. JCO Clinical Cancer Informatics. 2019;3(May):1–9.
Zhang X, Lian S, Zhang Y, Zhao Q. Efficacy and safety of donepezil for mild cognitive impairment: a systematic review and meta-analysis. Clin Neurol Neurosurg. 2022;213(February): 107134.
Zhu W, Chernew ME, Sherry TB, Maestas N. Initial opioid prescriptions among U.S. commercially insured patients, 2012–2017. N Engl J Med. 2019;380(11):1043–52.
We would like to thank the Blue Cross Blue Shield Association Research Alliance team for their technical assistance and providing computational resources for this study.
Funding is provided by the NIH Grant R01 AG066750 from the National Institute on Aging, and by UL1 TR002550 from the National Center for Advancing Translational Sciences.
Ethics approval and consent to participate
The IRB protocol number for this project is IRB-19–7372.
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Diagram representation of the sorting process based on coverage history. Figure S2. A pictorial representation of the propensity matching process. Figure S3. Selection of segmented portions from non-users based on coverage during the claim count analysis.
Log 10 Hazard Ratio Distribution Graph from the survival analysis of partial coverage individuals. Supplementary Results Figure 2. Log 10 Hazard Ratio Cumulative Graph from the survival analysis of partial coverage individuals. Supplementary Results Figure 3. QQ plot for raw P values from the survival analysis of partial coverage individuals.
About this article
Cite this article
Hu, E., Li, T.S., Wineinger, N.E. et al. Association study between drug prescriptions and Alzheimer’s disease claims in a commercial insurance database. Alz Res Therapy 15, 118 (2023). https://doi.org/10.1186/s13195-023-01255-0