Study population
The UK Biobank is a population-based cohort of more than 500,000 participants aged 40–73 years at baseline, recruited between 2006 and 2010 at one of 22 assessment centres throughout the UK [18]. The study design and population have been detailed elsewhere [18]. Individuals with dementia or cognitive impairment at baseline were excluded from the analysis. To improve the comparability of lifestyle factors and biomarkers between sexes, those of non-European ancestry were excluded. In addition, individuals without medical record linkage were excluded. Age is the most important determinant of dementia, and adjusting for age alone might not fully control for confounding by age when comparing dementia risk by sex. Age is also strongly related to lifestyle factors and biomarkers, so matching by age may help reduce bias when exploring the mediation effects of these factors. Therefore, each man was matched to one woman by age at baseline (±1 year). Our study adhered to the AGReMA guidelines [19].
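The 1:1 age matching described above can be illustrated with a minimal sketch. The source does not state which matching algorithm was used, so the greedy nearest-age scan, the function name `match_by_age`, and the `(participant_id, age)` tuple layout are all assumptions made for illustration.

```python
def match_by_age(men, women, caliper=1.0):
    """Greedily pair each man with one unused woman whose baseline age
    lies within `caliper` years (hypothetical helper, not the authors' code).

    `men` and `women` are lists of (participant_id, age) tuples.
    Returns a list of (man_id, woman_id) pairs.
    """
    # Sort both groups by age so a single forward scan is enough.
    men_sorted = sorted(men, key=lambda p: p[1])
    women_sorted = sorted(women, key=lambda p: p[1])

    pairs = []
    j = 0  # index of the youngest woman not yet used or skipped
    for man_id, man_age in men_sorted:
        # Women younger than (man_age - caliper) cannot match this man
        # or any older man, so they are skipped permanently.
        while j < len(women_sorted) and women_sorted[j][1] < man_age - caliper:
            j += 1
        if j < len(women_sorted) and abs(women_sorted[j][1] - man_age) <= caliper:
            pairs.append((man_id, women_sorted[j][0]))
            j += 1  # each woman is matched at most once
    return pairs


men = [("m1", 50), ("m2", 60), ("m3", 70)]
women = [("w1", 49), ("w2", 51), ("w3", 61), ("w4", 80)]
matched_pairs = match_by_age(men, women)
```

Men without an eligible woman within the caliper (here `m3`) are left unmatched and would drop out of the matched analysis set.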
Ascertainment of incident dementia
Dementia cases were ascertained using hospital inpatient records or death registers. Dementia was defined by a primary or secondary diagnosis using the International Classification of Diseases coding system (detailed in Additional file 1: Table S1), or as an underlying or contributory cause of death through linkage to death register data. Dementia diagnosed at <65 years of age was categorized as young-onset dementia and that diagnosed at ≥65 years as late-onset dementia [20]. The earliest recorded date was used as the date of dementia onset. Person-years were computed from the baseline assessment date to the date of dementia onset, date of death, or the end of follow-up (31 December 2020 for England and Wales and 31 January 2021 for Scotland), whichever came first.
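The follow-up rule above ("whichever came first") and the age cut-off for onset type can be sketched as follows. The function names, the dictionary of censoring dates keyed by nation, and the 365.25-day year are illustrative assumptions; only the dates themselves and the 65-year threshold come from the text.

```python
from datetime import date

# Administrative end of follow-up by nation, as stated in the text.
END_OF_FOLLOW_UP = {
    "England": date(2020, 12, 31),
    "Wales": date(2020, 12, 31),
    "Scotland": date(2021, 1, 31),
}


def person_years(baseline, end_of_follow_up, dementia_onset=None, death=None):
    """Years from baseline to the earliest of dementia onset, death,
    or end of follow-up (hypothetical helper)."""
    candidates = [end_of_follow_up]
    if dementia_onset is not None:
        candidates.append(dementia_onset)
    if death is not None:
        candidates.append(death)
    exit_date = min(candidates)
    # Assumption: a year is taken as 365.25 days.
    return (exit_date - baseline).days / 365.25


def onset_category(age_at_diagnosis):
    """Classify a dementia case by age at diagnosis (<65 vs >=65 years)."""
    return "young-onset" if age_at_diagnosis < 65 else "late-onset"


years = person_years(
    baseline=date(2008, 6, 1),
    end_of_follow_up=END_OF_FOLLOW_UP["Scotland"],
    dementia_onset=date(2015, 6, 1),
    death=date(2018, 1, 1),
)
```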
Sociodemographic data
Data on age, ethnicity, education, and household income were collected using a self-reported questionnaire on a touchscreen tablet. The Townsend index of material deprivation was used to assess neighbourhood-level socioeconomic status.
Sex (female/male) was self-reported. “Sex” refers to biological differences such as levels of hormones, whereas “gender” refers to differences in the impact of psychosocial and socioeconomic factors on biological markers [21]. Effects of both “sex” and “gender” were involved in the present study, and the term “sex” is used throughout the text.
Lifestyle factors
Self-reported data on lifestyle factors at baseline were collected via a touchscreen tablet. A short version of the International Physical Activity Questionnaire was used to estimate excess metabolic equivalent (MET)-hours/week of physical activity during work and leisure time. Intakes of food groups in the last year were self-reported using a structured questionnaire. A healthy diet score was calculated based on seven commonly eaten food groups (whole grains, refined grains, vegetables, fruit, fish, red meat, and processed meat) following recommendations on dietary priorities for cardiometabolic health [22]. A higher healthy diet score has been shown to be associated with a lower risk of dementia [23]. Average sleep duration per day in the last 4 weeks was assessed with the survey item “About how many hours sleep do you get in every 24 h?” Alcohol consumption, as well as use of supplements (vitamins, folate, glucosamine, calcium, zinc, iron, and selenium) per week in the last year, was self-reported.
Genetic data
Genotyping was performed by Affymetrix using a bespoke BiLEVE Axiom array or the UK Biobank Axiom array [24]. All genetic data were quality controlled and imputed by the UK Biobank team. APOE genotype was determined from two directly genotyped single-nucleotide polymorphisms (rs7412 and rs429358). The presence of APOE4 was defined using a dominant model (E3/E4 or E4/E4).
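As an illustration of how the two SNPs translate into an APOE genotype, the sketch below uses the standard allele mapping (the C allele of rs429358 tags ε4, the T allele of rs7412 tags ε2, and the remaining haplotypes are ε3). The function names and the allele-count input format are assumptions; the carrier definition follows the text as written (E3/E4 or E4/E4).

```python
def apoe_genotype(rs429358_c_count, rs7412_t_count):
    """Infer an unphased APOE genotype from allele counts (0, 1, or 2).

    Hypothetical helper: each C allele at rs429358 is read as one E4
    allele, each T allele at rs7412 as one E2 allele, the rest as E3.
    """
    n_e4 = rs429358_c_count
    n_e2 = rs7412_t_count
    n_e3 = 2 - n_e4 - n_e2
    if n_e3 < 0:
        # More than two alleles implied: only possible with a rare
        # haplotype carrying both variants, so leave it undetermined.
        return None
    # Double heterozygotes (1, 1) are called E2/E4 by convention.
    alleles = ["E2"] * n_e2 + ["E3"] * n_e3 + ["E4"] * n_e4
    return "/".join(alleles)


def is_apoe4_positive(genotype):
    """APOE4 presence under the definition given in the text."""
    return genotype in {"E3/E4", "E4/E4"}


carrier = is_apoe4_positive(apoe_genotype(1, 0))
```

Under this definition an E2/E4 genotype is not counted as APOE4 positive, because the text lists only E3/E4 and E4/E4.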
Blood tests
Blood samples were collected and analysed at a central laboratory at baseline between 2006 and 2010. Cholesterol was measured by direct enzymatic methods (Konelab, Thermo Fisher Scientific, Waltham, MA). Glycosylated haemoglobin (HbA1c) was measured using high-performance liquid chromatography. Serum cystatin C was measured by a latex-enhanced immunoturbidimetric method on a Siemens ADVIA 1800 instrument. Serum 25-hydroxyvitamin D, a proxy for vitamin D levels, was measured using a chemiluminescent immunoassay (DiaSorin Liaison XL, DiaSorin Ltd., UK). Quality control was conducted by the UK Biobank central team (https://biobank.ctsu.ox.ac.uk/crystal/ukb/docs/serum_biochemistry.pdf).
Urinary biomarker data
Urinary sodium, potassium, microalbumin, and creatinine were measured by ion-selective electrode analysis on a Beckman Coulter AU5400 (https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/urine_assay.pdf).
Health-related conditions
Chronic conditions at baseline were based on self-reported data or interviews. Participants were asked whether they had ever been told by a doctor that they had certain common medical conditions, such as cardiovascular disease, hypertension, diabetes, and depression. Additional disease cases at baseline were defined using inpatient data (initial diagnosis date before baseline interview date). Inpatient hospital data for the UK Biobank participants were available since 1997 [18]. Body mass index (BMI) was computed based on measured weight and height at baseline, and obesity was defined as BMI ≥30 kg/m² [25]. A multimorbidity score was then computed based on 61 major diseases (Additional file 1: Table S2) [26].
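For reference, the BMI calculation and the obesity cut-off above amount to the following minimal sketch. The function names are hypothetical, and the inputs are assumed to be weight in kilograms and height in metres.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 from measured weight and height."""
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Obesity as defined in the text: BMI of at least 30 kg/m^2."""
    return bmi(weight_kg, height_m) >= threshold


example_bmi = bmi(81.0, 1.80)
```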
Familial medical history
The medical history of the father, mother, and siblings was collected using a touchscreen device. Medical conditions included heart disease, stroke, hypertension, diabetes, cancer, dementia, Parkinson’s disease, and depression.
Environment measures
Estimates of air pollution and local environment measures were produced by the Small Area Health Statistics Unit (http://www.sahsu.org/) and were linked centrally to the UK Biobank data (http://biobank.ctsu.ox.ac.uk/crystal/docs/EnviroExposEst.pdf). Particulate matter, nitrogen dioxide, and total nitrogen oxides were measured as annual average values in microgrammes per cubic metre. Road traffic measures for the year 2008 were provided by the Road Traffic Statistics Branch at the Department for Transport and attached to the local road network; traffic data for unmonitored links were estimated based on surrounding monitored links. Data were also available on noise pollution, including daytime, evening, and night-time average noise levels (dB).
Statistical analysis
Baseline characteristics were expressed as frequencies (percentages) and means±standard deviations (SDs). T-tests for continuous variables and chi-square tests for categorical variables were used to test differences between sexes. Cox regression models adjusted for age were used to examine the effect of sex on the incidence of all-cause, young-onset, and late-onset dementia.
The potential mediation effects of a wide range of individual factors on the association between sex and incident dementia were estimated using Cox proportional hazards regression models adjusted for age [27]. We used the following criteria to establish mediation [27]: (1) the mediator was significantly associated with sex; (2) sex was significantly associated with dementia; (3) the mediator was significantly associated with dementia; and (4) the association between sex and dementia was attenuated by the mediator (Additional file 1: Fig. S1). Potential mediators examined included socioeconomic factors (n=3), lifestyle factors (n=19), health-related conditions (n=3), familial history of medical conditions (n=24), genetics (n=1), blood biomarkers (n=49), urinary biomarkers (n=4), and pollution measures (n=30, Additional file 1: Table S3). We also examined the mediation effect of these groups of factors combined. We also tested whether the individual chronic conditions used to create the multimorbidity score mediated the association between sex and dementia. The mediation analysis was conducted using macro programmes created by Spiegelman et al. [28]. The Benjamini-Hochberg procedure was used to control the false discovery rate at a 5% level for multiple comparisons [29].
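Two of the computations in this paragraph can be sketched briefly. The first function estimates a proportion mediated by the difference method on the log hazard ratio scale. This is an assumption about what the macro computes, since the text does not spell out the estimator. The second implements the Benjamini-Hochberg step-up procedure. Function names and example values are illustrative only and are not study results.

```python
import math


def proportion_mediated(hr_without_mediator, hr_with_mediator):
    """Difference-method estimate: share of the log hazard ratio for sex
    that disappears after adding the mediator (assumed estimator).
    Undefined when hr_without_mediator equals 1 (log HR of zero)."""
    b_total = math.log(hr_without_mediator)
    b_direct = math.log(hr_with_mediator)
    return (b_total - b_direct) / b_total


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the null is rejected
    while controlling the false discovery rate at `fdr`."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up inputs for illustration.
pm = proportion_mediated(2.0, 1.5)
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```

Because the procedure is step-up, a p-value above its own rank threshold is still rejected when a larger p-value further down the ordering meets its threshold.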
A sensitivity analysis was conducted to examine whether the important determinants mediated the association between sex and incident dementia by excluding those cases diagnosed in the first 5 years of follow-up. A further sensitivity analysis was conducted to test mediation associations among individuals with complete data.
Multiple imputation was used to handle missing data; all covariates were included in the imputation models, and 5 imputed datasets were created.
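The text does not state how estimates from the 5 imputed datasets were combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the authors' method; the function name and the example coefficients are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates (e.g. log hazard ratios) and
    their squared standard errors using Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)           # average point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up coefficients from 5 imputed datasets, for illustration only.
pooled_beta, total_var = pool_rubin(
    [0.10, 0.12, 0.11, 0.09, 0.13],
    [0.0004, 0.0004, 0.0004, 0.0004, 0.0004],
)
```

The pooled standard error is the square root of the total variance, which is larger than the within-imputation standard error whenever the estimates differ across imputations.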
Data analyses were conducted using SAS 9.4 for Windows (SAS Institute Inc.) and all P values were two-sided with statistical significance set at <0.05.