Multi-platform proteomic analysis of Alzheimer’s disease cerebrospinal fluid and plasma reveals network biomarkers associated with proteostasis and the matrisome

Dammer, Eric B.; Ping, Lingyan; Duong, Duc M.; Modeste, Erica S.; Seyfried, Nicholas T.; Lah, James J.; Levey, Allan I.; Johnson, Erik C. B.

doi:10.1186/s13195-022-01113-5

Research
Open access
Published: 17 November 2022


Eric B. Dammer^1,2,
Lingyan Ping^1,2,3,
Duc M. Duong^1,2,
Erica S. Modeste^1,2,
Nicholas T. Seyfried^1,2,3,
James J. Lah^1,3,
Allan I. Levey^1,3 &
Erik C. B. Johnson^1,3

Alzheimer's Research & Therapy volume 14, Article number: 174 (2022)


Abstract

Robust and accessible biomarkers that can capture the heterogeneity of Alzheimer’s disease and its diverse pathological processes are urgently needed. Here, we undertook an investigation of Alzheimer’s disease cerebrospinal fluid (CSF) and plasma from the same subjects (n=18 control, n=18 AD) using three different proteomic platforms—SomaLogic SomaScan, Olink proximity extension assay, and tandem mass tag-based mass spectrometry—to assess which protein markers in these two biofluids may serve as reliable biomarkers of AD pathophysiology observed from unbiased brain proteomics studies. Median correlation of overlapping protein measurements across platforms in CSF (r~0.7) and plasma (r~0.6) was good, with more variability in plasma. The SomaScan technology provided the most measurements in plasma. Surprisingly, many proteins altered in AD CSF were found to be altered in the opposite direction in plasma, including important members of AD brain co-expression modules. An exception was SMOC1, a key member of the brain matrisome module associated with amyloid-β deposition in AD, which was found to be elevated in both CSF and plasma. Protein co-expression analysis on greater than 7000 protein measurements in CSF and 9500 protein measurements in plasma across all proteomic platforms revealed strong changes in modules related to autophagy, ubiquitination, and sugar metabolism in CSF, and endocytosis and the matrisome in plasma. Cross-platform and cross-biofluid proteomics represents a promising approach for AD biomarker development.

Introduction

Alzheimer’s disease (AD) is a growing public health problem with no available disease-modifying therapies. Multi-omic analyses of AD brain have illustrated the varied and complex pathophysiology beyond amyloid-β (Aβ) plaques and tau neurofibrillary tangles [1,2,3,4,5], but how these pathological processes develop over time during the disease course is unclear. Given that AD is a heterogeneous disease, composed of different combinations and degrees of brain pathologies in a given individual, multiple biomarkers beyond Aβ and tau will be required to advance our understanding of the complex disease processes underlying AD. One of the current limitations for advancement of AD research, clinical care, and therapeutic development is the lack of easily accessible biofluid biomarkers for these varied pathological processes.

Protein biomarkers represent a promising class of biomarkers for AD given the large diversity of potential markers, their direct role in subserving biological processes, and the fact that standard protein affinity-based measurement approaches are already deployed in most clinical laboratories around the world. Three main quantitative protein measurement technologies are currently available to conduct proteomic discovery experiments in biofluids at scale: mass spectrometry, multiplexed nucleic acid aptamers, and multiplexed antibodies. Mass spectrometry (MS) has been used most extensively to date in AD biofluid biomarker discovery research and provides a direct measurement of protein identity and abundance through the measurement of peptides [3, 6,7,8]. Through the use of isobaric tandem mass tags (TMT) or data-independent acquisition (DIA) techniques [9], cohorts of hundreds of subjects can be analyzed at a depth of thousands of proteins in CSF and plasma. Traditionally, depth of analysis by MS is limited by the large dynamic range of protein concentration present in CSF and, especially, in plasma, as well as the problem of missing values that accumulate when analyzing larger cohorts [10,11,12]. Specificity and accuracy may be affected by ion interference as the complexity of the matrix increases. More recently, two affinity-based proteomic technologies have become available for biofluid measurements that offer different advantages and disadvantages compared to MS for protein measurement in biofluids: the SomaScan® aptamer-based technology from SomaLogic, and proximity extension assay (PEA) technology from Olink®. SomaScan uses modified DNA aptamers (SOMAmers) with slow off-rate binding kinetics to measure relative protein levels in multiplex fashion [13,14,15]. Multiple SOMAmers can be generated towards a given protein target and included in the microarray-based readout of relative fluorescence intensity. 
PEA uses a sandwich antibody-based approach where the capture and detection antibodies are conjugated to a complementary oligonucleotide probe pair, the levels of which are ultimately measured by quantitative PCR or next-generation sequencing approaches to provide a relative protein abundance value [16, 17]. Measurement specificity is provided both through the dual epitope antibody binding and through specific oligonucleotide probe hybridization. As affinity-based approaches, both SomaScan and PEA (subsequently referred to as “Olink”) theoretically suffer less from dynamic range and missing value challenges compared to MS. However, because both are affinity-based, they are indirect measurements of protein identity and abundance. Specificity and accuracy may be affected by protein modifications or off-target binding, and sensitive and specific reagents must be designed for each protein. Comparisons of SomaScan and Olink measurements for the same proteins in various bodily fluids have previously been described [18,19,20,21,22]. However, to our knowledge, no studies have compared these affinity-based measurements with mass spectrometry measurements, particularly across different biofluids.

In order to further the development of robust biofluid biomarkers of AD that can reflect multiple pathological processes in brain, we conducted a proteomic analysis on CSF and plasma by applying each proteomic technology described above to the same discovery set of AD and control samples. By analyzing the same samples with each proteomic technology, we were able to conduct an in-depth cross-platform technical analysis to better understand the current strengths and limitations inherent in each platform for AD biofluid biomarker development in CSF and plasma, and increase confidence in proteins that show promise as potential AD biomarkers. Furthermore, because the CSF and plasma samples we analyzed were matched within subject, we were able to explore the relationship between CSF and plasma levels of promising AD biomarkers within subject. We leveraged the combined proteomic datasets to generate an AD protein co-expression network in CSF and plasma and explored how protein co-expression in these two fluids might be related to each other and to AD brain protein co-expression. We found strong co-expression signals for proteostasis, synaptic biology, sugar metabolism, complement, and TGF-β signaling in AD CSF, and endocytosis, matrisome, and complement in AD plasma. Multi-platform proteomic analysis of AD biofluids holds promise for identification of robust biomarkers for clinical translation.

Results

Pre-processing and technical analyses of proteomic measurements in CSF and plasma

For our discovery experiments and cross-platform proteomic analyses, we used CSF and plasma from a cohort of control (n=18) and AD (n=18) patients in the Emory Goizueta Alzheimer’s Disease Research Center (ADRC) (Fig. 1, Additional file 2: Supplementary Table 1). The CSF and plasma samples were drawn at or near the same time point for each subject. All samples were analyzed by each proteomic platform except for one subject (n=35/36), whose CSF and plasma were analyzed only by SomaScan. This subject was excluded from all direct cross-platform comparisons, and therefore, such analyses were restricted to N=35 subjects. For both CSF and plasma, mass spectrometry measurements were performed using isobaric tandem mass tags (TMT-MS) with pre-fractionation [7], including both with and without prior depletion of highly abundant proteins in each fluid. For PEA analyses, we used all thirteen qPCR-based human biomarker panels available through Olink, encompassing 1196 protein assays (1160 unique proteins). For aptamer-based analyses, we used the SomaScan assay (v4.1) from SomaLogic, which provides 7288 SOMAmers targeting 6596 unique proteins.

Analysis of the signal-to-noise (S:N) properties of each platform showed that a large proportion of the SomaScan measurements in CSF were at or near background noise level (Additional file 1: Supplementary Figure 1A). By contrast, S:N was acceptable for nearly all SOMAmers in plasma (Additional file 1: Supplementary Figure 1B). To address this limitation, we empirically determined an S:N cutoff for SOMAmers in CSF by correlating measurements in common across all three proteomic platforms at different SOMAmer S:N thresholds, and selected the S:N threshold at which the correlations were maximized (Additional file 1: Supplementary Figure 1C). This S:N threshold was 0.45. Applying this threshold to the SomaScan data reduced the number of quantified SOMAmers in CSF from 7288 to 3624 (Additional file 1: Supplementary Figure 1D, Additional file 2: Supplementary Table 2). This reduced set of SomaScan CSF measurements was used for most subsequent analyses.
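The threshold selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pick_sn_threshold`, the input dictionaries, and the toy numbers are hypothetical, and the article does not specify which summary of the cross-platform correlations was maximized (the median is assumed here).

```python
from statistics import median

def pick_sn_threshold(sn, cross_corr, candidates):
    """Return the candidate S:N cutoff that maximizes the median
    cross-platform correlation of the aptamers retained at that cutoff.

    sn         -- dict: aptamer id -> signal-to-noise value (hypothetical input)
    cross_corr -- dict: aptamer id -> correlation with the same protein
                  measured on another platform (overlapping proteins only)
    candidates -- iterable of S:N cutoffs to try
    """
    best_threshold, best_median = None, float("-inf")
    for t in sorted(candidates):
        kept = [cross_corr[a] for a in cross_corr if sn[a] >= t]
        if not kept:
            continue  # nothing passes this cutoff
        m = median(kept)
        if m > best_median:  # strict '>' keeps the lowest cutoff on ties
            best_threshold, best_median = t, m
    return best_threshold, best_median

# Toy values, made up for illustration only (not data from the study).
sn = {"a1": 0.1, "a2": 0.3, "a3": 0.5, "a4": 0.9, "a5": 1.4}
cross_corr = {"a1": -0.1, "a2": 0.2, "a3": 0.7, "a4": 0.8, "a5": 0.5}
threshold, med = pick_sn_threshold(sn, cross_corr, [0.0, 0.2, 0.4, 0.8])
```

Raising the cutoff trades coverage for agreement between platforms, so in practice the retained aptamer count would be inspected alongside the correlation at each candidate cutoff.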

We also analyzed missing measurements present in each platform across 36 CSF and plasma samples (Additional file 1: Supplementary Figure 2). The Olink and SomaScan platforms had a similar increase in missing values across samples, which was greater in CSF than in plasma. TMT-MS suffered more from missing values in both fluids, particularly in plasma. In plasma undepleted of highly abundant plasma proteins, a maximum of approximately 500 quantified proteins was reached within 2 batches of TMT-MS. In plasma depleted of the top fourteen most abundant proteins, this threshold had nearly been reached at the point when all 36 samples had been analyzed. We decided to set a threshold of <75% missing values (or measurement in at least 9 out of 36 samples) for subsequent individual protein analyses and exclude proteins with higher levels of missing values from consideration. Protein measurements that met this threshold were well balanced across AD and control cases. After applying the S:N and missingness filters, a large proportion (50.6%) of the SomaScan measurements in CSF, and a significant proportion (13.5%) of the TMT-MS measurements in depleted plasma, were removed from consideration for individual protein analyses.
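A minimal sketch of the missingness filter follows. The matrix layout (a dictionary of per-sample lists with `None` for a missing value) and the name `filter_by_missingness` are assumptions for illustration. The text gives the rule both as "<75% missing" and as "at least 9 out of 36 samples"; the sketch uses the count form.

```python
def filter_by_missingness(matrix, min_measured=9):
    """Keep proteins measured (non-None) in at least `min_measured` samples.

    matrix -- dict: protein id -> list of per-sample values, None = missing
    Note: with 36 samples, 9 measured values correspond to exactly 75%
    missing; the count-based reading of the rule is assumed here.
    """
    return {
        protein: values
        for protein, values in matrix.items()
        if sum(v is not None for v in values) >= min_measured
    }

# Toy matrix with 36 samples per protein (hypothetical values).
toy = {
    "P_full": [1.0] * 36,                   # no missing values
    "P_edge": [1.0] * 9 + [None] * 27,      # exactly 9 measured
    "P_sparse": [1.0] * 8 + [None] * 28,    # only 8 measured
}
kept = filter_by_missingness(toy)
```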

In order to assess whether highly abundant protein depletion significantly affected TMT-MS measurements in CSF and plasma, we correlated protein values before and after depletion of these top fourteen most abundant proteins within subject (Additional file 1: Supplementary Figure 3A). Median correlation after depletion was excellent in both CSF (r=0.78) and plasma (r=0.69), with greater variability introduced by depletion in plasma. Correlation was also good at the group level when proteins that were significantly altered in AD in either depleted or undepleted CSF (Additional file 1: Supplementary Figure 3B) or plasma (Additional file 1: Supplementary Figure 3C) were correlated with depletion versus no depletion. To avoid setting an arbitrary correlation threshold, we excluded all proteins measured by TMT-MS in CSF and plasma that had correlation values of zero or below (i.e., anticorrelated) or that had an opposite direction of change in AD, after highly abundant protein depletion. This totaled 32 proteins in CSF, and 27 proteins in plasma (Additional file 2: Supplementary Table 3). The final protein abundance matrices used for individual protein analyses and cross-platform comparisons therefore included the <75% missingness filter across all platforms and fluids, the S:N filter for SomaScan CSF, and excluded proteins that were strongly affected by highly abundant protein depletion in CSF and plasma from TMT-MS measurements.
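The exclusion rule above (correlation of zero or below, or an opposite direction of change in AD) can be illustrated as follows. This is a sketch under assumptions: it uses a per-protein Pearson correlation across subjects and compares the sign of the AD-minus-control mean difference, without any significance testing. The function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def affected_by_depletion(undepleted, depleted, is_ad):
    """Flag a protein whose depleted and undepleted values are not
    positively correlated, or whose AD-vs-control difference flips sign."""
    def ad_minus_control(values):
        ad = [v for v, g in zip(values, is_ad) if g]
        control = [v for v, g in zip(values, is_ad) if not g]
        return mean(ad) - mean(control)

    r = pearson(undepleted, depleted)
    opposite = ad_minus_control(undepleted) * ad_minus_control(depleted) < 0
    return r <= 0 or opposite

# Toy example: three controls followed by three AD cases (made-up values).
is_ad = [False, False, False, True, True, True]
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
concordant = affected_by_depletion(base, [1.1, 2.2, 2.9, 4.2, 5.1, 5.8], is_ad)
anticorrelated = affected_by_depletion(base, [6.0, 5.0, 4.0, 3.0, 2.0, 1.0], is_ad)
# Positively correlated overall, but the group difference changes sign.
flipped = affected_by_depletion(
    [1.0, 5.0, 3.0, 2.0, 6.0, 4.0], [1.5, 5.5, 3.5, 1.0, 5.0, 3.0], is_ad
)
```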

Cross-platform comparisons

Across all three platforms, we were able to measure a total of 4655 unique proteins (as represented by unique gene symbols) in CSF, and 6794 unique proteins in plasma (Fig. 2A, Additional file 2: Supplementary Tables 4-7). The SomaScan platform provided the deepest proteomic coverage in plasma, measuring 4662 proteins not measured by Olink or MS. Most of the proteins that could be measured in plasma could also be measured in CSF by Olink (Fig. 2B, Additional file 2: Supplementary Table 8), whereas due primarily to our S:N filter, only approximately half of the proteins that could be measured in plasma by SomaScan could also be reliably measured in CSF (Additional file 2: Supplementary Table 9). Over twice as many proteins could be measured in CSF compared to plasma on depleted fluid using TMT-MS due to the larger number of highly abundant proteins in plasma and the protein dynamic range limitations of unbiased discovery MS-based approaches. Only marginal improvement in proteomic coverage was observed in MS with depleted versus undepleted fluid in both CSF and plasma (Additional file 1: Supplementary Figure 4A, Additional file 2: Supplementary Tables 10 and 11), with depth of coverage improvement more apparent in CSF than in plasma (Additional file 1: Supplementary Figure 4B). Ontologies of proteins uniquely measured by the SomaScan platform in CSF and plasma included nucleic acid metabolism and binding, and nucleus (Additional file 1: Supplementary Figure 4C), suggesting that the platform was enriched for measurement of nuclear proteins. Ontologies unique to Olink included mitotic cell cycle, immune processes, and plasma membrane, reflecting selection bias of these biological pathways in the Olink platform compared to the other platforms. Ontologies unique to MS included transmembrane transport, complement, and cytoskeleton/structural proteins, representing more highly abundant proteins.

We correlated protein measurements across all three platforms in CSF and plasma (Fig. 3, Additional file 2: Supplementary Tables 12-17, Additional file 1: Extended Data). Median correlation within subject was approximately 0.7 for CSF, with a similar distribution of correlation values among platforms. Median correlation was slightly lower in plasma at approximately 0.6, with more variability between the MS and affinity-based measurements than between SomaScan and Olink affinity-based measurements. Other than slightly lower overall correlation, these correlation patterns were generally similar when analyzed at the group level rather than within subject (Additional file 1: Supplementary Figure 5). Improvement in median correlation was generally observed in plasma when only proteins that were significantly altered in AD in a given platform were used for correlation (Fig. 3B), suggesting that inclusion of proteins with lower S:N led to decreased correlation. Interestingly, in CSF, the improvement in correlation was only observed with MS-based measurements. In summary, median correlation of proteomic measurements between platforms within the same subject was quite good, with better correlation in CSF than in plasma.
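As a rough illustration of the cross-platform comparison, the sketch below computes a Pearson correlation for each protein shared by two platforms across the same subjects and then takes the median. The data layout, the minimum of three paired values, and the function names are assumptions; the article does not state the exact correlation procedure at this level of detail.

```python
from math import sqrt
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def median_cross_platform_correlation(platform_a, platform_b, min_pairs=3):
    """Correlate each shared protein across subjects, then take the median.

    platform_a, platform_b -- dict: protein id -> per-subject values in the
    same subject order, with None marking a missing measurement.
    """
    per_protein = {}
    for protein in sorted(set(platform_a) & set(platform_b)):
        pairs = [
            (a, b)
            for a, b in zip(platform_a[protein], platform_b[protein])
            if a is not None and b is not None
        ]
        if len(pairs) < min_pairs:
            continue  # too few paired values to correlate
        x, y = zip(*pairs)
        per_protein[protein] = pearson(x, y)
    return median(per_protein.values()), per_protein

# Toy measurements for four subjects (hypothetical, not study data).
platform_a = {"P1": [1, 2, 3, 4], "P2": [1, 2, 3, 4], "P3": [1, 2, 3, 4], "P4": [1, 2, 3, 4]}
platform_b = {"P1": [2, 4, 6, 8], "P2": [4, 3, 2, 1], "P3": [1, 2, 4, 3], "P5": [1, 1, 2, 2]}
med_r, per_protein = median_cross_platform_correlation(platform_a, platform_b)
```

Only proteins measured on both platforms enter the median, so the result describes agreement on the overlap rather than platform quality overall.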

To assess how proteomic measurements in our discovery cohort compared with Olink and SomaScan measurements in other AD cohorts, we performed correlation analyses with plasma Olink data from a Hong Kong-based cohort [23], CSF and plasma Olink data from the BioFinder cohort [24], and SomaScan plasma data from the AddNeuroMed cohort [25] (Additional file 1: Supplementary Figure 6). Correlations were restricted to proteins significantly altered in AD in each biofluid to maximize S:N. In AD plasma, correlation of Olink measurements in the Hong Kong cohort with our discovery cohort Olink measurements was excellent (r=0.82), with lower but strong correlation with SomaScan (r=0.57) and MS (r=0.63) measurements in our cohort (Additional file 1: Supplementary Figure 6A). When comparing BioFinder Olink CSF measurements with our CSF measurements, correlation was also excellent across all measurement platforms (r~0.7) (Additional file 1: Supplementary Figure 6B). However, BioFinder Olink plasma measurements did not correlate with our discovery cohort platform plasma measurements (Additional file 1: Supplementary Figure 6C). SomaScan plasma measurements in the AddNeuroMed cohort also did not correlate with any of our platform plasma measurements (Additional file 1: Supplementary Figure 6D). These findings suggest that our cohort was most similar to the Hong Kong cohort and that pre-analytical factors unique to each cohort likely significantly influenced the plasma measurements in each cohort.

Proteins of lower abundance are decreased in AD plasma

To determine which proteins were significantly altered across platforms in AD CSF and plasma, we performed differential abundance analyses within each fluid for each proteomic platform (Fig. 4, Additional file 1: Supplementary Figure 7). The analyses were performed without median normalization of overall protein abundance levels between AD and control cases, given that biomarker measurements in a clinical setting do not undergo median normalization [26, 27]. While a greater number of proteins were found to be decreased in AD CSF across all platforms, the decrease in plasma proteins in AD was much greater than in CSF and was strikingly apparent across all platforms (Fig. 4A). This finding was consistent with the strong bias towards lower protein abundance observed in AD plasma in the Hong Kong cohort, in which the data had undergone some degree of median normalization prior to differential abundance analysis [23]. Overlap of differentially abundant proteins was low to modest across platforms (Fig. 4B), due likely in part to the smaller size of the cohort and less statistical power to observe significant differences. Given the clear abundance differences observed in AD plasma across platforms, we further investigated this phenomenon by exploring which proteins were driving the difference in abundance. We first ranked each platform measurement by its contribution to the overall signal within each fluid and tested the difference between the top 5% strongest signals in each platform compared with the total signal (Additional file 1: Supplementary Figure 8). By this approach, we found that both the top 5% and overall protein levels were decreased in AD plasma by SomaScan and Olink, but not MS. 
However, because SOMAmer relative fluorescence units and Olink normalized protein expression values do not necessarily correlate with absolute protein abundance, we also calibrated measurements in each platform to known absolute protein concentrations in plasma obtained from the Human Protein Atlas [28]. After calibration, we observed that the top 5% most highly abundant proteins were not decreased in AD plasma in any platform, but that the overall decrease in AD plasma proteins was driven by proteins of lower abundance. This was consistently observed across platforms except for undepleted plasma analyzed by MS, in which many fewer proteins of lower abundance were measured. In summary, we observed a decrease in CSF and plasma proteins in AD compared to control, with the striking bias in plasma driven by proteins of lower abundance.

Brain protein network module coverage by platform in CSF and plasma

We recently generated a consensus AD brain protein co-expression network from over 500 brain tissues as part of the Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) initiative that revealed many modules strongly correlated to AD neuropathological traits and cognitive decline [2] (Fig. 5A). To determine the potential for AD-relevant brain modules to be measured by markers in CSF and plasma, we calculated the percent coverage in CSF and plasma for each of the 44 brain network modules by proteomic platform (Fig. 5B). All modules had at least some coverage in CSF, with SomaScan and MS providing the most module coverage in CSF compared to the Olink 1196 platform. In plasma, SomaScan provided the most module coverage. Brain module M26 complement/acute phase was particularly well covered by SomaScan and MS in both CSF and plasma, and M42 matrisome was well covered in both fluids by all platforms. Given that M42 matrisome had the strongest correlation to AD neuropathological traits in brain, we more closely examined this module across platforms in both fluids. M42 hub proteins—or proteins that contribute most to the module eigenprotein and are drivers of module co-expression—were generally well measured by all platforms in CSF (Fig. 6A). In plasma, M42 hub protein coverage was best with SomaScan and Olink, and especially with SomaScan. SMOC1, which was the strongest driver of M42 co-expression, could be measured in CSF and plasma by both Olink and SomaScan, and measurements were well correlated in both fluids between the two platforms (Additional file 1: Supplementary Figure 9A, B). Levels of SMOC1 were elevated in both AD CSF and plasma despite the decrease in lower abundant proteins in AD plasma (Fig. 6B). We leveraged Olink CSF and plasma data from control (n=90) and Parkinson’s disease (PD, n=118) subjects in the Accelerating Medicines Partnership for Parkinson’s Disease (AMP-PD) consortium to test the specificity of SMOC1 for AD (Fig. 1B). 
We did not observe an increase in SMOC1 in PD CSF and observed a weak increase in PD plasma (Fig. 6C). We also tested specificity of SMOC1 for AD by measuring SMOC1 CSF levels using TMT-MS in a separate cohort composed of control, AD, ALS, FTD, and PD subjects as previously described in Higginbotham et al. [7]. Elevation of SMOC1 in CSF was specific to AD. SMOC1 levels were generally well correlated between CSF and plasma within AD subjects but not controls (Fig. 6D). CSF and plasma SMOC1 levels also correlated with Aβ/Tau levels in CSF (Fig. 6E). In plasma, this correlation was driven by group differences and was not significant within group (Additional file 1: Supplementary Figure 9C). SMOC1 levels did not correlate strongly with cognitive function in AD (Additional file 1: Supplementary Figure 9D) or PD (Additional file 1: Supplementary Figure 9E). Interestingly, SMOC1 levels correlated weakly with age in both CSF and plasma (Fig. 6F), an association which has been previously described in plasma [29, 30].

Hub proteins of other AD-relevant brain co-expression modules could also be measured in CSF and plasma (Fig. 7). These included HOMER1 in the M5 Post-Synaptic Density module (Additional file 1: Supplementary Figure 10), NEFL in the M3 Oligodendrocyte/Myelination module (Additional file 1: Supplementary Figure 11), CHI3L1 (also known as YKL-40) in the M21 MHC Complex/Immune module (Additional file 1: Supplementary Figure 12), YWHAZ in the M4 Synapse/Neuron module (Additional file 1: Supplementary Figure 13), ENO1 in the M7 MAPK Signaling/Metabolism Module (Additional file 1: Supplementary Figure 14), and PEBP1 in the M25 Sugar Metabolism Module (Additional file 1: Supplementary Figure 15). All of these proteins were increased in AD CSF, yet only NEFL and CHI3L1 were also increased in AD plasma, demonstrating the diversity of potential AD biomarker changes across different fluids. Furthermore, not all proteins within a brain module were found to behave similarly in CSF and plasma. For instance, YWHAZ in the M4 Synapse/Neuron module was observed to be increased in CSF and decreased in plasma, whereas NPTXR, another M4 protein, was found to be decreased in both CSF and plasma (Additional file 1: Supplementary Figure 16). NPTXR also illustrated a discrepancy in platform measurements, where the Olink measurement was significantly negatively correlated with the MS and SomaScan measurements in CSF, but was more similar to the SomaScan measurement than the MS measurement in plasma (Additional file 1: Supplementary Figure 16A). Another example of a protein with discrepancy in measurements across fluids was SPP1 in the M21 MHC Complex/Immune module (Additional file 1: Supplementary Figure 17). SPP1 measurements best correlated between MS and SomaScan in CSF, with the Olink measurement being anticorrelated to the other two platforms. However, in plasma, SomaScan and Olink SPP1 measurements correlated well, whereas the MS measurement did not correlate with that of either affinity-based platform.
NEFL was also highly correlated among all platforms in CSF, but was anticorrelated between SomaScan and Olink in plasma. As has been previously demonstrated, NEFL levels were strongly correlated with increasing age [31] (Additional file 1: Supplementary Figure 11F). In summary, AD brain co-expression module proteins could be measured by all three platforms in CSF and plasma, but SomaScan had the best coverage in plasma, especially for the M42 matrisome module. SMOC1, a hub of M42, was elevated in both AD CSF and plasma and levels correlated within subject between CSF and plasma. Other AD brain module protein hubs could also be measured in CSF and plasma, but opposite directions of change were often observed between CSF and plasma protein levels of these hubs, and protein measurements did not always positively correlate across proteomic platforms.
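The percent module coverage reported by platform (Fig. 5B) amounts to a set intersection per module. A minimal sketch, assuming module membership and platform measurements are both available as gene symbol lists; the module names, gene symbols, and the function name `module_coverage` below are placeholders, not values from the study.

```python
def module_coverage(modules, measured):
    """Percent of each module's members that a platform measures.

    modules  -- dict: module name -> iterable of member gene symbols
    measured -- iterable of gene symbols quantified by the platform
    """
    measured = set(measured)
    coverage = {}
    for name, members in modules.items():
        members = set(members)
        coverage[name] = 100.0 * len(members & measured) / len(members)
    return coverage

# Placeholder modules and measurements (hypothetical gene symbols).
modules = {
    "M_a": ["G1", "G2", "G3", "G4"],
    "M_b": ["G5", "G6", "G7", "G8", "G9"],
}
measured_by_platform = ["G1", "G2", "G6", "G9", "G10"]
coverage = module_coverage(modules, measured_by_platform)
```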

AD CSF co-expression network reveals strong disease-related modules reflecting proteostasis, synaptic, complement, and sugar metabolism pathophysiology

While AD-related protein co-expression modules have been reliably identified in brain, to date it has been unclear whether co-expression modules related to AD are also present in biofluids. To address this question, we leveraged all three proteomic platforms and harmonized their measurements by median normalization into separate protein abundance matrices for CSF and plasma, and used these harmonized abundance matrices to build co-expression networks for each fluid. We were then able to compare these networks to one another, and to the consensus AD brain network (Fig. 1C). Using this approach, we built a CSF co-expression network from 7158 protein assays targeting 4154 unique gene symbols (Fig. 8A, Additional file 2: Supplementary Tables 18 and 19, Additional file 1: Extended Data). The network consisted of 38 modules, with each platform contributing measurements to nearly all modules (Additional file 1: Supplementary Figure 18). Modules that were most strongly correlated to Aβ and tau pathological measures in CSF and/or cognitive function included M15 post-synaptic membrane, M8 autophagy, M7 SNAP receptor/SNARE complex, M32 synaptic membrane/matrisome, M16 sugar metabolism, M29 sugar metabolism/actin depolymerization, M24 ubiquitination, M26 TGF-β signaling, and M3 complement/protein activation cascade. M8 autophagy and M24 ubiquitination modules were particularly strongly correlated to total tau and p-tau181 levels. The M8 autophagy module contained microtubule associated protein tau (MAPT) and SMOC1 as members, as well as other markers previously associated with AD such as NEFL and PEBP1 (Fig. 8B). M8 module eigenprotein levels were strongly negatively correlated with cognitive function (r= –0.67) and Aβ42/tau ratio (r= –0.82), and strongly positively correlated with total tau (r=0.86) and p-tau181 (r=0.78) (Fig. 8C), reflecting its close association to AD brain amyloid-β and tau pathology. 
Among the other modules that correlated strongly with traits were M29 sugar metabolism/actin depolymerization, which was most strongly correlated to APOE ε4, and M26 TGF-β signaling, which was strongly positively correlated with age. The M8 autophagy and M15/M32 synaptic modules were enriched in neuronal and oligodendrocyte cell type markers, potentially reflecting the brain cell type origin of these CSF modules. The M24 ubiquitination and M29 sugar metabolism/actin depolymerization modules did not have cell type character, whereas the M3 complement/protein activation cascade module was enriched in endothelial and microglial markers.
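For readers unfamiliar with the term, a module eigenprotein is conventionally the first principal component of the standardized abundances of a module's members, giving one summary value per sample. The sketch below approximates it with power iteration in plain Python. It is an illustration of that convention only, not the authors' implementation: the function names and toy matrix are hypothetical, and missing values and constant proteins are assumed to be absent.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(row):
    """Center a protein's values to mean 0 and scale to unit sample SD."""
    m, s = mean(row), stdev(row)
    return [(v - m) / s for v in row]

def eigenprotein(module_rows, n_iter=200):
    """Approximate the first principal component across samples.

    module_rows -- list of per-protein lists, one value per sample
    Returns a unit-length list with one value per sample.
    """
    X = [standardize(r) for r in module_rows]
    n_samples = len(X[0])
    # Start from the mean standardized profile so the result points in the
    # same direction as the module's average abundance.
    v = [mean(col) for col in zip(*X)]
    if not any(v):
        raise ValueError("average profile is zero; choose another start vector")
    for _ in range(n_iter):
        scores = [sum(x * vi for x, vi in zip(row, v)) for row in X]  # X v
        v = [sum(s * row[j] for s, row in zip(scores, X))             # X^T (X v)
             for j in range(n_samples)]
        norm = sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
    return v

# Toy module: three proteins measured in five samples (made-up values).
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 2.0, 3.0, 4.0, 6.0],
]
ep = eigenprotein(toy_module)
```

The resulting per-sample vector is what would then be correlated with traits such as total tau or cognitive scores.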

We tested whether CSF modules were present in brain or plasma by two different approaches: over-representation analysis (ORA), and network preservation statistics. We also tested how the module eigenproteins changed in CSF between AD and control, and whether the cognate module eigenprotein (or “synthetic” eigenprotein) in plasma and brain were altered in AD (Fig. 8A, Additional file 1: Extended Data). The CSF M3 complement/protein activation cascade module was most strongly preserved in plasma and brain and was decreased in AD CSF. M26 TGF-β signaling was also decreased in AD CSF. Modules that were increased in AD CSF included the M15 and M32 synaptic, and M8 and M24 proteostasis modules, along with M16 sugar metabolism. Most of the synthetic eigenproteins for these modules were decreased in plasma except for the M3 complement/protein activation cascade module, which was increased in both plasma and brain.
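One common way to carry out an over-representation analysis between two modules is a one-sided hypergeometric test on their shared members. The sketch below assumes that formulation and a shared background of gene symbols; the article does not state the exact test used, and the function name and toy gene sets are hypothetical.

```python
from math import comb

def ora_pvalue(module_a, module_b, background):
    """One-sided hypergeometric P(overlap >= observed) for two gene sets.

    Both modules are first restricted to the shared background, i.e. the
    gene symbols that could have been assigned in both networks.
    """
    background = set(background)
    a = set(module_a) & background
    b = set(module_b) & background
    N, K, n = len(background), len(a), len(b)
    k = len(a & b)
    # Sum the probabilities of drawing k, k+1, ... members of `a`
    # in a draw of size n from the background.
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favorable / comb(N, n)

# Toy background of 20 symbols and two 5-member modules sharing 4 members.
background = [f"G{i}" for i in range(20)]
csf_module = ["G0", "G1", "G2", "G3", "G4"]
plasma_module = ["G0", "G1", "G2", "G3", "G10"]
p_overlap = ora_pvalue(csf_module, plasma_module, background)
p_disjoint = ora_pvalue(csf_module, ["G15", "G16", "G17"], background)
```

When many module pairs are tested, the resulting p values would normally be corrected for multiple comparisons.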

In summary, we were able to construct an AD CSF protein co-expression network from >7000 protein assays in CSF which revealed strong disease-associated modules related to proteostasis, synaptic biology, sugar metabolism, and complement. The eigenproteins of these modules were increased in CSF and generally decreased in plasma, except for complement, which was decreased in CSF and increased in plasma.

AD plasma co-expression network reveals strong disease-related modules reflecting endocytosis and matrisome pathophysiology

The plasma network included 9589 protein assays targeting 6614 unique gene symbols, and consisted of 35 modules with good platform measurement representation across the network, similar to the CSF network (Fig. 9A, Additional file 2: Supplementary Tables 20 and 21, Additional file 1: Supplementary Figure 18). The SomaScan platform contributed approximately 80% of the measurements in the network. A striking feature of the plasma network was the number of modules related to extracellular matrix biology and the matrisome that correlated with AD CSF Aβ and tau biomarkers. One such module was the M33 adhesion/ECM/wound response module whose eigenprotein, along with other matrisome-related module eigenproteins, was elevated in AD plasma despite the decrease in lower abundance plasma proteins in AD (Fig. 9B, C). M33 module co-expression was driven by tenascin (TNC), an extracellular matrix protein involved in neuronal migration and regeneration, as well as synaptic plasticity. TNC was measured by ten separate assays across the three platforms, eight of which were SOMAmers. Eight of the ten TNC assays fell within M33, including six SOMAmers and one Olink and one MS measurement, suggesting good correlation for most TNC measurements across platforms. The Olink and SomaScan measurements of SPP1 were also members of M33. Another module strongly related to AD was the M24 endocytosis module, which was the module most strongly correlated to total tau levels in CSF. Interestingly, this module was not strongly preserved in CSF or brain, potentially reflecting a more systemic process associated with AD. Plasma modules were generally less well preserved in CSF and brain than CSF modules were preserved in plasma and brain. Brain protein co-expression was generally not strongly preserved in CSF or plasma except for the complement module, which was highly preserved across all tissues (Additional file 1: Supplementary Figure 19 and 20, Additional file 2: Supplementary Table 22).

We compared our 3-platform plasma network using module over-representation analysis (ORA) to a serum network built from approximately 5000 SOMAmers previously reported by Emilsson et al. [32] (Additional file 1: Supplementary Figure 21). The serum network had fewer modules than the plasma network (27 versus 35). Over half of the serum modules had significant overlap in plasma by ORA. One of these modules was serum module 11, a lipid module with many module protein levels affected by variation in the APOE locus. Serum module 11 overlapped with plasma modules M15 Lipid Biosynthesis/Immune Response and M32 Lipoprotein Metabolism. M15 co-expression was driven by ApoE and levels of this module decreased most strongly with increasing number of APOE ε4 alleles, whereas M32 co-expression was strongly driven by ApoB and module levels increased most strongly with the number of APOE ε4 alleles (Fig. 9A). Therefore, the 3-platform plasma network provided sufficient resolution to identify lipoprotein-related protein co-expression modules divergent in their relationship to APOE ε4 genotype.
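
The module overlap test can be illustrated with a minimal sketch. It assumes that ORA amounts to a one-sided hypergeometric (Fisher's exact) test on shared gene symbols. The function name, the background size, and the toy module memberships are hypothetical and are not taken from the study.

```python
from math import comb


def overlap_p_value(module_a, module_b, background_size):
    """One-sided hypergeometric p value for the overlap of two gene sets.

    Assumption: both sets are drawn from a shared background of
    `background_size` gene symbols (hypothetical parameter).
    """
    a, b = set(module_a), set(module_b)
    k = len(a & b)  # observed overlap
    total = comb(background_size, len(b))
    upper = min(len(a), len(b))
    # Probability of observing an overlap of k or more by chance.
    tail = sum(
        comb(len(a), i) * comb(background_size - len(a), len(b) - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Toy example with made-up module memberships.
serum_module = {"APOE", "APOB", "APOC1", "APOC3", "LPA"}
plasma_module = {"APOE", "APOB", "APOC1", "APOC3", "PLTP"}
p = overlap_p_value(serum_module, plasma_module, background_size=1000)
```

In practice the background would be the set of gene symbols measurable in both networks, and the resulting p values would be corrected for the number of module pairs tested.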

Given the discrepancy we observed in the levels of individual AD-related proteins between CSF and plasma, we tested whether CSF and plasma co-expression modules also showed discrepancy in the levels of their eigenproteins and synthetic eigenproteins in the paired fluid (Fig. 10). As with many individual proteins, we observed an inverse relationship between CSF module levels in plasma and their levels in CSF in AD. One notable exception was the M3 complement/protein activation cascade module, where the within-subject module eigenprotein was increased in AD plasma compared to AD CSF. In plasma, the within-subject eigenprotein relationship to CSF was noisier and did not show a strong discrepancy between fluids for most modules. An exception again was the plasma M8 protein activation cascade module, which was increased in AD plasma compared to CSF for most subjects despite the general decrease in protein abundances in AD plasma.

In summary, we were able to build AD CSF and plasma protein co-expression networks using measurements from all three proteomic platforms, providing excellent proteomic depth in each fluid. Synaptic, proteostasis, sugar metabolism, and complement modules were strongly altered in AD CSF, while matrisome, endocytosis, and complement modules were altered in AD plasma. The complement modules were the modules best preserved across brain, CSF, and plasma and were observed to be increased in brain and plasma, but decreased in CSF. AD-related CSF module eigenproteins were generally increased in AD CSF but decreased in AD plasma, likely due in part to the overall decrease of low abundance plasma proteins in AD. The relationship of plasma modules to CSF was more variable, reflecting the contribution of many tissues other than brain to plasma protein co-expression.

Discussion

In this study, we used three different proteomic technologies to interrogate matched CSF and plasma from a discovery cohort of AD subjects in order to identify and assess promising protein biomarkers for AD. We found that overall correlation among the proteomic platforms was good, with weaker correlation in plasma. We observed a general decreased expression level of lower abundance proteins in AD CSF and plasma, most notably in plasma. However, despite this general decrease, proteins that are important drivers of AD brain co-expression modules such as SMOC1 were increased in AD plasma and show promise as accessible biomarkers of AD brain pathology. Co-expression analysis showed strong changes in AD CSF related to proteostasis, sugar metabolism, synaptic biology, and complement pathways. Analysis of plasma showed matrisome, complement, and endocytosis modules strongly correlated with AD. These modules may themselves represent promising AD biofluid biomarkers potentially more robust to analytical and natural biological variation than individual protein markers.

SOMAmer signals were much lower in CSF compared to plasma, an important consideration when interpreting SomaScan assay data in this fluid. We used an empirically derived threshold to remove assays that did not meet our criteria for acceptable S:N, which removed approximately half of the assays from consideration for most of our analyses. The approach to handle S:N for SomaScan in CSF will depend on the context and needs of the desired analysis. There were no issues with S:N for SomaScan in plasma, reflecting optimization of the platform for this matrix.

Correlation of protein measurements across platforms was good, with a median r of approximately 0.7 in CSF and 0.6 in plasma. The median correlation in plasma was generally higher than what has previously been observed comparing Olink and SomaScan assays [18, 19], perhaps because of the larger number of assays compared between these two platforms in this study and the lack of normalization of the SomaScan data [18]. The correlations improved when considering only proteins that are altered in AD, suggesting that measurement variability within a given assay reduced the correlation when S:N for a given assay was low. The source of this noise is likely biological, particularly in plasma where protein levels are influenced by multiple tissues and organ systems and other factors unrelated to AD. An important consideration when interpreting cross-platform correlation is also protein isoform, or “proteoform,” complexity in plasma versus CSF, including splicing variation and post-translational modifications (PTMs), as well as protein complex formation. Such proteoform complexity is very likely to be a large driver of variation across proteomic measurements within and across platforms, where a given platform is targeting a particular epitope or peptide for measurement of protein levels that could be obscured by splice variation, PTMs, or complex formation [18, 33]. This is likely the source of the variation in SPP1 measurement noted above, as SPP1 has more than 40 known phosphorylation sites and 5 glycosylation sites. Another source of variation is off-target binding, or ion co-isolation and interference in the case of MS, which is a larger issue in more complex matrices such as plasma. In this context, one current advantage for the SomaScan and MS platforms is the use of multiple aptamers or peptides for the measurement of some proteins, which could allow for consideration of such biological and technical variation. 
An example of this is the SomaScan measurement of TNC described above, where six of the eight SOMAmers correlated with one another. Multiple assays for other AD-related proteins, such as NEFL, CHI3L1, NPTXR, and SPP1 would be a welcome addition to proteomic platforms. In-depth characterization of the actual protein species being measured by an assay in a given platform will significantly advance our understanding of proteoforms related to disease.

A key observation that arose from our analyses was the strong bias for proteins of medium to low abundance to be decreased in AD plasma. Our AD plasma data were similar to those in a recently described Hong Kong cohort analyzed by Olink where this strong bias was also observed [23]. The same bias has also been observed in a TMT-MS study [34]. The basis of this observation is not currently clear, but could represent general protein translation reduction [35] in a systemic fashion in AD that somewhat spares proteins of higher abundance such as albumin, which are the primary drivers of gross plasma protein level measurements in clinical assays. Alteration of the blood-brain barrier in AD may also contribute to the observed discrepancy between the CSF and plasma levels of some proteins [8, 36]. We chose not to perform median normalization prior to analysis given that we are most interested in actual measured levels for clinical biomarker discovery and translation. Therefore, proteins that remain elevated in plasma without median normalization, such as SMOC1, represent highly promising markers for AD. SMOC1 has been shown in prior studies to be significantly altered in both brain and CSF [7, 34, 37]. SMOC1 may be an excellent marker for the M42 brain matrisome module in CSF and plasma given the fact that it is a key driver of M42 co-expression in brain. M42 is strongly related to amyloid deposition [2, 38], and the fact that CSF and plasma SMOC1 did not correlate with cognitive function is consistent with this association given that amyloid also does not strongly correlate with cognitive function [39] and M42 levels are not an independent driver of cognitive decline after adjustment for AD neuropathology [2]. Other brain module hub proteins as described above may also represent promising biofluid biomarkers for brain processes. 
The relationship in the levels of many of these markers between CSF and plasma appears more complicated than for SMOC1, with some showing opposite directions of change in AD plasma compared to CSF, consistent with prior observations in a TMT-MS study in which a number of AD markers that were observed to be increased in CSF were decreased in serum [34]. While some of this variation may be due to peripheral sources of the protein, it is also possible that exchange of certain proteins from CSF to the plasma compartment is regulated. Further investigation into such possible regulatory mechanisms is warranted.

One potential way to deal with variation in any one biomarker is to construct panels of markers that reflect a particular biological process, where the composite level of the panel becomes the measurement of the biological process of interest [7]. In this study, we constructed protein co-expression networks in CSF and plasma to illustrate this potential approach to AD biomarker development. We were able to incorporate all three proteomic platform measurements into these networks, which were based on >7100 protein measurements in CSF, and >9500 protein measurements in plasma. To our knowledge, these are the deepest proteomic analyses of these two fluids to date. The CSF network revealed autophagy and ubiquitination modules that were strongly correlated to current AD CSF biomarkers, indicating disruption of proteostasis is a strong disease-related signal in AD CSF. Other strong disease-related signals included alterations in synaptic biology, sugar metabolism, and complement, all of which have been previously described in AD brain [2, 3]. In plasma, these modules were not as well defined. Whether the CSF eigenproteins for these modules will translate into potential plasma biomarkers is currently unknown and will require further study in larger cohorts. One module that will likely translate across tissues is the complement module, which was highly preserved across brain, CSF, and plasma. Complement proteins have previously been shown to be increased in both AD plasma and brain by TMT-MS [40]. Interestingly, while complement module levels were increased in brain and plasma, they were decreased in CSF, suggesting that complement deposition in brain leads to discordance in brain-CSF levels similar to that seen with brain Aβ deposition [41]. However, in contrast to Aβ, the source of complement in the brain may be derived largely from peripheral sources given the observed elevation in plasma. 
This hypothesis would be consistent with recent findings on peripheral factors that influence brain pathophysiology in AD [42].

Our study was conducted on a small discovery cohort of subjects. At n=36, we were powered to detect a correlation rho of 0.45 at 80% power and p=0.05. For analyses of differential abundance between AD and control groups considering all proteins measured, we were powered to detect a fold change of ~1.2 to ~1.3. For correlation network analyses, it is generally advisable to include at least 20 total samples to avoid spurious correlations according to Oldham et al. [43]. Here, we used 35 independent samples for both CSF and plasma networks. We were therefore sufficiently powered to draw conclusions about differential abundance, cross-platform correlations, and correlation networks. Additional studies in larger cohorts will be required to further validate these findings.
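
The stated correlation power can be reproduced approximately with a short calculation. The sketch below assumes a two-sided test and uses the Fisher z approximation. The source does not say which method the authors used, and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_correlation(n, alpha=0.05, power=0.80):
    """Smallest |rho| detectable with a two-sided test, via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Under Fisher's transformation, atanh(r) has standard error 1/sqrt(n - 3).
    return tanh((z_alpha + z_power) / sqrt(n - 3))


rho_36 = min_detectable_correlation(36)  # close to the 0.45 quoted in the text
```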

In summary, multi-platform proteomic analysis of AD CSF and plasma is a promising approach to further development of biomarkers that can reflect the complex and multifaceted processes that comprise AD and that can enable patient stratification, diagnostics and disease monitoring, and therapeutic development.

Methods

CSF and plasma samples and case classification

All CSF and plasma samples used in this study were collected under the auspices of the Emory Goizueta Alzheimer’s Disease Research Center (ADRC). The cohort consisted of 18 healthy controls and 18 patients with AD. Basic demographic data were obtained from the Goizueta ADRC. Controls and patients with AD received standardized cognitive assessments in the Emory Cognitive Neurology Clinic or Goizueta ADRC. CSF and plasma were collected at or near the same time point in each individual and banked according to the 2014 National Institute on Aging best practice guidelines for Alzheimer’s Disease Centers (https://alz.washington.edu/BiospecimenTaskForce.html). CSF samples were subjected to ELISA Aβ1–42, total tau, and p-tau181 analysis by the INNO-BIA AlzBio3 Luminex Assay [44]. ELISA values were used to support diagnostic classification based on established AD biomarker cutoff criteria [45, 46]. APOE genotype was determined by extracting DNA from the plasma buffy coat using the GenePure kit (Qiagen) following the manufacturer’s recommended protocol, then determining the rs7412 and rs429358 genotypes using either an Affymetrix Precision Medicine Array (Affymetrix) or TaqMan assays (Thermo Fisher Scientific C_904973_10 and C_3084793_20). All samples were analyzed by each proteomic platform except for one subject, whose CSF and plasma were analyzed only by SomaScan. All Emory research participants provided informed consent under protocols approved by the Institutional Review Board at Emory University. Summarized case metadata are provided in Additional file 2: Supplementary Table 1.

Quantification of proteins by Olink proximity extension assay (PEA)

Proteins were quantified by PEA as previously described [17]. Aliquots of CSF and plasma from each subject were sent to Olink (Olink Proteomics, Uppsala, Sweden) for analysis on 13 human Olink Target 96 panels (cardiometabolic, cardiovascular II, cardiovascular III, cell regulation, development, immune response, inflammation, metabolism, neuro exploratory, neurology, oncology II, oncology III, and organ damage). All samples passed quality control measures and were randomized by Olink prior to analysis on single plates. Results were reported as Normalized Protein eXpression (NPX) values in log2 scale for relative quantification of protein abundance.

Quantification of proteins by SomaLogic SomaScan modified aptamers

Proteins were quantified by SomaScan as previously described [13, 14]. Aliquots of CSF and plasma from each subject were sent to SomaLogic (SomaLogic, Boulder, CO) for analysis using the modified aptamer SomaScan assay (v4.1). All samples passed quality control measures and were randomized by SomaLogic prior to analysis on single plates. Results were reported as relative fluorescence units (RFUs) for relative quantification of protein abundance.

CSF protein preparation and digestion for tandem mass tag mass spectrometry (TMT-MS) analysis

CSF undepleted of highly abundant plasma proteins

Equal volumes (50 μl of each sample) of CSF were digested with lysyl endopeptidase (LysC, Wako 125-05061) and trypsin (Thermo Fisher Scientific 90058). Briefly, each sample was reduced and alkylated with 1 μl of 0.5 M tris-2(-carboxyethyl)-phosphine (TCEP) and 5 μl of 0.4 M chloroacetamide (CAA) at 90°C for 10 min, followed by water bath sonication for 15 min. The same volume of 8 M urea buffer [56 μl, 8 M urea in 10 mM Tris, 100 mM NaH₂PO₄ (pH 8.5)] was added to each sample after cooling the samples to room temperature, along with LysC (2.5 μg). After overnight digestion, 336 μl of 50 mM ammonium bicarbonate (ABC) was added to each sample to dilute the urea concentration to 1 M, along with trypsin (5 μg). After 12 h, the trypsin digestion was stopped by adding final concentration of 1% formic acid (FA) and 0.1% trifluoroacetic acid (TFA).

CSF depleted of highly abundant plasma proteins

To increase the depth of proteome coverage, immunodepletion of highly abundant proteins was performed as previously described [7]. For CSF samples, 130 μl was incubated with equal volume (130 μl) of High Select Top14 Abundant Protein Depletion Resin (Thermo Fisher Scientific, A36372) at room temperature in centrifuge columns (Thermo Fisher Scientific, A89868). After 15 min of mixing with gentle rotation, the samples were centrifuged at 1000×g for 2 min. Sample flow-through was concentrated with a 3K Ultra Centrifugal Filter Device (Millipore, UFC500396) by centrifugation at 14,000×g for 30 min, and then the immunodepleted samples were diluted to equal volumes of 75 μl with phosphate-buffered saline. Immunodepleted CSF (60 μl) was then digested with LysC and trypsin. Briefly, the samples were reduced and alkylated with 1.2 μl of 0.5 M TCEP and 3 μl of 0.8 M CAA at 90°C for 10 min, followed by water bath sonication for 15 min. Samples were diluted with 193 μl of 8 M urea buffer [8 M urea in 10 mM Tris, 100 mM NaH₂PO₄ (pH 8.5)] to a final concentration of 6 M urea. LysC (4.5 μg) was used for overnight digestion at room temperature. Samples were then diluted to 1 M urea with 50 mM ABC. Trypsin (4.5 μg) was then added, and the samples were subsequently incubated for 12 h. The digestion was then stopped by adding final concentration of 1% FA and 0.1% TFA.

Plasma protein preparation and digestion for TMT-MS analysis

Plasma undepleted of highly abundant plasma proteins

Equal volumes (2 μl of each sample) of plasma were digested with LysC and trypsin. Briefly, each sample was diluted 10-fold with 50 mM ABC, followed by reduction and alkylation with 0.4 μl of 0.5 M TCEP and 2 μl of 0.4 M CAA with heating at 90°C for 10 min. The samples were sonicated for 15 min with water bath sonication to help sample solubilization. Then, 8 M urea buffer [22.4 μl, 8 M urea, 10 mM Tris, 100 mM NaH₂PO₄ (pH 8.5)] was added to each sample after cooling to room temperature, along with LysC (10 μg). After overnight digestion, 134.4 μl of 50 mM ABC was added to each sample to dilute the urea concentration to 1 M, along with trypsin (20 μg). After 12 h, the trypsin digestion was stopped by adding final concentration of 1% FA and 0.1% TFA.
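
The urea dilution steps in the undepleted protocols can be checked with simple volume arithmetic. The sketch below assumes volumes are additive and that the 10-fold plasma dilution brings 2 μl to a total of 20 μl. The function name is hypothetical.

```python
def molarity_after_mixing(stock_molarity, stock_volume_ul, other_volumes_ul):
    """Concentration (M) of a stock after mixing with solute-free volumes."""
    total_volume = stock_volume_ul + sum(other_volumes_ul)
    return stock_molarity * stock_volume_ul / total_volume


# Undepleted CSF: 50 ul sample + 1 ul TCEP + 5 ul CAA + 56 ul urea buffer,
# then 336 ul ABC before trypsin.
csf_lysc = molarity_after_mixing(8, 56, [50, 1, 5])
csf_trypsin = molarity_after_mixing(8, 56, [50, 1, 5, 336])

# Undepleted plasma: 2 ul sample diluted to 20 ul (assumed) + 0.4 ul TCEP
# + 2 ul CAA + 22.4 ul urea buffer, then 134.4 ul ABC before trypsin.
plasma_lysc = molarity_after_mixing(8, 22.4, [20, 0.4, 2])
plasma_trypsin = molarity_after_mixing(8, 22.4, [20, 0.4, 2, 134.4])
```

Under these assumptions, both protocols give 4 M urea during LysC digestion and 1 M urea during trypsin digestion, consistent with the 1 M figure stated in the text.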

Plasma depleted of highly abundant plasma proteins

The High Select Top14 Abundant Protein Depletion Resin was also utilized for plasma samples prior to digestion. Following mixing, 500 μl of resin was aliquoted into each spin column. After the resin settled to the bottom of the spin column, 8 μL of each sample was added and depletion was performed by gentle rotation for 15 min at room temperature, followed by centrifugation at 1000×g for 2 min. Sample flow-through was concentrated with a 3K Ultra Centrifugal Filter Device by centrifugation at 14,000×g for 30 min. Immunodepleted samples were diluted to equal volumes of 75 μl with phosphate-buffered saline. Immunodepleted plasma (60 μl) was then digested with LysC and trypsin using the same protocol used for CSF depleted samples.

Isobaric TMT peptide labeling

Before TMT labeling, the digested peptides were desalted using 50 mg of Sep-Pak C18 columns (Waters). Briefly, the columns were activated with 1 mL of methanol, then equilibrated with 2 × 1 mL 0.1% TFA. The acidified samples were loaded, followed by washing with 2 × 1 mL 0.1% TFA. Elution was performed with 1 mL 50% acetonitrile. To normalize protein quantification across batches, global internal standard (GIS) samples were generated for each sample set by combining 100 μl aliquots from each sample elution. All individual samples and GIS pooled standards were dried by speed vacuum (Labconco).

Both depleted CSF and depleted plasma samples were divided into five TMT batches, labeled using an 11-plex TMT kit (Thermo Fisher Scientific, A34808, lot number for TMT 10-plex: SI258088, 131C channel SJ258847), and derivatized as previously described [7]. For the sample and channel distribution, please see Additional file 2: Supplementary Table 1. Five milligrams of each channel reagent was dissolved in 256 μL anhydrous acetonitrile. Each peptide sample was resuspended in 50 μl of 100 mM triethylammonium bicarbonate (TEAB) buffer, and 20.5 μl of TMT reagent solution was subsequently added. After 1 h, the reaction was quenched with 4 μl of 5% hydroxylamine (Thermo Fisher Scientific, 90115) for 15 min. The peptide solutions were then combined according to the batch arrangement. Each TMT sample was desalted with 100 mg of Sep-Pak C18 columns and dried by speed vacuum. Notably, there were 9 TMT channels used for depleted CSF samples with one GIS sample on channel 127N, whereas 10 channels were used for depleted plasma samples with two GIS samples included on both 127C and 131C channels. Channel 126 was left empty on both sample sets.

For undepleted CSF and plasma samples, the TMT 16-plex kit (Thermo Fisher Scientific, A44520, lot number VH311511) was used for labeling, which divided both CSF and plasma sample sets into 3 TMT batches with 12 samples plus 1 GIS in each batch. The sample and channel distribution were the same for CSF and plasma samples (Additional file 2: Supplementary Table 1). Five milligrams of each channel reagent was dissolved in 200 μL anhydrous acetonitrile. Each CSF peptide sample was resuspended in 50 μl of 100 mM TEAB buffer, and 10 μl of TMT reagent solution was subsequently added. For plasma samples, each peptide sample was resuspended in 150 μl of 100 mM TEAB buffer, and 30 μl of TMT reagent solution was subsequently added. The labeling was stopped after 1 h with 4 μl of 5% hydroxylamine for CSF and 12 μl of 5% hydroxylamine for plasma, and the peptide solutions were then combined according to the batch arrangement. The combined TMT samples were desalted with 100 mg of Sep-Pak C18 columns except for each undepleted plasma TMT sample, which was split and desalted using 2 × 100 mg of Sep-Pak C18 columns. The elutions were dried under speed vacuum.

High-pH off-line fractionation

CSF and plasma undepleted of highly abundant plasma proteins

Dried samples were resuspended in high pH loading buffer (0.07% vol/vol NH₄OH, 0.045% vol/vol FA, 2% vol/vol ACN) and loaded onto a Waters BEH column (2.1 mm × 150 mm with 1.7 μm particles). A Vanquish UPLC system (Thermo Fisher Scientific) was used to carry out the fractionation. Solvent A consisted of 0.0175% (vol/vol) NH₄OH, 0.01125% (vol/vol) FA, and 2% (vol/vol) ACN; solvent B consisted of 0.0175% (vol/vol) NH₄OH, 0.01125% (vol/vol) FA, and 90% (vol/vol) ACN. The sample elution was performed over a 25-min gradient with a flow rate of 0.6 mL/min with a gradient from 0 to 50% solvent B. A total of 192 individual equal volume fractions were collected across the gradient. Fractions were concatenated to either 48 or 96 fractions and dried to completeness using vacuum centrifugation.

CSF and plasma depleted of highly abundant plasma proteins

Dried samples were resuspended in high pH loading buffer (0.07% vol/vol NH₄OH, 0.045% vol/vol FA, 2% vol/vol ACN) and loaded onto an Agilent ZORBAX 300 Extend-C18 column (2.1 mm × 150 mm with 3.5 μm beads). An Agilent 1100 HPLC system was used to carry out the fractionation. Solvent A consisted of 0.0175% (vol/vol) NH₄OH, 0.01125% (vol/vol) FA, and 2% (vol/vol) ACN; solvent B consisted of 0.0175% (vol/vol) NH₄OH, 0.01125% (vol/vol) FA, and 90% (vol/vol) ACN. The sample elution was performed over a 60 min gradient with a flow rate of 0.4 mL/min with a gradient from 0 to 60% solvent B. A total of 96 individual equal volume fractions were collected across the gradient and subsequently pooled by concatenation into 30 fractions and dried to completeness under vacuum centrifugation.
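
The text does not specify how fractions were assigned to pools during concatenation. A common approach is round-robin pooling, in which fractions spaced evenly across the gradient are combined. The following is a minimal sketch under that assumption, with a hypothetical function name.

```python
def concatenate_fractions(n_fractions, n_pools):
    """Assign fractions 1..n_fractions to pools by round-robin concatenation."""
    pools = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


pools_96 = concatenate_fractions(192, 96)  # undepleted: two fractions per pool
pools_30 = concatenate_fractions(96, 30)   # depleted: three or four per pool
```

With this scheme, the first undepleted pool combines fractions 1 and 97, so each pool mixes peptides from distant parts of the gradient.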

TMT mass spectrometry

CSF undepleted of highly abundant plasma proteins

For batch 1, all samples (~1 μg) were loaded and eluted using a Dionex Ultimate 3000 RSLCnano (Thermo Fisher Scientific) on an in-house packed 25 cm, 100 μm internal diameter (i.d.) capillary column with 1.9 μm Reprosil-Pur C18 beads (Dr. Maisch, Ammerbuch, Germany) over a 60 min gradient. Mass spectrometry was performed with a high-field asymmetric waveform ion mobility spectrometry (FAIMS) Pro-equipped Orbitrap Eclipse (Thermo Fisher Scientific) in positive ion mode using data-dependent acquisition with 2-s top speed cycles. Each cycle consisted of one full MS scan followed by as many MS/MS events as could fit within the given 2-s cycle time limit. MS scans were collected at a resolution of 120,000 (410–1600 m/z range, 4×10^5 AGC, 50 ms maximum ion injection time, FAIMS compensation voltages of −50 and −70). All higher energy collision-induced dissociation (HCD) MS/MS spectra were acquired at a resolution of 30,000 (0.7 m/z isolation width, 35% collision energy, 1.25×10^5 AGC target, 54 ms maximum ion time, TurboTMT on). Dynamic exclusion was set to exclude previously sequenced peaks for 30 s within a 10-ppm isolation window. For batches 2 and 3, samples were eluted over a 21-min gradient. Mass spectrometry was performed the same as batch 1 except with a FAIMS compensation voltage of −45, and dynamic exclusion set to exclude previously sequenced peaks for 6 s within a 10-ppm isolation window.

Plasma undepleted of highly abundant plasma proteins

Mass spectrometry was performed the same as for CSF undepleted batches 2 and 3 except FAIMS compensation voltage was set at −40 and −60, and dynamic exclusion was set to exclude previously sequenced peaks for 20 s within a 10-ppm isolation window.

CSF and plasma depleted of highly abundant plasma proteins

All fractions (~1μg) were loaded and eluted using an Easy-nLC 1200 (Thermo Fisher Scientific) on an in-house packed 30 cm, 750 μm i.d. capillary column with 1.9 μm Reprosil-Pur C18 beads over a 120-min gradient. Mass spectrometry was performed with a Q-Exactive HFX (Thermo Fisher Scientific) in positive ion mode using data-dependent acquisition with a top 10 method. Each cycle consisted of one full MS scan followed by 10 MS/MS events. MS scans were collected at a resolution of 120,000 (400–1600 m/z range, 3×10^6 AGC, 100 ms maximum ion injection time). All higher energy collision-induced dissociation (HCD) MS/MS spectra were acquired at a resolution of 45,000 (1.6 m/z isolation width, 35% collision energy, 1×10^5 AGC target, 86 ms maximum ion time). Dynamic exclusion was set to exclude previously sequenced peaks for 20 s within a 10-ppm isolation window.

Database searches and protein quantification

All raw files were searched using Proteome Discoverer (version 2.4.1.15, Thermo Fisher Scientific) with Sequest HT. The spectra were searched against a human UniProt database downloaded April 2015 (90,411 target sequences). We used this database for consistency with our prior brain proteomics study [2]. Search parameters included 20 ppm precursor mass window, 0.05 Da product mass window, dynamic modifications for oxidized methionine (+15.995 Da), deamidated asparagine and glutamine (+0.984 Da), and phosphorylated serine, threonine, and tyrosine (+79.966 Da), and static modifications for carbamidomethyl cysteines (+57.021 Da) and N-terminal and lysine-tagged TMT (+229.163 or +304.207 Da depending on the dataset). Percolator was used to filter peptide spectral matches (PSMs) to 1% FDR. Peptides were grouped using strict parsimony and only razor and unique peptides were used for protein level quantitation. Reporter ions were quantified from MS2 scans using an integration tolerance of 20 ppm with the most confident centroid setting.

Protein abundance data processing

Tandem mass tag mass spectrometry (TMT-MS)

Only proteins that were identified and summarized as high confidence (<1% FDR) by Proteome Discoverer (PD) were used for analysis. The 3730 UniProt protein identifier accessions provided by PD were further annotated with HUGO Gene Nomenclature Committee (HGNC) official gene symbols. TMT-MS data were processed identically for both CSF and plasma, including fluid depleted of highly abundant proteins. TMT reporter intensities (abundances) that had not undergone normalization by PD were used for analysis to preserve inherent protein abundance differences between control and AD subjects. Four separate datasets were used for analysis: CSF undepleted, CSF depleted, plasma undepleted, and plasma depleted of highly abundant proteins. For each dataset, batch correction was performed by dividing abundances for each protein within each batch by the global internal standard (GIS). GIS measurements were then removed, and proteins with more than 75% (n=27/36) missing values were excluded from consideration. The number of remaining protein isoforms after missing value control was 1128 in undepleted plasma, 2229 in undepleted CSF, 1385 in plasma depleted of highly abundant proteins, and 2944 in CSF depleted of highly abundant proteins.
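
The batch correction and missing-value filter described above can be sketched as follows. This is an illustration only, not the authors' code. The data layout, channel labels, and function names are hypothetical, and missing values are represented as None.

```python
def gis_normalize(batch, gis_channel):
    """Divide each protein's abundances in one batch by that batch's GIS.

    `batch` maps protein -> {channel: abundance or None}. The GIS channel is
    dropped from the output, and proteins without a usable GIS value become
    None in every channel.
    """
    normalized = {}
    for protein, channels in batch.items():
        gis = channels.get(gis_channel)
        normalized[protein] = {
            ch: (value / gis if value is not None and gis else None)
            for ch, value in channels.items()
            if ch != gis_channel
        }
    return normalized


def passes_missing_filter(values, max_missing_fraction=0.75):
    """Keep a protein unless more than 75% of its values are missing."""
    missing = sum(v is None for v in values)
    return missing / len(values) <= max_missing_fraction


batch = {"P1": {"gis": 2.0, "s1": 4.0, "s2": None}}
ratios = gis_normalize(batch, "gis")
```

Under this reading of the filter, a protein missing in exactly 27 of 36 samples is retained, while one missing in 28 is excluded.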

Olink proximity extension assay (PEA) and SomaLogic SomaScan assay

Olink NPX values were analyzed using the OlinkAnalyze R package v1.2.1. NPX values that were flagged with quality control (QC) warnings were removed from further consideration. SomaScan RFU data and assay metadata for CSF and plasma were analyzed using the SomaDataIO R package v3.1.0. Olink NPX and SomaLogic SomaScan data included blank buffer replicate measurements (noise). Buffer measurements were used to calculate signal-to-noise (S:N) ratios for both unnormalized NPX and RFU abundance data. S:N ratios were calculated by subtracting the within-assay median buffer signal from the unlogged assay signal (2^NPX or RFU), then dividing by the median buffer signal. Protein assay-specific limit of detection (LOD) was defined as median log₂ buffer signal plus 3 standard deviations (SD) of the assay’s buffer measurements. NPX background signal SD for PEA was defined from historically recorded background variance of the assays and included as a component of predetermined LOD, whereas SomaScan background SD was calculated from available buffer replicate data for the assays performed. Sample measurements in CSF or plasma that were below LOD were retained but considered as missing values. Olink repeated measurements of the same UniProt protein assayed in different panels (N=36 duplicated assays) were reduced to their representative single best replicate of the assay in one of the panels based on criteria including the highest pairwise correlation to other replicate assays, highest signal, and largest dynamic range.
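
The S:N and LOD definitions above can be written out as a minimal sketch. It assumes that the SD is the sample SD of the log₂-transformed buffer replicates, as would apply to the SomaScan case. The function names and buffer values are hypothetical.

```python
from math import log2
from statistics import median, stdev


def signal_to_noise(signal, buffer_signals):
    """(signal - median buffer) / median buffer, on the unlogged scale."""
    noise = median(buffer_signals)
    return (signal - noise) / noise


def limit_of_detection_log2(buffer_signals, n_sd=3):
    """Median log2 buffer signal plus n_sd standard deviations.

    Assumption: the SD is the sample SD of the log2 buffer values.
    """
    logged = [log2(b) for b in buffer_signals]
    return median(logged) + n_sd * stdev(logged)


buffer_rfu = [8.0, 16.0, 32.0]  # made-up buffer replicate signals
sn = signal_to_noise(48.0, buffer_rfu)
lod = limit_of_detection_log2(buffer_rfu)
```

With these made-up values, a signal of 48 gives an S:N of 2, and the LOD sits at 7 on the log₂ scale (a signal of 128).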

Both platform assays underwent a first-pass filter allowing up to 75% (n=27/36) missing values. Because SomaScan CSF data had low signal, a second-pass filter step was applied to remove assays that did not meet an empirically derived S:N threshold. This threshold was determined by correlating SomaScan assays with Olink and TMT-MS undepleted assays at varying S:N cutoff values (0, 0.15, 0.25, 0.3, 0.35, 0.4, 0.45, 0.50, 0.625, 0.75, 1, 2, 4, and 8), and selecting the S:N value (≥0.45) that maximized median Pearson correlation with the other platforms. After application of first- and second-pass filters and removal of control aptamers, 3594 CSF and 7284 plasma human SomaScan assays were kept for subsequent analyses. For Olink assays, after applying the first-pass filter, 902 CSF and 1140 plasma proteins were kept for subsequent analyses.
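
The empirical choice of S:N threshold can be illustrated with a small sketch. It assumes that each SomaScan assay has one S:N value and one Pearson correlation against a matched Olink or TMT-MS measurement. The function name and toy values are hypothetical.

```python
from statistics import median


def best_sn_cutoff(assays, cutoffs):
    """Return (cutoff, median r) maximizing the median cross-platform r.

    `assays` is a list of (signal_to_noise, pearson_r) pairs, one per assay.
    Ties go to the lowest cutoff, which retains the most assays.
    """
    best = None
    for cutoff in sorted(cutoffs):
        kept = [r for sn, r in assays if sn >= cutoff]
        if not kept:
            continue  # no assay survives this cutoff
        score = median(kept)
        if best is None or score > best[1]:
            best = (cutoff, score)
    return best


# Made-up (S:N, r) pairs for illustration only.
toy_assays = [(0.1, 0.1), (0.3, 0.2), (0.5, 0.9), (1.0, 0.8), (3.0, 0.6), (5.0, 0.7)]
cutoff, median_r = best_sn_cutoff(toy_assays, [0, 0.45, 2])
```

A criterion based only on the median can favor very strict cutoffs that keep few assays, so the number of assays retained at each cutoff is worth inspecting alongside the median correlation.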

Proteome coverage overlap, ontology enrichment, and missing data analysis

Unique gene symbols measured in each platform were counted, and overlap was visualized using the venneuler R package (v1.1-0) venneuler function. Enrichment of gene ontologies (GO) in different Venn categories was calculated as Fisher’s exact test p value transformed to z score using GO-Elite (v1.2.5) and visualized using a custom in-house R script. The same procedure was used to determine ontology enrichment for network modules. Missing data (Additional file 1: Supplementary Figure 2) in Olink and SomaScan included assays flagged by QC warnings, below LOD, and truly missing measurements. Missing data in TMT-MS was considered at the level of batch, as all measurements within a batch result from the same MS/MS fragmentation.
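
The p value to z score transformation can be sketched with the standard normal quantile function. The exact transformation applied by GO-Elite is not described in the text, so the one-sided conversion below is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def p_to_z(p_value):
    """Convert a one-sided p value to a standard normal z score."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p value must lie strictly between 0 and 1")
    # Using -inv_cdf(p) instead of inv_cdf(1 - p) stays accurate for tiny p.
    return -NormalDist().inv_cdf(p_value)


z_nominal = p_to_z(0.05)  # roughly 1.64 under this one-sided convention
```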

Censoring of proteins affected by depletion of highly abundant proteins

Proteins considered for analysis were those measured by TMT-MS before and after depletion of highly abundant proteins that had the same UniProt accession, at least 9 paired abundance measurements, and at least 3 measurements per case status group (AD or control; n=1932 CSF proteins, n=852 plasma proteins). Pairwise measurements were correlated using Pearson correlation on the difference in abundance between AD versus control subjects. Proteins that were discordant in their differential abundance, as well as proteins with negative Pearson rho across depleted and undepleted matched protein measurements across the 36 case samples, were considered significantly affected by depletion. In total, 32 proteins in CSF and 27 in plasma were censored from the TMT-MS depleted data due to effects of depletion on their abundance levels (Additional file 2: Supplementary Table 3).

Protein abundance correlation analysis

Proteins measured in common across two platforms within the same biofluid were correlated across all samples using the corAndPvalue function in the WGCNA R package (v1.69) (Additional file 2: Supplementary Tables 12-17, Additional file 1: Extended Data). In the case of multiple SomaScan assays for the same protein, the assay with the identical UniProt protein accession, or secondarily, a SOMAmer measuring an identical gene product, was selected. When multiple cross-platform UniProt accession or gene symbol matches occurred, the SOMAmer with the highest correlation was selected. We constructed a population histogram of all Pearson correlations for distinct gene products or UniProt accessions (representing distinct protein isoforms) and identified the median rho for each population of paired measurements between two platforms (Fig. 3).

Cumulative signal and total protein abundance comparison

Mean NPX, RFU, or ion intensity signal without prior normalization or log transformation for each protein across the 36 samples was ranked for each platform and biofluid from highest to lowest abundance. Curves of incremental median cumulative abundance were constructed for AD and control groups (Additional file 1: Supplementary Figure 8, left uncalibrated panels). Absolute protein abundance differences in plasma were also assessed by calibrating relative platform signals to absolute plasma protein concentrations (n=4226) as provided in the Human Protein Atlas (HPA) on March 5, 2022 (https://www.proteinatlas.org/search). For the calibrated abundance calculations, unlogged abundance (signal minus buffer median, including positive values below LOD) was calibrated so that the geometric mean of control group measurements was set to the known absolute blood concentration with all individual measurements varying relative to this value. Missing values were considered as one-half the minimum assayed nonzero signal for geometric mean calculations. When multiple assays for a gene product were available within a platform, the assay with maximum mean signal was selected for calibration. HPA-calibrated values were plotted for all 4226 proteins regardless of presence in the platform as a ranked absolute abundance curve (non-cumulative, black trace in Additional file 1: Supplementary Figure 8B, D, F, and H). Then, the cumulative log₁₀-transformed abundance of all lesser abundant ranked proteins up to each represented rank of any protein measured within the platform was plotted as the median such value for AD or control (Additional file 1: Supplementary Figure 8). In contrast to the uncalibrated cumulative abundance curves, the value at each point in these left-truncated cumulative sum curves represents the sum (cumulative) abundance of only lesser abundant proteins up to that rank, and not those ranked with higher abundance.
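
The calibration step described above amounts to rescaling each assay so that the control-group geometric mean equals the reference concentration. A minimal Python sketch for one assay follows. The function name is hypothetical, `None` marks a missing value, and all non-missing inputs are assumed to be positive.

```python
import math

def calibrate_to_reference(values, is_control, known_conc):
    """Scale one assay's unlogged signals so the geometric mean of the
    control samples equals `known_conc`. Missing control values enter the
    geometric mean as one-half the minimum observed nonzero signal."""
    observed = [v for v in values if v is not None and v > 0]
    fill = 0.5 * min(observed)
    controls = [fill if v is None else v
                for v, ctl in zip(values, is_control) if ctl]
    geo_mean = math.exp(sum(math.log(v) for v in controls) / len(controls))
    scale = known_conc / geo_mean
    # Every individual measurement varies relative to the calibrated mean.
    return [None if v is None else v * scale for v in values]
```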

Differential expression analysis

Differences between AD and control were assessed on the log₂(abundance) measurements over all proteins after data processing as described above, which included signal cleanup, filtering on missingness, removal of proteins affected by top-14 highly abundant protein depletion, and, in the case of SomaScan CSF data, control of excessively low S:N assays. Volcano plots were made using a custom in-house script via the plotly (v4.9.2.1) R package function ggplotly. Individual volcano points were colored by membership in the 44 brain network modules described in Johnson et al. [2].

Comparison to external datasets

External Olink datasets used for correlation included plasma AD effect sizes from a Hong Kong-based cohort provided in Jiang et al. [23] Appendix Table 1, from the BioFinder cohort described in Whelan et al. [24] Table S19, and from the Accelerating Medicines Partnership for Parkinson’s Disease (AMP-PD) 2021 v2-5 (May 10) release. The external SomaScan dataset was obtained from the ANMerge version of the AddNeuroMed study data as described in Birkenbihl et al. [25]. Only sample data collected at the last visit in the AMP-PD and ANMerge datasets were used for correlations.

Data provided in Jiang et al. and Whelan et al. was used directly for correlation without additional processing. AMP-PD Olink raw data from 212 study participants and all four 384-assay panels available (Cardiometabolic, Inflammation, Neurology, and Oncology) was loaded and processed in the same fashion as Emory Olink data. Four participants with a diagnosis of multiple systems atrophy were excluded. Final number of subjects analyzed was 118 PD and 90 control. As described above, values below LOD were censored as missing but otherwise retained for correlations, and proteins duplicated in the different panels were reduced to one representative assay. Final assay numbers included 1054 CSF and 1398 plasma assays. Log₂ fold change values that remained significant after FDR correction were correlated with log₂ fold change values in the current study using Pearson correlation and Student’s p values as implemented in the WGCNA package verboseScatterplot function.

Harmonization of platform protein abundance prior to network analysis

TMT-MS ion counts in fluid depleted of highly abundant proteins, SomaScan RFUs, and Olink unlogged NPX values, totaling 9589 assays in plasma and 7158 assays in CSF, were assembled for the 35 case samples commonly measured on all three platforms. Only truly missing values were considered as unavailable; values below LOD or those subject to S:N threshold-based filtering were retained. Data were transposed prior to removal of platform-specific effects as a batch effect using the TAMPOR algorithm. TAMPOR has been described previously for removal of nuisance batch effects in proteomic data and harmonization of different cohorts [2, 3]. As applied to TMT data, TAMPOR removes batch effects from TMT reporter abundance by using a ratio of TMT reporter signal over the GIS signal within batch. Because the GIS represents the all-sample average, the ratios tend to be near unity, and log2(abundance/GIS) tends to be zero. TAMPOR further enforces the central tendency towards zero for both proteins and sample medians by performing a median polish of the log2 ratios [47]. Output from this step is centered at a log2(abundance) of zero for both proteins and samples (rows and columns). TAMPOR retains the relative protein abundances as row-wise medians, which can be inserted back into the data after completion of TAMPOR to restore the data to the original form as that used as input for median polish. This alternative version of the data (“clean relative abundance”) is all positive, in the same form as TMT reporter intensity or abundance.

To harmonize measurements across different proteomic platforms, we employed an adaptation of the TAMPOR algorithm (termed “transposed” TAMPOR) applied to relative abundance. SomaScan RFU, Olink 2^NPX, and TMT reporter abundance ratios were harmonized by considering common proteins across the different platforms as anchoring measurements. The central tendencies (intra-platform) of these common protein measurements are considered as robust as GIS measures within platform and are then normalized across platforms with the same median polish algorithm. The algorithm is therefore applied to samples as though they were proteins, and the proteins as though they were samples, in contrast to the standard TAMPOR algorithm described for batch correction in TMT data.

Common proteins measured across all three platforms were used as the GIS (n=101 in plasma and n=201 in CSF) to calculate the central tendency of data within and across platforms used for the denominators in the TAMPOR algorithm, as previously described [2]. Normalized data used in subsequent network analyses was of the form log₂(abundance/central tendency) of the common proteins in all platforms. No missing values were present in the protein abundances used for network construction.
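
The median polish at the core of this procedure can be illustrated with a simplified Python sketch. It is not the TAMPOR implementation (which is available at the repository cited under Availability of data and materials). It assumes a complete matrix of log₂ ratios with no missing values, and the function name is hypothetical.

```python
from statistics import median

def median_polish(matrix, max_iter=10, tol=1e-9):
    """Alternately subtract row and column medians from a matrix of log2
    ratios until both are (near) zero.
    Returns (residuals, row_effects, col_effects)."""
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        change = 0.0
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [x - m for x in resid[i]]
            change += abs(m)
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
            change += abs(m)
        if change < tol:  # all medians already zero: converged
            break
    return resid, row_eff, col_eff
```

In the standard orientation, rows are proteins and columns are samples. The "transposed" variant described above corresponds to swapping these roles before polishing.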

Protein co-expression network analysis

Networks for CSF and plasma were constructed using the harmonized protein abundances for each biofluid. The Weighted Correlation Network Analysis (WGCNA) algorithm (v1.69) was used for network generation. No outliers were detected using the WGCNA sample network connectivity outlier algorithm. The WGCNA blockwiseModules function was run on the CSF and plasma harmonized abundances with the following parameters: power=6.5 (CSF) or 11 (plasma), deepSplit=4, minModuleSize=10, mergeCutHeight=0.07, TOMdenom=“mean”, bicor correlation, signed network type, PAM staging and PAM respects dendro as TRUE, and a maxBlockSize larger than the total number of protein assays. Module memberships were then iteratively reassigned to enforce kME table consistency, as previously described [3]. The resulting network assignments were visualized as modules using the igraph (v1.2.5) package. Module eigenprotein correlations and significance were visualized in circular heatmaps using the circlize (v0.4.10), dendextend (v1.13.4), and dendsort (v0.3.3) R packages. Synthetic eigenproteins for each network (CSF, plasma, and brain [2]) were calculated as previously described [3]. For synthetic eigenproteins translated either from or to the brain, the existing data for 8619 proteins underlying the brain network were mapped to labels in the biofluid network using a mapping rubric to cross-reference protein labels. Specifically, (1) an exact UniProt ID match to that in labels of the form Symbol|UniprotID|platform|biofluid took precedence for labels with MS as the platform, followed by (2) a symbol match with MS as the platform. This was followed by (3) an exact UniProt ID match to an Olink row in a biofluid dataset, then (4) an exact UniProt ID match to a SomaScan row, followed by (5) a symbol match with Olink as the platform, and finally (6) a symbol match with SomaScan as the platform. In this way, unmatched proteins across pairs of networks were minimized. The same 6-point rubric was used for matching (relabeling) brain network member labels before performing module preservation (below).
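
The 6-point matching rubric can be expressed as an ordered lookup. The following Python sketch is illustrative only: the function name and the platform tokens `"MS"`, `"Olink"`, and `"SomaScan"` are assumptions about how the third field of a Symbol|UniprotID|platform|biofluid label is spelled.

```python
# Priority order of (field to match, platform), following the rubric in the text.
PRIORITY = [
    ("uniprot", "MS"), ("symbol", "MS"),
    ("uniprot", "Olink"), ("uniprot", "SomaScan"),
    ("symbol", "Olink"), ("symbol", "SomaScan"),
]

def match_label(symbol, uniprot, biofluid_labels):
    """Return the first biofluid label matching a brain protein under the
    priority rubric, or None when no rule matches."""
    parsed = []
    for label in biofluid_labels:
        sym, acc, platform, _fluid = label.split("|")
        parsed.append({"label": label, "symbol": sym,
                       "uniprot": acc, "platform": platform})
    query = {"symbol": symbol, "uniprot": uniprot}
    for field, platform in PRIORITY:
        for entry in parsed:
            if entry["platform"] == platform and entry[field] == query[field]:
                return entry["label"]
    return None
```

For example, a brain protein whose accession matches only an isoform-specific MS label by symbol would still be assigned to that MS label (rule 2) before any Olink or SomaScan row is considered.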

Network module overlap

Over-representation analysis (ORA) of module gene symbols between networks was determined using two-tailed Fisher’s exact test, followed by correction of p values for multiple testing using the Benjamini-Hochberg method. The plasma network was compared to a SomaScan human serum network constructed from 4137 proteins as described in Emilsson et al. [32]. Serum network assignments in 27 modules plus grey were curated from Table S7 in Emilsson et al. [32]. Overlap of module gene symbols between the two networks was determined as described above. Overlap was visualized using a custom in-house script.
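
For readers unfamiliar with the overlap statistics, the two steps can be sketched in plain Python. This is a generic illustration, not the in-house script. It assumes a 2×2 table [[a, b], [c, d]] in which a is the number of gene symbols shared by two modules, b and c are the symbols found in only one of them, and d is the remaining background. The two-tailed p value sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test p value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))  # small tolerance for float ties
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```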

Network preservation

Pairwise, directional preservation from CSF to plasma, from plasma to CSF, and from brain to each of the biofluid networks and vice versa was assessed using the WGCNA (v1.69) modulePreservation function with 500 permutations after harmonizing protein assay labels as described above. The Z_summary composite z score for 8 underlying network parameters was calculated and visualized by circular heatmap as significance (minus log₁₀ of the Benjamini-Hochberg adjusted p values corresponding to the Z_summary scores obtained).

Cell type marker enrichment analyses

Cell type-specific enriched marker gene symbol lists were used as previously published to perform Fisher’s exact one-tailed test for enrichment [2]. Benjamini-Hochberg correction was applied to all resulting p values.

Other statistics

All statistical analyses were performed in R (v4.0.2). Boxplots represent the median and the 25th and 75th percentiles; thus, the hinges of a box span the interquartile range (the two middle quartiles of data within a group). The farthest data points up to 1.5 times the interquartile range away from box hinges define the extent of whiskers (error bars). Correlations were performed using the biweight midcorrelation function as implemented in the WGCNA R package or Pearson correlation. Comparisons between two groups were performed by a two-sided t test. Comparisons among three or more groups were performed with Kruskal-Wallis nonparametric ANOVA or standard ANOVA with Tukey or post hoc pairwise comparison of significance. P values were adjusted for multiple comparisons by false discovery rate (FDR) correction according to the Benjamini-Hochberg method where indicated. Z scores of normalized protein data and of eigenproteins or synthetic eigenproteins were calculated as the number of standard deviations from the mean. At n=36, we were powered to detect a correlation rho of 0.45 at 80% power and p=0.05. For pairwise comparisons between AD and control groups considering all proteins measured, we were powered to detect a fold change of ~1.2 to ~1.3 depending on platform and fluid based on the power calculation method described by Bi and Liu [48].
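
The stated power for correlation can be checked approximately with the Fisher z transformation. The text does not say which method was used for this calculation, so the approach below is an assumption, and the function name is hypothetical. The negligible contribution of the opposite tail is ignored.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(rho, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation `rho`
    with `n` samples, using the Fisher z approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = atanh(rho) * sqrt(n - 3)
    return NormalDist().cdf(noncentrality - z_crit)

power = correlation_power(0.45, 36)  # close to 0.8 under this approximation
```

Under this approximation, n=36 gives a power just under 80% for rho=0.45, in line with the statement above.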

Availability of data and materials

Raw data, case traits, and analyses related to this manuscript are available at https://www.synapse.org/3platformEmory. Code in the research compendium for the current study is available from https://www.synapse.org/3platformEmory. The algorithm used for batch correction is fully documented and available as an R function, which can be downloaded from https://github.com/edammer/TAMPOR. Data used in the preparation of this article were obtained from the Accelerating Medicines Partnership® (AMP®) Parkinson’s Disease (AMP-PD) Knowledge Platform. For up-to-date information on the study, visit https://www.amp-pd.org. The AMP® PD program is a public-private partnership managed by the Foundation for the National Institutes of Health and funded by the National Institute of Neurological Disorders and Stroke (NINDS) in partnership with the Aligning Science Across Parkinson’s (ASAP) initiative; Celgene Corporation, a subsidiary of Bristol Myers Squibb Company; GlaxoSmithKline plc (GSK); The Michael J. Fox Foundation for Parkinson's Research; Pfizer Inc.; Sanofi US Services Inc.; and Verily Life Sciences. AMP-PD clinical data and biosamples used in preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) and the NINDS Parkinson’s Disease Biomarkers Program (PDBP). PPMI is sponsored by The Michael J. Fox Foundation for Parkinson’s Research and supported by a consortium of scientific partners: 4D Pharma, AbbVie Inc., AcureX Therapeutics, Allergan, Amathus Therapeutics, Aligning Science Across Parkinson’s (ASAP), Avid Radiopharmaceuticals, Bial Biotech, Biogen, BioLegend, Bristol Myers Squibb, Calico Life Sciences LLC, Celgene Corporation, DaCapo Brainscience, Denali Therapeutics, The Edmond J. Safra Foundation, Eli Lilly and Company, GE Healthcare, GlaxoSmithKline, Golub Capital, Handl Therapeutics, Insitro, Janssen Pharmaceuticals, Lundbeck, Merck & Co., Inc., Meso Scale Diagnostics, LLC, Neurocrine Biosciences, Pfizer Inc., Piramal Imaging, Prevail Therapeutics, F. Hoffmann-La Roche Ltd and its affiliated company Genentech Inc., Sanofi Genzyme, Servier, Takeda Pharmaceutical Company, Teva Neuroscience, Inc., UCB, Vanqua Bio, Verily Life Sciences, Voyager Therapeutics, Inc., and Yumanity Therapeutics, Inc. The PPMI investigators have not participated in reviewing the data analysis or content of the manuscript. For up-to-date information on the study, visit www.ppmi-info.org. The PDBP consortium is supported by the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health. A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. The PDBP investigators have not participated in reviewing the data analysis or content of the manuscript.

ACCELERATING MEDICINES PARTNERSHIP and AMP are registered service marks of the U.S. Department of Health and Human Services. Absolute quantitative plasma protein data was obtained from the Human Protein Atlas at proteinatlas.org.

References

Beckmann ND, et al. Multiscale causal networks identify VGF as a key regulator of Alzheimer's disease. Nat Commun. 2020;11(1):3942. https://doi.org/10.1038/s41467-020-17405-z.
Johnson ECB, et al. Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nat Neurosci. 2022;25(2):213–25. https://doi.org/10.1038/s41593-021-00999-y.
Johnson ECB, et al. Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nat Med. 2020;26(5):769–80. https://doi.org/10.1038/s41591-020-0815-6.
Mostafavi S, et al. A molecular network of the aging human brain provides insights into the pathology and cognitive decline of Alzheimer's disease. Nat Neurosci. 2018;21(6):811–9. https://doi.org/10.1038/s41593-018-0154-9.
Wan YW, et al. Meta-analysis of the Alzheimer's disease human brain transcriptome and functional dissection in mouse models. Cell Rep. 2020;32(2):107908. https://doi.org/10.1016/j.celrep.2020.107908.
Bader JM, et al. Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer's disease. Mol Syst Biol. 2020;16(6):e9356. https://doi.org/10.15252/msb.20199356.
Higginbotham L, et al. Integrated proteomics reveals brain-based cerebrospinal fluid biomarkers in asymptomatic and symptomatic Alzheimer's disease. Sci Adv. 2020;6(43). https://doi.org/10.1126/sciadv.aaz9360.
Dayon L, et al. Proteomes of paired human cerebrospinal fluid and plasma: relation to blood-brain barrier permeability in older adults. J Proteome Res. 2019;18(3):1162–74. https://doi.org/10.1021/acs.jproteome.8b00809.
Li KW, Gonzalez-Lozano MA, Koopmans F, Smit AB. Recent developments in data independent acquisition (DIA) mass spectrometry: application of quantitative analysis of the brain proteome. Front Mol Neurosci. 2020;13:564446. https://doi.org/10.3389/fnmol.2020.564446.
Brenes A, et al. Multibatch TMT reveals false positives, batch effects and missing values. Mol Cell Proteomics. 2019;18(10):1967–80. https://doi.org/10.1074/mcp.RA119.001472.
Johnson ECB, et al. Deep proteomic network analysis of Alzheimer's disease brain reveals alterations in RNA binding proteins and RNA splicing associated with disease. Mol Neurodegener. 2018;13(1):52. https://doi.org/10.1186/s13024-018-0282-4.
Geyer PE, et al. Revisiting biomarker discovery by plasma proteomics. Mol Syst Biol. 2017;13(9):942. https://doi.org/10.15252/msb.20156297.
Gold L, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One. 2010;5(12):e15004. https://doi.org/10.1371/journal.pone.0015004.
Tin A, et al. Reproducibility and variability of protein analytes measured using a multiplexed modified aptamer assay. J Appl Lab Med. 2019;4(1):30–9. https://doi.org/10.1373/jalm.2018.027086.
Walker KA, et al. Large-scale plasma proteomic analysis identifies proteins and pathways associated with dementia risk. Nature Aging. 2021;1(5):473–89. https://doi.org/10.1038/s43587-021-00064-0.
Assarsson E, et al. Homogenous 96-plex PEA immunoassay exhibiting high sensitivity, specificity, and excellent scalability. PLoS One. 2014;9(4):e95192. https://doi.org/10.1371/journal.pone.0095192.
Lundberg M, Eriksson A, Tran B, Assarsson E, Fredriksson S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 2011;39(15):e102. https://doi.org/10.1093/nar/gkr424.
Pietzner M, et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat Commun. 2021;12(1):6822. https://doi.org/10.1038/s41467-021-27164-0.
Raffield LM, et al. Comparison of proteomic assessment methods in multiple cohort studies. Proteomics. 2020;20(12):e1900278. https://doi.org/10.1002/pmic.201900278.
Katz DH, et al. Whole genome sequence analysis of the plasma proteome in black adults provides novel insights into cardiovascular disease. Circulation. 2022;145(5):357–70. https://doi.org/10.1161/CIRCULATIONAHA.121.055117.
Finkernagel F, et al. Dual-platform affinity proteomics identifies links between the recurrence of ovarian carcinoma and proteins released into the tumor microenvironment. Theranostics. 2019;9(22):6601–17. https://doi.org/10.7150/thno.37549.
Graumann J, et al. Multi-platform affinity proteomics identify proteins linked to metastasis and immune suppression in ovarian cancer plasma. Front Oncol. 2019;9:1150. https://doi.org/10.3389/fonc.2019.01150.
Jiang Y, et al. Large-scale plasma proteomic profiling identifies a high-performance biomarker panel for Alzheimer's disease screening and staging. Alzheimers Dement. 2021;18(1):88-102. https://doi.org/10.1002/alz.12369.
Whelan CD, et al. Multiplex proteomics identifies novel CSF and plasma biomarkers of early Alzheimer's disease. Acta Neuropathol Commun. 2019;7(1):169. https://doi.org/10.1186/s40478-019-0795-2.
Birkenbihl C, et al. ANMerge: a comprehensive and accessible Alzheimer's disease patient-level dataset. J Alzheimers Dis. 2021;79(1):423–31. https://doi.org/10.3233/JAD-200948.
Weiner S, et al. Optimized sample preparation and data analysis for TMT proteomic analysis of cerebrospinal fluid applied to the identification of Alzheimer's disease biomarkers. Clin Proteomics. 2022;19(1):13. https://doi.org/10.1186/s12014-022-09354-0.
Andreasen N, et al. Cerebrospinal fluid levels of total-tau, phospho-tau and a beta 42 predicts development of Alzheimer's disease in patients with mild cognitive impairment. Acta Neurol Scand Suppl. 2003;179:47–51. https://doi.org/10.1034/j.1600-0404.107.s179.9.x.
Uhlen M, et al. The human secretome. Sci Signal. 2019;12(609). https://doi.org/10.1126/scisignal.aaz0274.
Lehallier B, et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat Med. 2019;25(12):1843–50. https://doi.org/10.1038/s41591-019-0673-2.
Tanaka T, et al. Plasma proteomic biomarker signature of age predicts health and life span. Elife. 2020;9. https://doi.org/10.7554/eLife.61073.
Benkert P, et al. Serum neurofilament light chain for individual prognostication of disease activity in people with multiple sclerosis: a retrospective modelling and validation study. Lancet Neurol. 2022;21(3):246–57. https://doi.org/10.1016/S1474-4422(22)00009-6.
Emilsson V, et al. Co-regulatory networks of human serum proteins link genetics to disease. Science. 2018;361(6404):769–73. https://doi.org/10.1126/science.aaq1327.
Sun BB, et al. Genomic atlas of the human plasma proteome. Nature. 2018;558(7708):73–9. https://doi.org/10.1038/s41586-018-0175-2.
Wang H, et al. Integrated analysis of ultra-deep proteomes in cortex, cerebrospinal fluid and serum reveals a mitochondrial signature in Alzheimer's disease. Mol Neurodegener. 2020;15(1):43. https://doi.org/10.1186/s13024-020-00384-6.
Stein KC, et al. Ageing exacerbates ribosome pausing to disrupt cotranslational proteostasis. Nature. 2022;601(7894):637–42. https://doi.org/10.1038/s41586-021-04295-4.
Sweeney MD, Sagare AP, Zlokovic BV. Blood-brain barrier breakdown in Alzheimer disease and other neurodegenerative disorders. Nat Rev Neurol. 2018;14(3):133–50. https://doi.org/10.1038/nrneurol.2017.188.
Dayon L, et al. Alzheimer disease pathology and the cerebrospinal fluid proteome. Alzheimers Res Ther. 2018;10(1):66. https://doi.org/10.1186/s13195-018-0397-4.
Bai B, et al. Deep multilayer brain proteomics identifies molecular networks in Alzheimer's disease progression. Neuron. 2020;105(6):975–991 e7. https://doi.org/10.1016/j.neuron.2019.12.015.
Nelson PT, et al. Correlation of Alzheimer disease neuropathologic changes with cognitive status: a review of the literature. J Neuropathol Exp Neurol. 2012;71(5):362–81. https://doi.org/10.1097/NEN.0b013e31825018f7.
Chen M, Xia W. Proteomic profiling of plasma and brain tissue from Alzheimer's disease patients reveals candidate network of plasma biomarkers. J Alzheimers Dis. 2020;76(1):349–68. https://doi.org/10.3233/JAD-200110.
Mawuenyega KG, et al. Decreased clearance of CNS beta-amyloid in Alzheimer's disease. Science. 2010;330(6012):1774. https://doi.org/10.1126/science.1197623.
De Miguel Z, et al. Exercise plasma boosts memory and dampens brain inflammation via clusterin. Nature. 2021;600(7889):494–9. https://doi.org/10.1038/s41586-021-04183-x.
Oldham MC. Transcriptomics: from differential expression to coexpression. In: Coppola G, editor. The OMICs: applications in neuroscience; 2014. p. 85–113.
Olsson A, et al. Simultaneous measurement of beta-amyloid(1-42), total tau, and phosphorylated tau (Thr181) in cerebrospinal fluid by the xMAP technology. Clin Chem. 2005;51(2):336–45. https://doi.org/10.1373/clinchem.2004.039347.
Hulstaert F, et al. Improved discrimination of AD patients using beta-amyloid(1-42) and tau levels in CSF. Neurology. 1999;52(8):1555–62. https://doi.org/10.1212/wnl.52.8.1555.
Shaw LM, et al. Cerebrospinal fluid biomarker signature in Alzheimer's disease neuroimaging initiative subjects. Ann Neurol. 2009;65(4):403–13. https://doi.org/10.1002/ana.21610.
Tukey JW. Exploratory data analysis; 1977.
Bi R, Liu P. Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments. BMC Bioinformatics. 2016;17:146. https://doi.org/10.1186/s12859-016-0994-9.

Acknowledgements

We are grateful to those who agreed to donate their CSF and blood for this study. The authors would like to thank SomaLogic and Olink research customer support teams for their consultations on data analysis.

Funding

This study was supported by the following National Institutes of Health funding mechanisms: K08AG068604, P30AG066511, and U54AG065187.

Author information

Authors and Affiliations

Goizueta Alzheimer’s Disease Research Center, Emory University School of Medicine, Whitehead Building—Suite 505C, 615 Michael Street, Atlanta, GA, 30322, USA
Eric B. Dammer, Lingyan Ping, Duc M. Duong, Erica S. Modeste, Nicholas T. Seyfried, James J. Lah, Allan I. Levey & Erik C. B. Johnson
Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
Eric B. Dammer, Lingyan Ping, Duc M. Duong, Erica S. Modeste & Nicholas T. Seyfried
Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
Lingyan Ping, Nicholas T. Seyfried, James J. Lah, Allan I. Levey & Erik C. B. Johnson

Authors

Eric B. Dammer
Lingyan Ping
Duc M. Duong
Erica S. Modeste
Nicholas T. Seyfried
James J. Lah
Allan I. Levey
Erik C. B. Johnson

Contributions

ECBJ, EBD, LP, DMD, and JJL designed the experiments; LP and DMD carried out experiments; EBD and ECBJ analyzed data; LP, DMD, ESM, NTS, JJL, and AIL provided advice on the interpretation of data; ECBJ wrote the manuscript with input from coauthors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Erik C. B. Johnson.

Ethics declarations

Ethics approval and consent to participate

All Emory research participants provided informed consent for this study under protocols approved by the Institutional Review Board at Emory University, in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Figure 1.

Signal-to-Noise Analysis in SomaScan. (A-D) Signal-to-noise was analyzed in the SomaScan 7000 data in CSF and plasma. (A) The frequency of median relative fluorescence units (RFU) for each protein assay in CSF across n=36 subjects and the frequency of median RFU for the buffer control of each aptamer reagent (left), with frequency of calculated signal:noise ratio for each aptamer reagent (right). (B) The same analysis as shown in (A), except in plasma. (C) The median correlation across all proteins commonly measured in the SomaScan 7000 platform with those in the Olink 1196 and TMT-MS platforms at a given minimum SomaScan signal:noise threshold, in CSF (left) and plasma (right). A signal:noise (S:N) ratio of 0.45 is indicated on the curves (noise is subtracted from signal prior to ratio calculation, and therefore S:N is <1). (D) Frequency of aptamer median S:N in CSF and plasma prior to (left) and after (right) applying a S:N filter of 0.45. Total aptamer numbers in each fluid before and after applying the filter are provided. Numbers include both human and non-human assays but no control probes, with a <75% missingness threshold by limit of detection applied prior to S:N filter. (Insets) S:N from 0-10.

Supplementary Figure 2. Missing Values. (A, B) The number of quantified protein assays by percent missingness across n=36 samples in CSF and plasma for the targeted SomaScan and Olink platforms, and the number of quantified proteins across TMT batches in the TMT-MS depleted and undepleted analyses (A). The SomaScan analysis includes application of the signal-to-noise (S:N) filter in CSF. “All” in each panel indicates the total number of human measurements for each platform and analysis, including assays that fall below S:N threshold in SomaScan, or do not pass QC criteria in Olink, or proteins that are identified but not quantified by reporter ions in MS (n=7288 total SomaScan human assays; n=1160 total Olink assays; n=1602 plasma, n=3310 CSF total proteins identified in MS depleted fluid; n=1129 plasma, n=2229 CSF total proteins identified in MS undepleted fluid). (B) The number of proteins removed after applying a <75% missingness filter. The SomaScan analysis also includes application of the S:N filter in CSF.

Supplementary Figure 3. Effects of Highly Abundant Protein Depletion. (A-C) Frequency distribution of the correlation of TMT-MS measurements before and after depletion of the top 14 most highly abundant plasma proteins, in CSF (left) and plasma (right) (A). The vertical red line indicates the median correlation across all measurements after depletion. (B) Correlation of proteins between depleted and undepleted CSF, considering all proteins (left), proteins that are significantly different between AD and control in the depleted analysis and that have a corresponding measurement in the undepleted analysis (n=59, center), and proteins that are significantly different between AD and control in the undepleted analysis and that have a corresponding measurement in the depleted analysis (n=147, right). The individual proteins are colored according to whether they are correlated across paired measurements of the same subjects in depleted and undepleted measurements. (C) Same as in (B), except in plasma. Correlations were performed by Pearson test.

Supplementary Figure 4. Proteomic Coverage with Undepleted Fluid and Unique Ontology Coverage. (A-C) Number and overlap of proteins measured by TMT-MS in undepleted fluid, Olink 1196, and SomaScan 7000 platforms in CSF (left) and plasma (right) from n=36 subjects. The threshold for inclusion was measurement in at least 9 subjects (or missing values <75%). CSF measurements on the SomaScan 7000 platform underwent signal-to-noise filtering (Supplementary Figure 1) prior to subsequent analyses. (B) Number and overlap of proteins measured in CSF and plasma by TMT-MS in undepleted fluid. (C) Gene ontology of proteins uniquely measured in each platform in CSF and plasma as shown in Fig. 2. The vertical red line indicates significance at a z score of 1.96.

Supplementary Figure 5. Correlation of AD Minus Control Differences Between Platforms Across Subjects. (A) Correlation between Olink and SomaScan (left) and Olink and MS (right) AD minus control (AD–CT) values, in plasma (top) and CSF (bottom). Correlations are provided for all AD–CT values (“All”), and AD–CT values in Olink that are significant at a level of p<0.05 (“p<0.05”). The individual proteins are colored according to whether they are significantly correlated within subject between platforms as shown in Fig. 3. (B) Same as in (A), except with correlation between SomaScan and Olink (left) and SomaScan and MS (right) AD–CT values. (C) Same as in (A) and (B), except with correlation between MS and Olink (left) and MS and SomaScan (right). The number of significantly differentially expressed proteins used for correlation is provided in the respective panel. Correlations were performed with Pearson test. The green lines indicate lines of best fit for each dataset.

Supplementary Figure 6. Correlation of CSF and Plasma Measurements with Other Cohorts. (A-D) Correlation of Emory cohort SomaLogic, Olink, and MS data with SomaLogic and Olink data from other cohorts. Only proteins considered significantly different between AD and control cases at 5% FDR in each cohort were considered for correlation. (A) Emory data from each platform was correlated with Olink plasma data from Jiang et al. [23]. (B) Correlation with CSF Olink data from the BioFinder cohort as described in Whelan et al. 
[24] (C) Correlation with plasma Olink data from Whelan et al. (D) Correlation with plasma SomaLogic data from the ANMerge version of the AddNeuroMed dataset as described in Birkenbihl et al. [25] Proteins are colored by the brain co-expression module in which they reside, as described in Johnson et al. [2]. Supplementary Figure 7. Differential Protein Abundance in AD by Undepleted TMT-MS. Differential protein abundance between AD and control cases in CSF (left) and plasma (right) on the TMT-MS undepleted platform. Proteins that are above the dashed red line are significantly altered in AD by t test at p<0.05. Proteins are colored by the brain co-expression module in which they reside, as described in Johnson et al. [2]. Supplementary Figure 8. Protein Abundance Analysis By Platform in CSF and Plasma. (A-H) Individual and total protein signal levels as measures of relative abundance in CSF and plasma were analyzed on the SomaScan (A, B), Olink (C, D), undepleted tandem mass tag mass spectrometry (TMT-MS) (E, F), and depleted TMT-MS platforms (G, H) in control (n=18) and AD (n=18) subjects. (A) Aptamer relative fluorescence units (RFUs) were ranked and analyzed by the contribution of each aptamer RFU to the cumulative RFU in CSF (left) or plasma (right). The difference in RFUs between AD and control cases for the top 5% protein RFUs (shaded box) and all protein RFUs for each fluid are shown in the boxplots below. Background signal was subtracted for each RFU. RFUs below limit of detection (LOD) were not considered. (B) SomaScan aptamer RFUs were calibrated to proteins of known absolute plasma concentration from the Human Protein Atlas [28] (n=2685 out of 4226 proteins in the HPA), and proteins ranked by concentration. The black line indicates the absolute concentration values from the HPA and is not summed, whereas the turquoise and red lines indicate the summed total protein concentration for all proteins below rank in control and AD cases, respectively. 
Boxplots represent summed concentration in control and AD cases for the top 5% (>9.5 log₁₀ pg/L) and bottom 50% (<6.0 log₁₀ pg/L) of proteins measured that overlap with HPA proteins. (C) Olink unlogged normalized protein expression (NPX) values were ranked and analyzed in CSF (left) and plasma (right) in similar fashion as shown in (A). Background signal was subtracted from unlogged NPX values. NPX values below LOD were not considered. (D) Olink NPX values were calibrated to proteins of known absolute plasma concentration and ranked, as described in (B) (n=808 out of 4226 proteins in the HPA). Boxplots represent summed concentration in control and AD cases for the top 5% (>9.5 log₁₀ pg/L) and bottom 50% (<6.2 log₁₀ pg/L) of proteins measured that overlap with HPA proteins. (E) Undepleted TMT mass spectrometry summed reporter ion counts for each protein were ranked and analyzed in CSF (left) and plasma (right) in similar fashion as shown in (A) and (C). (F) Undepleted TMT-MS ion counts were calibrated to proteins of known absolute plasma concentration and ranked, as described in (B) and (D) (n=925 out of 4226 proteins in the HPA). The red shaded area represents the top 10% most abundant proteins, whereas the blue shaded area represents the bottom 30% of abundance at a threshold of <6.2 pg/L in the undepleted MS data. (G) Depleted TMT-MS ion counts were analyzed as described in (E). (H) Depleted TMT-MS ion counts were analyzed as described in (F) (n=1163 out of 4226 proteins in the HPA). The blue shaded area represents the bottom 30% of abundance at a threshold of <6.2 pg/L in the depleted MS data. Shaded areas represent +/- SEM. Differences between control and AD were determined by t test. Supplementary Figure 9. SMOC1 Analyses. (A-E) Correlation of SMOC1 relative abundance in CSF between proteomic platforms (A). (B) Correlation of SMOC1 relative abundance in plasma between Olink and SomaScan platforms. SMOC1 was not measured by MS in plasma. 
Correlations in (A) and (B) were performed with the SMOC1 5694.57 SOMAmer. (C) Correlation of SMOC1 relative abundance as measured by Olink with CSF Aβ/T-Tau ratio in CSF (left) and plasma (right). (D) Correlation of SMOC1 relative abundance with MoCA score, a measure of cognitive performance (higher scores reflect better cognitive performance), in CSF (left) and plasma (right). (E) Correlation of SMOC1 relative abundance with MoCA score in CSF (left) and plasma (right) in the Accelerating Medicines Partnership – Parkinson’s Disease (AMP-PD) cohort. Correlations were performed using Pearson correlation. Aβ, amyloid-β; MoCA, Montreal Cognitive Assessment; MS, mass spectrometry; SMOC1, SPARC-related modular calcium-binding protein 1; T-Tau, total tau. Supplementary Figure 10. HOMER1 Analyses. (A-C) Correlation of HOMER1 relative abundance as measured by SomaScan in CSF (left) and plasma (right) with CSF total tau (T-Tau) levels (A). (B) Correlation of HOMER1 relative abundance in CSF (left) and plasma (right) with CSF phosphorylated tau181 (p-Tau) levels. (C) Correlation of HOMER1 relative abundance in CSF (left) and plasma (right) with Montreal Cognitive Assessment score (MoCA, higher scores reflect better cognitive performance). HOMER1 was measured in CSF and plasma only by SomaScan. Correlations were performed using Pearson correlation. HOMER1, Homer protein homolog 1. Supplementary Figure 11. NEFL Analyses. (A-F) Correlation of NEFL relative abundance in CSF (left) and plasma (right) between proteomic platforms (A). NEFL was not measured in plasma by MS. (B) Differences in NEFL relative abundance between control and AD cases by platform in CSF (left) and plasma (right). (C) Correlation of NEFL relative abundance as measured by Olink in CSF (left) and plasma (right) with CSF T-Tau levels in the AD discovery cohort (top) and AMP-PD cohort (bottom). 
(D) Correlation of NEFL relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels in the AD discovery cohort (top) and AMP-PD cohort (bottom). (E) Correlation of NEFL relative abundance in CSF (left) and plasma (right) with MoCA score in the AD discovery cohort (top) and AMP-PD cohort (bottom). (F) Correlation of NEFL relative abundance in CSF (left) and plasma (right) with age in the AMP-PD cohort. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). AMP-PD, Accelerating Medicines Partnership – Parkinson’s Disease; MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; NEFL, neurofilament light polypeptide; p-Tau, phosphorylated tau181; T-Tau, total tau. Supplementary Figure 12. CHI3L1 Analyses. (A-E) Correlation of CHI3L1 relative abundance in CSF (left) and plasma (right) between proteomic platforms (A). (B) Differences in CHI3L1 relative abundance between control and AD cases by platform in CSF (left) and plasma (right). (C) Correlation of CHI3L1 relative abundance as measured by MS in CSF (left) and plasma (right) with CSF T-Tau levels. (D) Correlation of CHI3L1 relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels. (E) Correlation of CHI3L1 relative abundance in CSF (left) and plasma (right) with MoCA score. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. 
Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). CHI3L1, Chitinase-3-like protein 1; MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; p-Tau, phosphorylated tau181; T-Tau, total tau. Supplementary Figure 13. YWHAZ Analyses. (A-E) Correlation of YWHAZ relative abundance in CSF (left) and plasma (right) between SomaScan and MS platforms (A). YWHAZ was not measured by Olink. (B) Differences in YWHAZ relative abundance as measured by MS between control and AD cases in CSF (left) and plasma (right). (C) Correlation of YWHAZ relative abundance as measured by MS in CSF (left) and plasma (right) with CSF T-Tau levels. (D) Correlation of YWHAZ relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels. (E) Correlation of YWHAZ relative abundance in CSF (left) and plasma (right) with MoCA score. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; p-Tau, phosphorylated tau181; T-Tau, total tau; YWHAZ, 14-3-3 protein zeta. Supplementary Figure 14. ENO1 Analyses. (A-E) Correlation of ENO1 relative abundance in CSF (left) and plasma (right) between SomaScan and MS platforms (A). ENO1 was not measured by Olink. (B) Differences in ENO1 relative abundance as measured by MS between control and AD cases in CSF (left) and plasma (right). (C) Correlation of ENO1 relative abundance as measured by SomaScan in CSF (left) and plasma (right) with CSF T-Tau levels. 
(D) Correlation of ENO1 relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels. (E) Correlation of ENO1 relative abundance in CSF (left) and plasma (right) with MoCA score. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). ENO1, alpha-enolase; MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; p-Tau, phosphorylated tau181; T-Tau, total tau. Supplementary Figure 15. PEBP1 Analyses. (A-F) Correlation of PEBP1 relative abundance in CSF (left) and plasma (right) between proteomic platforms (A). (B) Differences in PEBP1 relative abundance between control and AD cases by platform in CSF (left) and plasma (right). (C) Correlation of PEBP1 relative abundance as measured by SomaScan in CSF (left) and plasma (right) with CSF T-Tau levels in the AD discovery cohort (top) and AMP-PD cohort (bottom). (D) Correlation of PEBP1 relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels in the AD discovery cohort (top) and AMP-PD cohort (bottom). (E) Correlation of PEBP1 relative abundance in CSF (left) and plasma (right) with MoCA score in the AD discovery cohort (top) and AMP-PD cohort (bottom). (F) Correlation of PEBP1 relative abundance in CSF (left) and plasma (right) with age in the AMP-PD cohort. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). 
AMP-PD, Accelerating Medicines Partnership – Parkinson’s Disease; MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; PEBP1, phosphatidylethanolamine-binding protein 1; p-Tau, phosphorylated tau181; T-Tau, total tau. Supplementary Figure 16. NPTXR Analyses. (A-F) Correlation of NPTXR relative abundance in CSF (left) and plasma (right) between proteomic platforms (A). (B) Differences in NPTXR relative abundance between control and AD cases by platform in CSF (left) and plasma (right). (C) Correlation of relative NPTXR levels measured in each platform between CSF and plasma. (D) Correlation of NPTXR relative abundance as measured by SomaScan in CSF (left) and plasma (right) with CSF T-Tau levels. (E) Correlation of NPTXR relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels. (F) Correlation of NPTXR relative abundance in CSF (left) and plasma (right) with MoCA score. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; NPTXR, neuronal pentraxin receptor; p-Tau, phosphorylated tau181; T-Tau, total tau. Supplementary Figure 17. SPP1 Analyses. (A-F) Correlation of SPP1 relative abundance in CSF (left) and plasma (right) between proteomic platforms (A). (B) Differences in SPP1 relative abundance between control and AD cases by platform in CSF (left) and plasma (right). (C) Correlation of relative SPP1 levels measured by SomaScan between CSF and plasma. (D) Correlation of SPP1 relative abundance in CSF (left) and plasma (right) with CSF T-Tau levels. 
(E) Correlation of SPP1 relative abundance in CSF (left) and plasma (right) with CSF p-Tau levels. (F) Correlation of SPP1 relative abundance in CSF (left) and plasma (right) with MoCA score. Correlations were performed using Pearson correlation. Group differences were assessed using t test. Boxplots represent the median, 25^th, and 75^th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). MoCA, Montreal Cognitive Assessment (higher scores reflect better cognitive performance); MS, mass spectrometry; p-Tau, phosphorylated tau181; SPP1, osteopontin; T-Tau, total tau. Supplementary Figure 18. CSF and Plasma Network Platform Representation. Percent coverage of CSF (top) and plasma (bottom) network modules by proteomic platform. Modules are listed in relationship order. “All” indicates percent coverage across the entire network. Supplementary Figure 19. AD Brain Protein Co-Expression Network. AD brain consensus protein correlation network as shown in Fig. 5, including integration with CSF and plasma data. Descriptions of module eigenprotein (EP) correlations with traits and cell type overlap testing are provided in Fig. 5. Brain modules were tested for their presence in the CSF and plasma networks by brain module protein overrepresentation analysis (ORA) and network preservation (preservation) statistics, as previously described [2]. ORA p values are for the module with the strongest overlap. Only modules that reached statistical significance after FDR correction are colored by degree of significance. In addition to module overlap and preservation analyses, the difference in brain module eigenprotein between control and AD, or brain synthetic eigenprotein in CSF or plasma between control and AD, was determined. 
A significantly increased eigenprotein in AD is indicated in green, whereas a significantly decreased eigenprotein is indicated in blue. Supplementary Figure 20. Module Over-Representation Analysis Across Brain and Biofluid Networks. (A-C) Module member overrepresentation analysis (ORA) of the brain and CSF networks (A), brain and plasma networks (B), and CSF and plasma networks (C). The numbers in each box represent the –log₁₀(FDR) value for the overlap after Benjamini-Hochberg correction. Modules on the y-axis (rows) with no overlap reaching –log₁₀(FDR) > 1 (FDR < 0.1) were not included in the heatmaps. Supplementary Figure 21. Plasma Network Module Over-Representation Analysis with a SomaScan Serum Network. Module member overrepresentation analysis (ORA) of a serum protein co-expression network (Emilsson-PM) obtained using the SomaScan platform, as described in Emilsson et al. [32], with the plasma 3-platform (plasma-3pl) network. Ontologies for each plasma network module are provided in Fig. 9. The numbers in each box represent the –log₁₀(FDR) value for the overlap after Benjamini-Hochberg correction. Modules on the y-axis (rows) with no overlap reaching –log₁₀(FDR) > 1 (FDR < 0.1) were not included in the heatmaps. Extended Data. Extended Data 1. Correlation of Proteins Commonly Measured in CSF by MS-TMT and Olink. n=36 unless otherwise indicated. Measurements were from TMT-MS on CSF depleted of highly abundant plasma proteins. Olink NPX values are shown on the x-axis, and TMT-MS log₂ relative values are shown on the y-axis. Measurements that were below LOD in Olink are not outlined. Correlations include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 2. Correlation of Proteins Commonly Measured in CSF by MS-TMT and SomaScan. n=35 unless otherwise indicated. Measurements were from TMT-MS on CSF depleted of highly abundant plasma proteins. 
TMT-MS log₂ relative values are shown on the x-axis, and SomaScan log₂(RFU) values are shown on the y-axis. Measurements that were below LOD in SomaScan are not outlined. Correlations were performed for all SOMAmers including multiple SOMAmers targeting the same protein, and include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 3. Correlation of Proteins Commonly Measured in CSF by Olink and SomaScan. n=35 unless otherwise indicated. Olink NPX values are shown on the x-axis, and SomaScan log₂(RFU) values are shown on the y-axis. Measurements that were below LOD in either platform are not outlined. Correlations were performed for all SOMAmers including multiple SOMAmers targeting the same protein, and include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 4. GO Analysis on AD CSF Network Modules. Gene ontology (GO) analysis was performed to gain insight into the biological meaning of each AD CSF protein network module. Enrichment for a given ontology is shown by z score. Extended Data 5. AD CSF Network Module Protein Graphs. The size of each circle indicates the relative eigenprotein correlation value (kME) in each network module. Those proteins with the largest kME are considered “hub” proteins within the module, and explain the largest variance in module expression. Proteins outlined in gold are from the SomaScan platform. Proteins outlined in green are from the Olink platform. Proteins outlined in purple are from the TMT-MS platform. Only the top 100 proteins by kME for each module are shown. Extended Data 6. AD CSF Network Module Eigenprotein Levels and Correlations. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. 
Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 7. AD CSF Network Module Synthetic Eigenprotein Levels and Correlations in Brain. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 8. AD CSF Network Module Synthetic Eigenprotein Levels and Correlations in Plasma. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 9. Within-Subject CSF Network Module Eigenprotein Levels in Plasma and CSF. n=18 control, 17 AD. Relative CSF protein network module eigenprotein levels and their synthetic eigenprotein levels in plasma were compared within subject across fluids in control and AD cases. The difference in average slope (Z_slope) between AD and control was calculated for each module. Extended Data 10. Correlation of Proteins Commonly Measured in Plasma by MS-TMT and Olink. n=36 unless otherwise indicated. Measurements were from TMT-MS on plasma depleted of highly abundant plasma proteins. 
Olink NPX values are shown on the x-axis, and TMT-MS log₂ relative values are shown on the y-axis. Measurements that were below LOD in Olink are not outlined. Correlations include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 11. Correlation of Proteins Commonly Measured in Plasma by MS-TMT and SomaScan. n=35 unless otherwise indicated. Measurements were from TMT-MS on plasma depleted of highly abundant plasma proteins. TMT-MS log₂ relative values are shown on the x-axis, and SomaScan log₂(RFU) values are shown on the y-axis. Measurements that were below LOD in SomaScan are not outlined. Correlations were performed for all SOMAmers including multiple SOMAmers targeting the same protein, and include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 12. Correlation of Proteins Commonly Measured in Plasma by Olink and SomaScan. n=35 unless otherwise indicated. Olink NPX values are shown on the x-axis, and SomaScan log₂(RFU) values are shown on the y-axis. Measurements that were below LOD in either platform are not outlined. Correlations were performed for all SOMAmers including multiple SOMAmers targeting the same protein, and include those proteins that were matched by gene symbol only. Correlations were performed using Pearson test. Extended Data 13. GO Analysis on AD Plasma Network Modules. Gene ontology (GO) analysis was performed to gain insight into the biological meaning of each AD plasma protein network module. Enrichment for a given ontology is shown by z score. Extended Data 14. AD Plasma Network Module Protein Graphs. The size of each circle indicates the relative eigenprotein correlation value (kME) in each network module. Those proteins with the largest kME are considered “hub” proteins within the module, and explain the largest variance in module expression. Proteins outlined in gold are from the SomaScan platform. 
Proteins outlined in green are from the Olink platform. Proteins outlined in purple are from the TMT-MS platform. Only the top 100 proteins by kME for each module are shown. Extended Data 15. AD Plasma Network Module Eigenprotein Levels and Correlations. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 16. AD Plasma Network Module Synthetic Eigenprotein Levels and Correlations in Brain. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 17. AD Plasma Network Module Synthetic Eigenprotein Levels and Correlations in CSF. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 18. Within-Subject Plasma Network Module Eigenprotein Levels in CSF and Plasma. n=18 control, 17 AD. 
Relative plasma protein network module eigenprotein levels and their synthetic eigenprotein levels in CSF were compared within subject across fluids in control and AD cases. The difference in average slope (Z_slope) between AD and control was calculated for each module. Extended Data 19. Brain TMT-MS AD Network Module Synthetic Eigenprotein Levels and Correlations in CSF. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars). Extended Data 20. Brain TMT-MS AD Network Module Synthetic Eigenprotein Levels and Correlations in Plasma. n=18 control, 17 AD. Differences between case groups were assessed by t test or one-way ANOVA. Correlations were performed by bicor or Pearson test (cor). Significance at p<0.05 is highlighted in red. Boxplots represent the median, 25th, and 75th percentiles, and box hinges represent the interquartile range of the two middle quartiles within a group. Datapoints up to 1.5 times the interquartile range from box hinge define the extent of whiskers (error bars).
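The two filtering steps named in the captions for Supplementary Figures 1 and 2 (a <75% missingness threshold applied first, then a signal:noise filter of 0.45 with noise subtracted from signal before the ratio is taken) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name filter_assays, the exact ratio (median signal minus buffer, divided by buffer), and the inclusive >= comparison are assumptions, since the captions do not spell out the formula.

```python
import numpy as np

def filter_assays(rfu, buffer_rfu, max_missing=0.75, min_sn=0.45):
    """Hypothetical sketch of a missingness filter followed by an S:N filter.

    rfu        : (n_subjects, n_assays) array, NaN where a value is below LOD
    buffer_rfu : (n_assays,) median buffer-control RFU for each assay
    Returns a boolean keep-mask and the per-assay S:N values.
    """
    rfu = np.asarray(rfu, dtype=float)
    buffer_rfu = np.asarray(buffer_rfu, dtype=float)

    # Step 1: keep assays whose fraction of missing values is below 75%.
    missing_frac = np.isnan(rfu).mean(axis=0)
    keep_missing = missing_frac < max_missing

    # Median signal per assay, computed only for assays that passed step 1.
    median_signal = np.full(rfu.shape[1], np.nan)
    median_signal[keep_missing] = np.nanmedian(rfu[:, keep_missing], axis=0)

    # Step 2 (assumed formula): subtract noise from signal, then take the ratio.
    sn = (median_signal - buffer_rfu) / buffer_rfu

    # NaN S:N values compare as False, so dropped assays stay dropped.
    keep = keep_missing & (sn >= min_sn)
    return keep, sn

# Toy data: 4 subjects x 3 assays, NaN marks values below LOD.
rfu = np.array([
    [150.0, np.nan, 110.0],
    [160.0, np.nan, 120.0],
    [140.0, np.nan, 100.0],
    [150.0, 500.0, np.nan],
])
keep, sn = filter_assays(rfu, buffer_rfu=[100.0, 100.0, 100.0])
print(keep)
```

In the toy data the second assay is removed for missingness (3 of 4 values missing) and the third for low S:N, so only the first assay is retained. With n=36 subjects, the same missingness rule corresponds to requiring measurement in at least 9 subjects, as stated for Supplementary Figure 4.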

Additional file 2: Supplementary Table 1. Traits for Emory Goizueta Alzheimer's Disease Research Center Cohort. Supplementary Table 2. SomaScan Filter Counts. Supplementary Table 3. Proteins Affected by Depletion of Highly Abundant Proteins in CSF and Plasma. Supplementary Table 4. Platform Overlap with SomaScan, Olink, and MS Depleted CSF. Supplementary Table 5. Platform Overlap with SomaScan, Olink, and MS Undepleted CSF. Supplementary Table 6. Platform Overlap with SomaScan, Olink, and MS Depleted Plasma. Supplementary Table 7. Platform Overlap with SomaScan, Olink, and MS Undepleted Plasma. Supplementary Table 8. Overlap Between CSF and Plasma in Olink. Supplementary Table 9. Overlap Between CSF and Plasma in SomaScan. Supplementary Table 10. Overlap Between CSF and Plasma in MS Depleted. Supplementary Table 11. Overlap Between CSF and Plasma in MS Undepleted. Supplementary Table 12. Correlation Between Olink and SomaScan Measurements in CSF. Supplementary Table 13. Correlation Between Olink and SomaScan Measurements in Plasma. Supplementary Table 14. Correlation Between Olink and TMT-MS Measurements in CSF. Supplementary Table 15. Correlation Between Olink and TMT-MS Measurements in Plasma. Supplementary Table 16. Correlation Between TMT-MS and SomaScan Measurements in CSF. Supplementary Table 17. Correlation Between TMT-MS and SomaScan Measurements in Plasma. Supplementary Table 18. CSF AD Network Module Memberships. Supplementary Table 19. CSF Network Circular Heatmap Values. Supplementary Table 20. Plasma AD Network Module Memberships. Supplementary Table 21. Plasma Network Circular Heatmap Values. Supplementary Table 22. Brain Network Circular Heatmap Values.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Dammer, E.B., Ping, L., Duong, D.M. et al. Multi-platform proteomic analysis of Alzheimer’s disease cerebrospinal fluid and plasma reveals network biomarkers associated with proteostasis and the matrisome. Alz Res Therapy 14, 174 (2022). https://doi.org/10.1186/s13195-022-01113-5

Received: 31 May 2022
Accepted: 31 October 2022
Published: 17 November 2022
DOI: https://doi.org/10.1186/s13195-022-01113-5
