Analyzing the genes related to Alzheimer’s disease via a network and pathway-based approach

Background Our understanding of the molecular mechanisms underlying Alzheimer’s disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. Method In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm. Results We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules—neuronal and metabolic module, cell growth/survival and neuroendocrine pathway module, and immune response-related module—indicating an AD-specific immune-endocrine-neuronal regulatory network. 
Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified. Conclusion By means of network and pathway-based methodology, we explored the pathogenetic mechanism underlying AD at a systems biology level. Results from our work could provide valuable clues for understanding the molecular mechanism underlying AD. In addition, the framework proposed in this study could be used to investigate the pathological molecular network and genes relevant to other complex diseases or phenotypes. Electronic supplementary material The online version of this article (doi:10.1186/s13195-017-0252-z) contains supplementary material, which is available to authorized users.


Background
Alzheimer's disease (AD) is the most prevalent neurodegenerative disorder and accounts for the majority of people diagnosed with dementia [1]. As a complex and chronic neurological disease, AD affects about 6% of people aged 65 years and older [2], and is responsible for about 480,000 deaths per year around the world [3]. In addition to its effect on the quality of life of those suffering from the disorder and their families, AD also imposes a severe burden on society. In the USA alone, the health-care costs related to AD are about $172 billion per year [4].
AD can be diagnosed by symptoms such as short-term memory loss, mood swings, learning impairments, and disruptions in daily activities [5]. However, as an age-related and progressive disease, some pathological features of AD (e.g., amyloid deposition, accumulation of neurofibrillary tangles, as well as function and structure changes of brain regions involved in memory) often appear many years prior to clinical manifestations [6,7]. These pathological changes eventually lead to the damage and death of specific neurons, resulting in the emergence of clinical symptoms.
The cause of AD is still poorly understood, although much effort has been dedicated to exploring the pathological and molecular mechanisms of AD via various approaches (e.g., animal models, gene expression profiling, genome-wide association studies (GWAS), neuroimaging techniques, or a systems biology framework) [2,8-11]. It is agreed that AD develops as a result of the combination of multiple factors, including genetic factors, a history of head injuries, depression, or hypertension. Among these factors, it is estimated that about 70% of the risk for AD is attributable to genetics [1,12]. Established genetic causes of AD include the dominant mutations of genes encoding amyloid precursor protein (APP), presenilin 1 (PSEN1), and presenilin 2 (PSEN2). However, these genes are only responsible for the pathogenesis of AD in about 5% of patients with clinical symptoms appearing in midlife. On the other hand, genetic analyses have suggested that, in complex disorders like AD, individual differences can be caused by many genes and their variants. Genes with various biological functions may act in coordination to increase the risk of AD, with a moderate or small effect exerted by each gene [1]. Consistent with this view, a growing number of genes (e.g., apolipoprotein E (APOE), glycogen synthase kinase 3 beta (GSK3B), dual specificity tyrosine-phosphorylation-regulated kinase 1A (DYRK1A), and Tau) have been found to be potentially associated with AD [1,13]. Although a few plausible candidate genes have been partially replicated, some are considered problematic. This is especially true as high-throughput methods like GWAS are being more widely applied to genetic studies of AD.
Under such circumstances, a comprehensive analysis of potential causal genes of AD within a pathway and/or a network framework may not only provide us with important insights beyond the conventional single-gene analyses, but also offer consolidated validation for the individual candidate gene.
In the current study, we implemented a comprehensive curation of AD-related genes from genetic association studies. We then conducted biological enrichment analyses to detect the significant functional themes within these genetic factors and analyzed the interactions among the enriched biochemical pathways by pathway crosstalk analysis. Furthermore, an AD-specific protein network was inferred and evaluated with the human protein-protein interaction network as the background. This study should offer valuable hints for understanding the molecular mechanisms of AD from a perspective of systems biology.

Identification of AD-related genes
The genes genetically associated with AD were collected by retrieving the human genetic association studies deposited in PubMed (http://www.ncbi.nlm.nih.gov/pubmed/). We retrieved publications associated with AD with the searching term ' ( '. By July 7, 2015, a total of 5298 reports were retrieved. After reviewing all abstracts of these publications, only the genetic association studies on AD were selected. From the obtained publication pool, we then concentrated on those studies reporting a significant association of gene(s) with AD. In order to reduce the number of potential false-positive genes, the studies reporting insignificant or negative associations were excluded even though some genes in these studies might actually be truly associated with AD. We then reviewed the full reports of each selected publication to make sure that the conclusion was consistent with its contents. In several studies, some genes were found to function cooperatively to exert significant influences on AD, with each gene having a small or mild impact; these genes were also included in our list. In addition, the genes from several GWAS analyses on AD, showing genetic association at a genome-wide significance level, were also included.

Functional enrichment analysis of genes related to AD
WebGestalt [14] and ToppGene [15] were utilized to detect the biological themes of the AD-related genes. As a web-based bioinformation-mining platform, WebGestalt integrates information from multiple resources to determine the biological themes of a candidate gene list, including the identification of overrepresented Gene Ontology (GO) terms. In this study, only the GO biological process terms with a false discovery rate (FDR) smaller than 0.05 were kept as significantly enriched. ToppGene was used to identify and analyze the enriched biological pathways in the input genes. Pathways with FDR < 0.05 were considered to be significantly enriched.
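The enrichment-plus-FDR filtering described above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment, which are common choices for over-representation analysis; the text does not state which test or correction WebGestalt and ToppGene applied here, and the function names and input layout below are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k): N genes drawn from a background of M, n of which belong to the term."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def enriched_terms(gene_list, term_to_genes, background_size, fdr_cutoff=0.05):
    """Map each term with adjusted p-value below the cutoff to that value."""
    genes = set(gene_list)
    names = sorted(term_to_genes)
    pvals = []
    for name in names:
        members = set(term_to_genes[name])
        k = len(genes & members)
        pvals.append(hypergeom_sf(k, background_size, len(members), len(genes)))
    fdrs = benjamini_hochberg(pvals)
    return {name: fdr for name, fdr in zip(names, fdrs) if fdr < fdr_cutoff}
```

In this sketch, `background_size` stands for the number of annotated genes in the reference set; the choice of background strongly affects the resulting p-values and is an assumption the user must supply.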

Analysis of crosstalks among pathways
We further built crosstalks among pathways to investigate interlinks and interactions of the enriched pathways. To measure the overlap between two pathways, the overlap coefficient (OC) and the Jaccard coefficient (JC) were calculated using the following formulas:
$$\mathrm{OC}(A,B) = \frac{|A \cap B|}{\min(|A|,|B|)}, \qquad \mathrm{JC}(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
in which A and B are the lists of genes of the two examined pathways. Briefly, the following procedure was adopted to construct the pathway crosstalks:
(1) Only pathways with FDR < 0.05 were kept for crosstalk analysis. Meanwhile, pathways with five or fewer candidate genes were discarded because pathways with too few candidate genes might present few or biased connections with other pathways.
(2) The common candidate genes of each pathway pair were counted, and pathway pairs with fewer than two overlapping genes were removed.
(3) The overlap of every pathway pair was measured by the corresponding JC and OC values.
(4) The pathway crosstalk network was constructed with Cytoscape software [16].
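The procedure above can be sketched in a few lines. This is an illustrative sketch only: it assumes the input already contains just the FDR-significant pathways, each given as the list of its candidate genes, and the function names are hypothetical. The edge score uses the average of JC and OC, as described later in the Results.

```python
from itertools import combinations

def overlap_coefficient(a, b):
    """|A n B| / min(|A|, |B|) for two gene collections."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))

def jaccard_coefficient(a, b):
    """|A n B| / |A u B| for two gene collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def pathway_crosstalk(pathways, min_genes=6, min_shared=2):
    """Return crosstalk edges as (pathway_1, pathway_2, shared_count, score).

    pathways: dict mapping pathway name -> candidate genes in that pathway.
    Pathways with five or fewer candidate genes are dropped (min_genes=6),
    as are pairs sharing fewer than two genes (min_shared=2).
    """
    gene_sets = {name: set(genes) for name, genes in pathways.items()}
    kept = {name: g for name, g in gene_sets.items() if len(g) >= min_genes}
    edges = []
    for p, q in combinations(sorted(kept), 2):
        shared = kept[p] & kept[q]
        if len(shared) < min_shared:
            continue
        oc = overlap_coefficient(kept[p], kept[q])
        jc = jaccard_coefficient(kept[p], kept[q])
        edges.append((p, q, len(shared), (oc + jc) / 2))
    return edges
```

The resulting edge list could then be exported (for example as a tab-separated file) for visualization in a network tool such as Cytoscape.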

Compilation of the human protein-protein interaction network
To explore the correlation and interaction among the AD-related genes, we compiled a comprehensive protein-protein interaction (PPI) network, based on which the protein network topological properties of the gene set related to AD were calculated and analyzed. Briefly, the human protein-protein interaction data were obtained from the Protein Interaction Network Analysis (PINA) database (latest release version: May 21, 2014) [17] by pooling and curating the unique physical interaction information from six main public protein interaction databases: BioGRID, IntAct, DIP, MINT, MIPS/MPact, and HPRD. In the meantime, another interactome for Homo sapiens [18] that contained 141,296 edges (physical protein interactions) among 13,460 nodes (proteins), consisting of metabolic pathway-related interactions, regulatory and protein-protein interactions, and interaction pairs for kinase and specific substrate, was selected as an additional source of interactome data. After merging the two interactome datasets and excluding the self-interacting and redundant pairs, the proteins in the list were mapped onto Entrez protein-coding genes for Homo sapiens via the Uniprot ID mapping tool (http://www.uniprot.org/uploadlists). Finally, we compiled a relatively comprehensive human physical interactome, which comprised 16,022 genes/proteins and 228,122 interactions (see Additional file 1).

Construction of the AD-specific protein subnetwork
A subnetwork specific to a given disease can provide us with hints for how the disease-related molecules interact with each other. A network parsimony principle has been demonstrated in the context of biological processes [19]; that is, the molecular networks/pathways often follow the shortest molecular paths between known disease-associated components (disease-related genes or proteins in our case). The Steiner minimal tree algorithm is consistent with this biological principle: it uses a greedy heuristic strategy to iteratively link smaller trees into larger ones until there is only one tree connecting all seed nodes [20]. GenRev [21] was utilized to identify the pathological subnetwork from the human interactome using the curated AD-related genes as input. To assess the non-randomness of the constructed network, 1000 random networks with the same number of vertices and interactions as the AD-specific network were generated using the Erdős-Rényi model in the R igraph package [22].
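To make the idea of connecting seed genes through as few linker nodes as possible concrete, the following is a minimal sketch of a related but simpler technique: a shortest-path heuristic that grows a single tree by repeatedly attaching the nearest unconnected seed via breadth-first search. It is not the GenRev implementation and does not reproduce the tree-merging strategy described above; all function names are hypothetical, and the interactome is assumed to be an unweighted, undirected edge list.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map, skipping self-interactions."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def nearest_seed_path(adj, tree_nodes, remaining):
    """Multi-source BFS from the current tree to the closest remaining seed.

    Returns the path as a list starting at the seed and ending at a tree
    node, or None if no remaining seed is reachable.
    """
    parent = {n: None for n in tree_nodes}
    queue = deque(sorted(tree_nodes))
    while queue:
        node = queue.popleft()
        if node in remaining:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for nb in sorted(adj.get(node, ())):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None

def greedy_steiner_tree(adj, seeds):
    """Approximate a Steiner tree; returns (tree_nodes, tree_edges)."""
    seeds = [s for s in dict.fromkeys(seeds) if s in adj]
    if not seeds:
        return set(), set()
    tree_nodes = {seeds[0]}
    tree_edges = set()
    remaining = set(seeds[1:])
    while remaining:
        path = nearest_seed_path(adj, tree_nodes, remaining)
        if path is None:
            break  # the remaining seeds cannot be reached from the tree
        for u, v in zip(path, path[1:]):
            tree_edges.add(frozenset((u, v)))
        tree_nodes.update(path)
        remaining -= set(path)
    return tree_nodes, tree_edges
```

Nodes in the returned tree that are not seeds play the role of the intermediate (linker) genes discussed in the Results; seeds absent from the interactome or unreachable from the first seed are left out of the tree in this sketch.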

Compilation of genes associated with AD
Genes associated with AD were compiled through searching the published genetic association studies on AD in PubMed.

Biological function enrichment analysis of Alzgset
Functional enrichment analysis revealed a more detailed biological function spectrum of these AD-related genes (see Additional file 2: Table S2). Among the GO terms overrepresented in Alzgset, those related to lipid and/or lipoprotein-related processes, drug reactions, neural development, or synaptic transmission were included. GO terms associated with drug reactions (e.g., response to ethanol, response to nicotine, and response to cocaine) and metabolic processes (e.g., xenobiotic metabolic process) were overrepresented. These results were in line with previous findings that complicated correlations existed between the pathophysiological state of AD and drug abuse [23,24]. Of significance, top-ranked terms included some lipid/lipoprotein-related processes, including phospholipid efflux, reverse cholesterol transport, cholesterol homeostasis, and lipoprotein metabolic processes. Biological process terms related to synaptic transmission (e.g., positive regulation of transmission of nerve impulse; synaptic transmission, cholinergic; regulation of synaptic transmission, dopaminergic; and regulation of neurotransmitter secretion), dopamine metabolism (dopamine metabolic process), and other neural functions (e.g., synaptic vesicle transport, regulation of neuronal synaptic plasticity, neuron migration, and memory) were also enriched. Meanwhile, GO terms related to immunological function (e.g., T-helper 1 type immune response, positive regulation of interleukin-6 production, and chronic inflammatory response) were overrepresented. The diversity in the function of AD-related genes demonstrated the complexity of the disease.

Biochemical pathway enriched in Alzgset
Detecting the biological pathways overrepresented among Alzgset may provide useful information about the pathogenic molecular mechanism underlying AD. For Alzgset, 68 enriched pathways were identified (Table 1). Among them, several pathways related to immune processes were included (e.g., cytokines and inflammatory response, cytokine network, dendritic cells in regulating TH1 and TH2 development, and IL-5 signaling), consistent with previous studies [25,26]. Also, neurotransmitter signaling-related pathways were identified, such as cholinergic synapse, dopaminergic synapse, and serotonergic synapse. Additionally, the Alzgset-enriched pathway list contained some pathways related to cell growth and/or survival, including neurotrophin signaling, PI3K-Akt signaling, mTOR signaling, and Notch signaling, which are vital for the growth/survival state of neurons in the process of AD [27,28]. Moreover, metabolism-related pathways, consisting of drug metabolism (cytochrome P450), glutathione metabolism, and metabolism of xenobiotics by cytochrome P450, were also significantly enriched, indicating that related metabolic processes were involved in the etiology and development of AD. In addition, the pathway of the intestinal immune network for IgA production was enriched, which might suggest a connection between AD and the intestinal microbiota [29,30]. Furthermore, pathways involved in osteoclast differentiation and adipocytokine signaling were also detected, consistent with prior studies [31-33].

Crosstalks among significantly enriched pathways
To explore the correlations between the pathways, we implemented a pathway crosstalk analysis for the 68 enriched pathways. Here we assumed that crosstalk existed in a pathway pair if they had a proportion of common genes in Alzgset [34]. There were 41 pathways including six or more members in Alzgset, of which 37 pathways met the criterion for crosstalk analysis; that is, each pathway shared at least two genes with one or more other pathways. All of the pathway pairs (207 crosstalks among 37 pathways) were used for constructing the pathway crosstalk network and the overlap significance of each pathway pair was evaluated based on the average of JC and OC.
Based on their crosstalks, these pathways could be roughly divided into three major modules, with pathways in each module having more crosstalks with each other than with those outside of the module and more likely being related to the same or similar biological process (Fig. 1). The first module primarily included neuronal-related and xenobiotic or drug metabolism-related pathways (e.g., calcium signaling, dopaminergic synapse, cholinergic synapse, serotonergic synapse and neurotrophin signaling, metabolism of xenobiotics by cytochrome P450, and drug metabolism-cytochrome P450). The major theme of the second module was cell growth/survival and neuroendocrine-related pathways (e.g., PI3K-Akt signaling, mTOR signaling, Notch signaling, prolactin signaling, etc.). The third module included immune response-related pathways (e.g., toll-like receptor signaling, Fc epsilon RI signaling pathway). At the same time, the three modules were interlinked with each other, indicating the existence of an AD-specific immune-endocrine-neuronal regulatory network.

AD-specific protein network
To further examine the potential pathological protein network of Alzgset, we constructed a subnetwork for AD from the human protein-protein interaction network via the Steiner minimal tree algorithm. This method tries to connect the largest number of input nodes (genes included in Alzgset in our case) via the least number of interlinking nodes. As shown in Fig. 2, the protein network of AD comprised 496 nodes and 1521 edges (interactions). As shown, 393 out of 430 Alzgset genes were included in the AD-specific network, which accounted for 79.2% of 496 genes in the network and 91.4% of Alzgset, demonstrating a high coverage of Alzgset in the subnetwork. There were 103 genes in the AD-specific molecular network outside of Alzgset (Table 2). Given that these intermediate genes interacted closely with those known to be related to AD, they might also be involved in the pathological process of the disease phenotype. Notably, a number of the genes (e.g., epidermal growth factor receptor (EGFR), nuclear respiratory factor 1 (NRF1), somatostatin receptor 2 (SSTR2), and sortilin 1 (SORT1)) had been shown to be related to AD in several previous studies [35-38]. Some of these genes have not been reported to be directly involved in the pathophysiological condition of AD, but genes linking to them or other members of the same protein family may have been found to play a role in such processes. For instance, ATP binding cassette subfamily G member 5 (ABCG5), a member of a transport system superfamily, involved in ATP binding and transporting of substrates across cytomembranes, was a node in the AD-specific network but was out of Alzgset. However, six members from the same family were included in Alzgset (ABCA1, ABCA2, ABCA7, ABCC2, ABCG1, and ABCG2), and there was experimental evidence for their involvement in AD; for example, the expression reduction or loss of function of ABCA7 could alter Alzheimer amyloid processing [39]. Solute carrier family 40 member 1 (SLC40A1), encoding a cytomembrane protein that may be linked to iron export from duodenal epithelial cells, was also included in the AD-specific network. SLC40A1 can interact with Golgi membrane protein 1 (GOLM1) and hepcidin antimicrobial peptide (HAMP). The former was a gene in Alzgset and its mutation may be related to reduced regional gray matter volume in AD patients [40], and the expression of HAMP was significantly reduced in hippocampal lysates from AD brains [41]. Thus, it is likely that some of the 103 genes in the AD-specific network may play roles in AD susceptibility and can be novel targets for further exploration.

Fig. 1 Crosstalk network among Alzgset-overrepresented pathways. Vertices, biological pathways; lines, crosstalks among pathways. The width of each line (edge) is directly proportional to the crosstalk level of the given pathway pair. Nodes tagged with numbers represent the following corresponding pathways: 1, intestinal immune network for IgA production; 2, toll-like receptor signaling pathway; 3, cytokine-cytokine receptor interaction; 4, hematopoietic cell lineage; 5, TNF signaling pathway; 6, apoptosis; 7, Fcε RI signaling pathway

Discussion
Great progress has been made in exploring the molecular mechanisms of Alzheimer's disease in recent years. With the advancement and maturity of high-throughput technology, we are able to identify the elements related to this disease on much larger scales. Although more and more genes/proteins potentially involved in the disease have been reported, a thorough analysis of the biochemical processes associated with the pathogenesis of AD from the molecular aspect is still missing. In such cases, a systematic analysis of AD-related genes via a pathway-based and network-based analytical framework will provide us with insight into the disease beyond the single candidate gene-based analyses [42-44]. In this study, by pooling and curating human genes related to AD from genetic studies, and systematically delineating the interconnection of these genes by means of pathway-based and network-based analyses, we analyzed AD-related biochemical processes and their interactions. Compared with the candidate gene(s)-based approach, the comprehensive analysis of AD-related genes conducted in this study has its own advantages. By implementing an extensive compilation and curation of human genes from genetic association studies on AD, we could obtain valuable gene source data for further analysis. In particular, because the risk of AD susceptibility can be attributed to many genes, with multiple genes functioning in a concerted manner and each gene exerting a small effect [45], we took this into consideration by also retrieving genes jointly showing significant genetic association with AD. At the same time, by focusing on the biological correlation of genes, pathway and network analyses not only give us a more comprehensive view of the pathological mechanisms of AD, but are also more robust to the influence of false-positive genes.
As revealed by function enrichment analysis, genes in Alzgset may play important roles in lipid/lipoprotein-related processes, the immune system, metabolic processes, drug response processes, and neurodevelopment. For example, terms such as reverse cholesterol transport, positive regulation of interleukin-6 production, response to ethanol, lipoprotein metabolic process, diol metabolic process, xenobiotic metabolic process, and regulation of neuronal synaptic plasticity were overrepresented among Alzgset genes, implying the important roles of these processes in the pathology of AD. Furthermore, we noticed that several terms related to memory, visual learning, social behavior, sleep, axon regeneration, and axon guidance also emerged in the enriched list, consistent with prior biological findings for AD [46-50].
Our biochemical pathway analysis showed that immune-related pathways were enriched among Alzgset, which further highlighted the connections between AD and immune-related biological activities. Previous studies have shown the involvement of neuroinflammation in AD pathology, with inflammatory cytokines playing central roles [51,52]. Simultaneously, four pathways associated with neurotransmitters were found to be overrepresented in Alzgset, coinciding with their essential roles in the etiology and progression of AD. Acetylcholine, dopamine, and serotonin are major neurotransmitters involved in advanced neuronal functions (e.g., learning, memory, language, etc.) and exert key effects in the pathologic processes of AD. These neurotransmitters could be involved in the impairment of synaptic plasticity, such as long-term potentiation and long-term depression, in AD subjects or animal models, which in turn may impair some synapse-based higher brain functions such as memory and cognition [53-55]. Moreover, our results detected several pathways pertaining to neuroendocrine activities (i.e., ovarian steroidogenesis and prolactin signaling), implicating endocrine processes in the pathogenesis of AD [56,57]. In addition, the adipocytokine signaling pathway was enriched in Alzgset. Adipocytokines, including leptin, adiponectin, NAMPT, RBP-4, and other proinflammatory cytokines, have attracted much attention due to their close connection with AD [32,57,58]. Detection of the adipocytokine signaling pathway in this study provides further evidence for the relationship between adipocytokines and the development and progression of AD, and may also support the idea that AD could be a metabolic disease [59-61]. As these results suggest, the molecular mechanisms underlying AD are highly complex, calling for further thorough studies to decode the underlying pathologic mechanisms.
Of significance, we detected three major pathway groups through pathway crosstalk analysis. One group basically involved the pathways related to the nervous system and metabolism-related activities. Among these pathways, cholinergic synapse, the calcium signaling pathway, dopaminergic synapse, serotonergic synapse, and neurotrophin signaling have been well dissected for their function in the progress of AD [62-65]. The second module was largely dominated by cell growth/survival and neuroendocrine pathways, and the third by pathways of immune response or related functions. Furthermore, these three pathway modules were interconnected and acted as an immune-endocrine-neuronal regulatory network for the AD-related pathological conditions. Of note, one pathway (i.e., intestinal immune network for IgA production) was found to be a component of the immune module.
These results might suggest that the gut-brain axis, made up of immune, neuroendocrine, and neuronal components, was involved in the pathogenesis of AD [66-68], in line with results from pathway crosstalk analysis (i.e., there being three similar modules containing Alzgset-enriched pathways). On closer examination, we observed that the immune module had many pathway crosstalks with high crosstalk strength. The cell growth/survival and neuroendocrine module had fewer and weaker crosstalks than the immune module, whereas in the neural module the crosstalks were more numerous and stronger. Despite the limited number of crosstalks among metabolic pathways, their crosstalk levels were high. These observations might provide causal and regulatory hints for AD. Integrating results from biochemical pathway and pathway crosstalk analyses with prior biological knowledge, the major pathways related to AD could be summarized in a diagram (Fig. 3).
Further, we extracted an AD-specific protein network on the basis of the human protein-protein interaction network. It is worth noting that some linking genes outside Alzgset but included in the human protein-protein interaction network may be potentially related to AD. For example, nuclear respiratory factor-1 (NRF1) could be affected by early changes in genes participating in the insulin and energy metabolism pathways in an APP/PS1 transgenic mouse model of AD [69]. TYROBP, a transmembrane signaling protein, appeared in our AD-specific subnetwork. By constructing gene regulatory networks in 1647 postmortem brain tissues from late-onset Alzheimer's disease (LOAD) patients and normal subjects, an immune and microglia-related module dominated by genes participating in pathogen phagocytosis was identified, with TYROBP as a key causal regulator upregulated in LOAD [70]. CDH2, a classical cadherin playing roles in the development of the nervous system, was found among the pathogenic copy number variations from 261 early-onset familial Alzheimer's disease and early/mixed-onset pedigree individuals using high-density DNA microarrays [71]. By applying cell-based studies and FBXO2 knockout mice, it was found that FBXO2 could regulate amyloid precursor protein-related activities in the brain and might modulate AD pathogenesis, which together with our result consolidates its involvement in AD [72]. Although no evidence indicated that VSTM2L, one of the intermediate genes, was directly related to AD, it interacted with ataxin 1 (ATXN1) of Alzgset [73], whose biological function is presently unknown, and also might be a secreted antagonist of Humanin (HN) [74], which mediated attenuation of AD-related memory impairment and Aβ-induced AD-like pathological changes [75,76]. As these results show, this protein subnetwork prediction approach could not only produce a significant subnetwork of Alzgset for AD, but also has the potential to detect promising related genes.

Fig. 3 Main biochemical pathways related to AD. A number of genetics-based studies have revealed that AD is a complex disorder. These major biochemical pathways involved in AD were connected based on their biological relations
There have been several available datasets or projects focused on the curation of AD-related genes, including AlzGene [77], Alzheimer's Disease Neuroimaging Initiative (ADNI) [78], the Alzheimer Disease & Frontotemporal Dementia Mutation Database (AD&FTDMDB) [79], and AlzBase [80]. While AlzGene maintains a comprehensive catalog of genetic association studies on AD and also includes results from meta-analysis of polymorphisms with genotype data available in several GWAS projects on AD, AD&FTDMDB is dedicated to the known mutations of genes associated with AD and frontotemporal dementias from the published reports or presentations at scientific meetings. The ADNI project aims at facilitating the investigation of genetic influences on AD onset and progression reflected in imaging changes, fluid biomarkers, and cognitive status. It has reported several neuroimaging GWAS with imaging measures as quantitative phenotypes, such as hippocampal volume and hippocampal gray matter density. On the other hand, AlzBase is an integrative database for genes dysregulated in AD and related diseases, and comprises annotations and expression information on more than 7800 differentially expressed genes collected from multiple microarray datasets. These datasets with different features provide valuable information on genes and/or phenotypes for exploring and understanding AD and its mechanisms.
Similar to AlzGene, Alzgset is also a compilation of AD-related genes identified in genetic association studies. While AlzGene includes genes showing both positive and negative association with AD, Alzgset focuses only on the genes reported to be positively associated with AD by the original authors. Because AlzGene has not been updated since April 2011, results from many recent genetic association studies may not be included. In studies on candidate genes, some genes may each possess a mild to moderate p value, but two or more genes could collectively show a more significant association with AD because they probably act in a concerted manner. In such cases, all of these candidates were included in Alzgset as long as the original authors could provide sufficient evidence. On the other hand, the genes in AlzGene were selected from meta-analyses for each polymorphism and a relatively uniform criterion was adopted, so such jointly acting genes may be missed. Thus, Alzgset should offer an informative supplement to AlzGene and serve as a useful dataset for AD investigation.
However, there were several limitations in this study. First, the results of our pathway-based and network-based analyses relied on genes reported in publications to be associated with AD. Because the identification of risk genes for AD is still an ongoing task, the GO biological process terms, biochemical pathways, and results derived from network analysis should likewise be treated as open to revision. Second, we adopted the results and conclusions offered by the original authors of each selected report when collecting the genes, which inevitably impacts our results due to possible bias and insufficiency in the available reports. Third, in order to decrease the false-positive rate of AD-associated genes, we eliminated reports with insignificant or negative results. Nevertheless, we cannot rule out that some genes in those studies might actually be associated with the disease phenotype. Additionally, although the GO terms enriched in Alzgset could provide valuable hints and might serve as an important resource for understanding the molecular mechanisms of AD, it should be noted that GO is biased towards fields like cancer biology and the concepts related to neurology are underrepresented [81]. Thus, some important neurological processes related to AD may be missed in our analysis. At the same time, although protein-protein interaction databases have greatly improved overall, the present human interactome is still incomplete and some false-positive data may also be included [82]. Thus, the present research status of the human interactome may also influence our results. It can be expected that, as the protein-protein interaction data become more comprehensive and accurate, the inferred AD-specific subnetwork will become more reliable and valuable.

Conclusions
In summary, via a systems biology approach, we investigated the pathways and molecular networks related to AD based on the genes associated with the disease. Integrating biological function, biochemical pathway, and pathway crosstalk analyses, we identified that biochemical processes and pathways linked with lipid and/or lipoprotein-related processes, metabolism, the immune system, and neural development were overrepresented among Alzgset and there existed three interconnected pathway modules: neuronal and metabolic module, cell growth/survival and neuroendocrine clique, and immunological cluster. What is more, an AD-specific protein network was built via the Steiner minimal tree algorithm and some novel genes latently associated with AD were predicted. Such analysis of genes involved in AD will not only improve our understanding of the