AlzGPS: A Genome-wide Positioning Systems Platform to Catalyze Multi-omics for Alzheimer’s Therapeutic Discovery

Background: Over 15 million family members and caregivers have expended $220 billion for the care of patients with Alzheimer's disease (AD) and other dementias, and the attrition rate for AD clinical trials (2002-2012) is estimated at 99.6%. While recent DNA/RNA sequencing and other multi-omics technologies have advanced the understanding of the biology and pathophysiology of AD, no effective disease-modifying or preventive therapies for AD have emerged in the past two decades. A new approach that integrates the genome, transcriptome, proteome, and human interactome into the drug discovery and development process is essential for developing such therapies.

Methods: In this study, we developed AlzGPS (Genome-wide Positioning Systems platform for Alzheimer's Therapeutic Discovery, https://alzgps.lerner.ccf.org), a comprehensive systems biology tool that enables searching, visualizing, and analyzing multi-omics data, various types of heterogeneous biological networks, and clinical databases for target identification and the effective prevention and treatment of AD.

Results: Via AlzGPS: (1) we curated more than 100 AD multi-omics data sets capturing DNA, RNA, protein, and small-molecule profiles underlying AD pathogenesis (e.g., early vs. late stage and tau vs. amyloid endophenotypes); (2) we constructed endophenotype disease modules by incorporating multi-omics findings and human protein-protein interactome networks; (3) we identified repurposable drugs for AD from ∼3,000 FDA approved/investigational drugs using state-of-the-art network proximity analyses; (4) we curated 300 literature references for highly repurposable drugs; (5) we included information from over 200 ongoing AD clinical trials, noting drug mechanisms and primary drug targets, and linked them to our integrated multi-omics view of targets and to the network analysis results for the drugs; (6) we implemented a highly interactive web interface for database browsing and network visualization.
Conclusions: Network visualization enabled by AlzGPS includes brain-specific neighborhood networks for genes-of-interest, endophenotype disease module networks for data sets-of-interest, and mechanism-of-action networks for drugs targeting disease modules. By combining systems pharmacology and network-based integrative analysis of multi-omics data, AlzGPS offers actionable systems biology tools for accelerating therapeutic development in AD.

Zhou et al. 2020

… AD drug discovery. For example, transgenic rodent models used to test drugs may not fully represent human AD pathobiology (6). Also, there is a lack of sensitive outcome measures in clinical trials. Other potential immediate causes of clinical trial failures include targeting the wrong pathobiological or pathophysiological mechanisms, intervening at the wrong stage (too early or too late), unfavorable pharmacodynamic and pharmacokinetic characteristics of the drug (e.g., poor brain penetration), lack of target engagement by drug candidates, and hypotheses that fail to incorporate the great complexity of AD (6, 7).

Multiple types of omics data have greatly facilitated our understanding of the pathobiology of AD. For example, using single-cell RNA-seq, a novel microglia type (termed disease-associated microglia, DAM) was discovered to be associated with AD, and understanding its molecular mechanism could offer new therapeutic targets (8). Using large-scale genome-wide association studies (GWAS), twenty loci showed genome-wide significant association with AD, 11 of which were newly discovered (9).
A recent study using deep profiling of the proteome and phosphoproteome prioritized proteins and pathways associated with AD and showed that protein changes and their corresponding RNA levels only partially coincide (10). The large amount of multi-omics data and recent advances in network-based methodologies for drug repurposing present unprecedented opportunities for accelerating target identification in AD drug discovery. This potential has been demonstrated in other complex diseases, such as cancer (11), cardiovascular disease (12), and schizophrenia (13), and is beginning to be exploited in AD (6, 14). Drug repurposing offers a rapid and cost-effective solution for drug discovery for complex diseases, such as the current global pandemic of coronavirus disease 2019 (15, 16) and AD (6). The central idea of network-based drug repurposing is that for a drug to affect a disease, its targets must directly overlap with, or lie in the immediate vicinity of, the disease module, which can be identified using the vast amount of high-throughput sequencing data (Figure 1A). Our recent efforts using network-based methodologies and AD omics data have led to the discovery of two drugs that show potential efficacy against AD in network-based models: sildenafil and pioglitazone (14). Network analysis provides potential mechanisms for these drugs and facilitates experimental validation. Therefore, we posit that a comprehensive systems biology tool in the framework of network-based multi-omics analysis could inform Alzheimer's patient care and therapeutic development.

To this end, we present a new, freely available database and tool, named AlzGPS (A Genome-wide Positioning Systems platform for Alzheimer's Therapeutic Discovery), for target identification and drug repurposing for AD.
AlzGPS was built with large-scale, diverse information, including multi-omics data (genomics, bulk and single-cell transcriptomics, proteomics, and interactomics) of human and other species, drug-target networks, literature-derived evidence, AD clinical trial information, and network proximity analyses (Figure 1B). We hope that AlzGPS will be a valuable resource for the AD research community for several reasons. First, AlzGPS brings abundant, multi-domain information together in one location. The manually curated data, such as the literature-derived information for the most promising repurposable drugs and more than 100 multi-omics AD data sets, are of high quality and relevance. Second, using state-of-the-art network proximity approaches, AlzGPS provides a systematic evaluation of ∼3,000 FDA approved or investigational drugs against the AD data sets. These results (along with various network visualizations) provide insights into potential repurposable drugs with clear network-based footprints in the context of the human protein interactome. The drug-data set associations can be further explored in AlzGPS for individual drug targets or genes associated with AD. Lastly, AlzGPS offers a highly interactive and intuitive modern web interface. The relational nature of these data was embedded in the design to help the user easily navigate through different types of information. In addition, AlzGPS provides three types of network visualizations for the tens of thousands of networks in the database: brain-specific neighborhood networks for genes, disease modules for data sets, and inferred mechanism-of-action (MOA) networks for drug-data set pairs with significant proximity. AlzGPS is freely available to the public without registration at https://alzgps.lerner.ccf.org.

Data collection and preprocessing

AD data sets. A data set is defined as either (1) genes/proteins/metabolites that are differentially expressed in AD patients/mice versus controls; or (2) genes that have known associations with AD risk from the literature or other databases. We retrieved expression data sets underlying AD pathogenesis, capturing transcriptomics (microarray, bulk or single-cell RNA-Seq) and proteomics, across human, mouse, and other model organisms (e.g., fruit fly and C. elegans). All samples were derived from total brain, specific brain regions (including hippocampus, cortex, and cerebellum), or brain-derived single cells, such as microglial cells. For some of the expression data sets, the differentially expressed genes/proteins were obtained from the original publications (main or supplemental tables). For data sets without available differential expression results, the original brain microarray/RNA-Seq data were obtained from Gene Expression Omnibus (GEO) (17), and differential expression analysis was performed using GEO2R (18), which analyzes user-defined sample groups with the limma R package (19). All differentially expressed genes identified in mouse were further mapped to unique human orthologous genes using the NCBI HomoloGene database (https://www.ncbi.nlm.nih.gov/homologene). Details for all data sets, including organism, genetic model (for mouse), brain region, cell type (for single-cell RNA-Seq), PubMed ID, GEO ID, and source (e.g., supplemental table or GEO2R), can be found in Table S1.

Genes and Proteins. We retrieved gene information from the HUGO Gene Nomenclature Committee (HGNC, https://www.genenames.org/) (20), including gene symbol, name, type (e.g., coding and non-coding), chromosome, synonyms, and identifier (ID) mappings to other databases such as National Center for Biotechnology Information (NCBI) Gene, ENSEMBL, and UniProt. All proteins from the AD proteomics data sets were mapped to genes using the mapping information from HGNC.
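The mouse-to-human ortholog step can be pictured as a dictionary lookup. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the HomoloGene data have already been loaded into a dictionary from mouse gene symbols to lists of human symbols, and it reads "unique human-orthologous genes" as keeping only one-to-one orthologs and deduplicating the result. The helper name `map_to_human_orthologs` and the toy table are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def map_to_human_orthologs(
    mouse_degs: Iterable[str],
    ortholog_table: Dict[str, List[str]],
) -> Set[str]:
    """Map mouse DEG symbols to human ortholog symbols (hypothetical helper).

    ortholog_table is assumed to be pre-built from HomoloGene, mapping each
    mouse gene symbol to the list of its human ortholog symbols.
    """
    human_genes: Set[str] = set()
    for mouse_gene in mouse_degs:
        orthologs = ortholog_table.get(mouse_gene, [])
        # Assumed rule: keep only unambiguous one-to-one orthologs.
        if len(orthologs) == 1:
            human_genes.add(orthologs[0])
    # Returning a set also removes duplicates, e.g. repeated mouse entries.
    return human_genes


# Toy table (not real HomoloGene content): one gene without a human ortholog
# and one hypothetical one-to-many case, both of which are dropped.
table = {"App": ["APP"], "Mapt": ["MAPT"], "Gm0001": [], "Dup1": ["H1", "H2"]}
print(sorted(map_to_human_orthologs(["App", "Mapt", "Gm0001", "Dup1", "App"], table)))
# → ['APP', 'MAPT']
```

Dropping one-to-many mappings is a conservative choice that avoids inflating a data set with paralogs; a pipeline that keeps all human orthologs would only need the `len(orthologs) == 1` check removed.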
Single-nucleotide polymorphisms (SNPs). We found 3,321 AD-associated genetic records for 1,268 genes mapped to 1,629 SNPs by combining results from the GWAS Catalog (https://www.ebi.ac.uk/gwas/) (21), queried with the trait "Alzheimer's disease", with results from published studies. The PubMed IDs for the genetic evidence are provided on AlzGPS.

… Line Entry System (SMILES) and Anatomical Therapeutic Chemical (ATC) code(s). We also evaluated the pharmacokinetic properties (such as blood-brain barrier [BBB] penetration) of the drugs using admetSAR (23, 24).

To quantify the associations between drugs and the AD-related gene sets from the data sets, we adopted the "closest" network proximity measure:

$$\langle d_{XY} \rangle = \frac{1}{|X| + |Y|}\left(\sum_{x \in X} \min_{y \in Y} d(x, y) + \sum_{y \in Y} \min_{x \in X} d(x, y)\right)$$

where $d(x, y)$ is the shortest path length between gene $x$ from gene list $X$ (drug targets) and gene $y$ from gene list $Y$ (AD genes). To evaluate whether such proximity was significant, we performed z score normalization using a permutation test of 1,000 repeats. In each repeat, the proximity was measured between two randomly generated gene lists with degree distributions similar to those of $X$ and $Y$. The z score was calculated as:

$$Z = \frac{\langle d_{XY} \rangle - \langle d_r \rangle}{\sigma_r} \tag{3}$$

where $\langle d_r \rangle$ and $\sigma_r$ are the mean and standard deviation of the proximities obtained in the permutation repeats. The P value was calculated from the permutation test. Drug-data set pairs with Z < -1.5 and P < 0.05 were considered significantly proximal. In addition to network proximity, we calculated two additional metrics, the overlap coefficient and the Jaccard index, to quantify the overlap and similarity of $X$ and $Y$:

$$\mathrm{Overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)}, \qquad \mathrm{Jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$
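The proximity statistics described in this section can be sketched in plain Python. This is a minimal, self-contained illustration under several assumptions that the text does not specify: the interactome is an undirected adjacency map, $X$ and $Y$ lie in one connected component, degree matching is done by sorting genes by degree and cutting them into consecutive bins of `bin_size` genes, and P is taken as the empirical one-sided fraction of random repeats that are at least as close as the observed pair. All function names are hypothetical; this is not the AlzGPS source code.

```python
import random
import statistics
from collections import deque
from typing import Dict, List, Set, Tuple

# Undirected PPI network as an adjacency map: gene -> set of neighboring genes.
Graph = Dict[str, Set[str]]


def shortest_paths_from(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: shortest path length (in edges) to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_proximity(graph: Graph, X: Set[str], Y: Set[str]) -> float:
    """Symmetric "closest" measure <d_XY>; assumes X and Y share a connected component."""
    dist = {x: shortest_paths_from(graph, x) for x in X}
    x_to_y = sum(min(dist[x][y] for y in Y) for x in X)
    y_to_x = sum(min(dist[x][y] for x in X) for y in Y)
    return (x_to_y + y_to_x) / (len(X) + len(Y))


def degree_bins(graph: Graph, bin_size: int) -> Dict[str, List[str]]:
    """Assumed binning: sort genes by degree and cut them into consecutive bins."""
    ordered = sorted(graph, key=lambda gene: len(graph[gene]))
    bins: Dict[str, List[str]] = {}
    for start in range(0, len(ordered), bin_size):
        members = ordered[start:start + bin_size]
        for gene in members:
            bins[gene] = members
    return bins


def degree_matched_sample(genes: Set[str], bins: Dict[str, List[str]],
                          rng: random.Random) -> Set[str]:
    """Replace each gene with a random gene drawn from its degree bin."""
    return {rng.choice(bins[gene]) for gene in genes}


def proximity_z(graph: Graph, X: Set[str], Y: Set[str], repeats: int = 1000,
                bin_size: int = 100, seed: int = 0) -> Tuple[float, float, float]:
    """Return (<d_XY>, Z, empirical one-sided P) from a degree-matched permutation test."""
    X = {g for g in X if g in graph}  # keep only genes present in the interactome
    Y = {g for g in Y if g in graph}
    rng = random.Random(seed)
    bins = degree_bins(graph, bin_size)
    observed = closest_proximity(graph, X, Y)
    null = [closest_proximity(graph, degree_matched_sample(X, bins, rng),
                              degree_matched_sample(Y, bins, rng))
            for _ in range(repeats)]
    mean, sd = statistics.mean(null), statistics.stdev(null)
    z = (observed - mean) / sd if sd > 0 else 0.0
    # Fraction of random gene-set pairs at least as close as the observed pair.
    p = sum(d <= observed for d in null) / repeats
    return observed, z, p


def is_significant(z: float, p: float) -> bool:
    """Threshold stated in the text: Z < -1.5 and P < 0.05."""
    return z < -1.5 and p < 0.05


def overlap_coefficient(X: Set[str], Y: Set[str]) -> float:
    return len(X & Y) / min(len(X), len(Y))


def jaccard_index(X: Set[str], Y: Set[str]) -> float:
    return len(X & Y) / len(X | Y)


# Toy path network G1-G2-G3-G4-G5 with hypothetical gene names.
ppi = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2", "G4"},
       "G4": {"G3", "G5"}, "G5": {"G4"}}
print(closest_proximity(ppi, {"G1", "G2"}, {"G2", "G3"}))  # → 0.5
```

In the toy call, each gene of one list is on average half an edge from the other list, because G2 is shared and G1 and G3 are each one step away. On a real interactome with thousands of nodes, precomputing all-pairs shortest paths once would be much faster than rerunning breadth-first search inside every permutation repeat.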

Generation of networks
We offer three types of networks on AlzGPS: the brain-specific neighborhood (EGO) network for genes, the largest connected component (LCC) network for data sets, and the inferred MOA network for significantly proximal drug-data set pairs. The three networks differ in their inclusion criteria for nodes (genes/proteins). The edges are PPIs colored by their types (e.g., 3D, Y2H, and literature). In all networks, nodes are colored by whether they can be targeted by the drugs in our database.

For the EGO networks, we filtered genes by their brain expression specificity and generated networks only for genes with positive brain specificity. We used the ego_graph function from NetworkX (33) to generate the EGO networks, which are centered on the genes-of-interest. An LCC network was generated for each AD data set using the subgraph function from NetworkX. For the MOA networks, we examined the connections (PPIs) between the drug targets and the genes in the data set.
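The three network types can be sketched with NetworkX, which the text names for the EGO and LCC networks. `nx.ego_graph`, `Graph.subgraph`, `nx.connected_components`, and `nx.set_node_attributes` are real NetworkX functions; the helper names, the brain-specificity dictionary, the one-hop radius, the `targetable` attribute, and the reading of the MOA network as the subgraph induced on drug targets plus data set genes are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set

import networkx as nx


def ego_network(ppi: nx.Graph, gene: str, brain_specificity: Dict[str, float],
                radius: int = 1) -> Optional[nx.Graph]:
    """EGO network centered on a gene; only built for positive brain specificity."""
    if gene not in ppi or brain_specificity.get(gene, 0.0) <= 0:
        return None
    return nx.ego_graph(ppi, gene, radius=radius)


def lcc_network(ppi: nx.Graph, dataset_genes: Iterable[str]) -> nx.Graph:
    """Largest connected component formed by a data set's genes in the interactome."""
    sub = ppi.subgraph([g for g in dataset_genes if g in ppi])
    if sub.number_of_nodes() == 0:
        return nx.Graph()
    largest = max(nx.connected_components(sub), key=len)
    return sub.subgraph(largest).copy()


def moa_network(ppi: nx.Graph, drug_targets: Iterable[str],
                dataset_genes: Iterable[str]) -> nx.Graph:
    """Assumed MOA view: PPIs induced among drug targets and data set genes."""
    nodes = (set(drug_targets) | set(dataset_genes)) & set(ppi.nodes)
    return ppi.subgraph(nodes).copy()


def mark_targetable(net: nx.Graph, druggable: Set[str]) -> nx.Graph:
    """Store a boolean 'targetable' node attribute, e.g. for node coloring."""
    nx.set_node_attributes(net, {n: n in druggable for n in net.nodes}, "targetable")
    return net


# Toy interactome with hypothetical genes; the edge "type" mirrors PPI evidence types.
ppi = nx.Graph()
ppi.add_edge("G1", "G2", type="Y2H")
ppi.add_edge("G2", "G3", type="literature")
ppi.add_edge("G4", "G5", type="3D")
specificity = {"G2": 1.3, "G4": -0.2}
print(sorted(ego_network(ppi, "G2", specificity).nodes))  # → ['G1', 'G2', 'G3']
```

Because the induced subgraphs keep edge attributes, the PPI evidence type stays available for edge coloring, while the `targetable` flag drives node coloring.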

Website implementation
AlzGPS was implemented with the Django v2.2.2 framework (www.djangoproject.com). The website frontend was implemented with HTML, CSS, and JavaScript and was designed to be highly interactive and integrative. It uses AJAX to asynchronously acquire data in JSON format based on user requests and to dynamically update the interface. Because data are served as JSON, this architecture can also be integrated into end users' own pipelines. Network visualizations were implemented using Cytoscape.js (34).

One key feature of AlzGPS is its highly diverse yet interconnected data types (Figure 1). The three main data types are genes, drugs, and AD-relevant omics data sets. More than 100 omics data sets were processed, including 84 expression data sets (Table S1) from AD transgenic animal models or patient-derived samples and 27 data sets from the literature or other databases. The expression data sets contain transcriptomic and proteomic data of human and rodent samples, with comparative sample groups such as early stage vs. late stage and healthy vs. AD. The differentially expressed genes/proteins were calculated for each data set.

The statistics and relations of the database are shown in Figure 1B. We collected and processed all the basic information (see Methods) and then constructed the relationships among the data types. For genes and drugs, the relationship is drugs targeting proteins (genes); for genes and data sets, the relationship is genes being differentially expressed in the expression data sets or included in other types of data sets, such as literature-based ones; for drugs and data sets, the proximity between each pair was calculated (see Methods) to identify the drugs that are significantly proximal to a data set, and vice versa.

Additional data types were collected or generated.
For genes, these included genetic evidence (variants associated with AD) and tissue expression specificity, which provide additional information for target identification. For drugs, we collected data from ongoing clinical trials, including the proposed mechanism and therapeutic purpose (29, 30), and mapped the trials to drugs. The BBB penetration probability was computed using admetSAR (23, 24). For the top 300 drugs with the highest number of significant proximities across all data sets, we manually curated the available literature. A total of 292 studies reporting associations between the drugs and AD were found for 147 drugs (49%). We grouped these studies into clinical and non-clinical studies, and extracted the trial information for clinical studies and the experimental setting (number and type of patients) for both types. We also summarized the study results and provide them on AlzGPS.

Web interface and network visualizations

A highly interactive web interface was implemented (Figure 2). On the home page (Figure 2A), the user can search for drugs, genes, metabolites, and gene variants. The user can also directly list all drugs by their first-level ATC code, all available AD data sets, and all ongoing clinical trials (Figure 2B). The search results are displayed in the "DATA TABLE" tab and can be switched using their associated buttons in the "RESULT" section on the left. Each data entity has its own data table for the associated information in the "DATA TABLE" tab. For example, the gene page of APP (Figure 2B) shows the basic information (green rows), such as name, type, chromosome, and synonyms; descriptions of the derived data (purple rows), such as tissue specificity and number of genetic records; and external links (red row). Data on the relations between APP and other entities can be loaded by clicking the buttons in "DETAIL" (blue row).
For example, the expression data sets in which APP is differentially expressed can be found by clicking the "Dataset" button (Figure 2B). Any loaded data are added to the same explorer, and the buttons in "RESULT" are organized as a tree. For example, APP is included in the "V1 AD-seed" data set, which contains 144 AD-associated genes with strong literature evidence. When the user clicks this data set in the APP gene table, a new data table for the "V1 AD-seed" data set replaces the APP gene page, and a new indented button appears below the APP button in "RESULT" (Figure 2B).

A major feature of the web interface is an all-in-one interactive explorer that uses the relational nature of these data to minimize navigation. Another major feature is the network visualizations. We offer three types of networks: (1) the brain-specific neighborhood (EGO) network for a gene-of-interest, which shows the PPIs with its neighbors (Figure 2C); (2) the largest connected component (LCC) network for a data set, which shows the largest module formed by the genes in this data set (Figure 2D); and (3) the inferred MOA network for a significantly proximal drug-data set pair, which is illustrated in the case studies below.

Case study - target identification

Generally, using AlzGPS for AD target identification starts with selecting one or more data sets (Figure 2B, "DATASET" tab). Users can select expression data sets based on organism, method (e.g., single-cell/nuclei RNA-Seq), brain region, and comparison (e.g., early-onset AD vs. healthy control). Additionally, we have collected data sets from the literature, other databases, or computationally predicted results. Here, we use the "V1 AD-seed" data set as a starting point.
This data set was from our recent study and contains 144 AD-associated genes based on literature-derived evidence. We found that 118 of these genes were differentially expressed in at least one data set. By browsing these genes, we … are associated with risk of AD (37). MAPT is differentially expressed in five expression data sets (Figure 3A) and has high brain specificity. Five pieces of genetic evidence were found for MAPT. MAPT can be targeted by 27 drugs. In addition, many of its direct PPI neighbors are targetable, suggesting a potential treatment strategy of targeting MAPT and its neighbors.

BIN1. BIN1 is one of the most important susceptibility genes for late-onset AD (38) and can modulate tau pathology (39). Higher levels of BIN1 expression are associated with a delayed age of AD onset (40). BIN1 is differentially expressed in five data sets and has 47 associated genetic records (Figure 3B). Although no drugs are known to target BIN1, many of BIN1's PPI neighbors can be targeted.

APOE. The ε4 allele of APOE is the main genetic risk factor for AD (41). Apolipoprotein E ε4 plays an important role in Aβ deposition (41), a major pathological hallmark of AD. APOE is differentially expressed in 22 data sets (Figure 3C) and has a high number (91) of associated genetic records. Both APOE and its PPI partners can be targeted.

BACE1. β-secretase 1 (BACE1) cleaves APP and generates amyloid-β peptides (42), whose aggregation is another pathological hallmark of AD. BACE1 has been a popular target for AD drug development. As shown in Figure 3D, BACE1 is differentially expressed in 4 data sets.

In this section, we use sildenafil and pioglitazone as two examples. In our recent studies, we found that both sildenafil and pioglitazone were associated with a reduced risk of AD using network proximity analysis and retrospective case-control validation (14).
Mechanistically, in vitro assays showed that both drugs were able to downregulate cyclin-dependent kinase 5 (CDK5) and glycogen synthase kinase 3 beta (GSK3B) in human microglia cells. These drugs were discovered using different data sets. Sildenafil was found using a high-quality, literature-based AD endophenotype module (available as AlzGPS data set "V1 AD-seed") containing 144 genes. Pioglitazone was found using 103 high-confidence AD risk genes (available as AlzGPS data set "V4 AD-inferred-GWAS-risk-genes") identified by GWAS (13).

AlzGPS provides a list view of the network proximity results of all drugs, organized by their first-level ATC code, in the "DRUG CLASS" tab (Figure 2B). The drugs are ranked by the number of significant proximities to the data sets. Among the 148 drugs with network proximity results under ATC code G "Genito-urinary system and sex hormones", sildenafil ranks fourth, behind vardenafil, ibuprofen, and gentian violet cation. Among the 226 drugs under ATC code A "Alimentary tract and metabolism", pioglitazone ranks sixth, following tetracycline, human insulin, epinephrine, cholecalciferol, and teduglutide. Both drugs achieved high numbers of significant proximities to the expression data sets. Next, we examined the basic information of these drugs (Figure 4A and 4E). Both drugs are predicted to be BBB penetrable. Sildenafil has 20 known targets and is significantly proximal to 27 of the 111 data sets (Figure 4A). We found one non-clinical study reporting that sildenafil treatment improves cognition and memory in aged rats with vascular dementia (43) (Figure 4C). As noted, we identified the potential of sildenafil against AD using the AD endophenotype module (Figure 4B, Z = -2.44, P = 0.003).
Then, clicking the corresponding "MOA (mechanism-of-action)" button opened the inferred MOA network for sildenafil and the data set (Figure 4D). Although sildenafil does not directly target the genes in the data set (green), it can potentially alter them through PPIs with its targets (blue).

Pioglitazone has 8 known targets and is significantly proximal to 34 data sets (Figure 4E). Five studies, including both clinical and non-clinical studies, were found to be related to treating AD with pioglitazone. For example, a clinical study showed that pioglitazone can improve cognition in AD patients with type II diabetes (44) (Figure 4G). Similarly, network results and the associated MOA networks suggested that pioglitazone can affect AD risk genes through PPIs (Figure 4F and 4H).
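The "DRUG CLASS" ranking used in these case studies reduces to counting significant drug-data set pairs and sorting within each first-level ATC group. The sketch below assumes the proximity results are available as (drug, data set, Z, P) tuples and the ATC codes as a drug-to-codes dictionary; the function names, the alphabetical tie-break, and all drug names, codes, and numbers in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One proximity result: (drug, data set, Z, P).
Result = Tuple[str, str, float, float]


def is_significant(z: float, p: float) -> bool:
    """Significance rule from the Methods: Z < -1.5 and P < 0.05."""
    return z < -1.5 and p < 0.05


def rank_by_atc_class(results: Iterable[Result],
                      atc_codes: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Group drugs by first-level ATC code and rank them by significant-proximity count."""
    counts: Dict[str, int] = defaultdict(int)
    for drug, _dataset, z, p in results:
        counts[drug] += int(is_significant(z, p))  # drugs with zero hits are still listed
    by_class: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for drug, n_sig in counts.items():
        # The first ATC level (anatomical main group) is the code's first letter;
        # a drug with codes in several groups appears in each of them.
        for level1 in {code[0] for code in atc_codes.get(drug, [])}:
            by_class[level1].append((drug, n_sig))
    # Most significant proximities first; ties broken alphabetically (assumption).
    return {c: sorted(items, key=lambda t: (-t[1], t[0])) for c, items in by_class.items()}


# Hypothetical results and codes, for illustration only.
results = [("drug_a", "ds1", -2.4, 0.003), ("drug_a", "ds2", -1.0, 0.20),
           ("drug_b", "ds1", -2.0, 0.01), ("drug_b", "ds2", -1.8, 0.04),
           ("drug_c", "ds1", -1.6, 0.10)]
atc = {"drug_a": ["G04"], "drug_b": ["G04", "A10"], "drug_c": ["A10"]}
print(rank_by_atc_class(results, atc)["G"])  # → [('drug_b', 2), ('drug_a', 1)]
```

Here drug_c is listed with zero significant proximities because its pair passes the Z threshold but not the P threshold, which shows why both conditions are checked.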

Validation studies
Once candidate agents are identified on AlzGPS, a variety of validation steps can be pursued (6). The agent can be tested in animal model systems of AD pathology to evaluate the predicted MOA and its behavioral and biological effects. Since these are repurposed agents that have been used for other indications in human healthcare, electronic medical records can be interrogated to determine whether there are notable effects on AD incidence, prevalence, or rate of progression. Both methods are imperfect, since animal models have rarely been predictive of human response, and the doses and durations of exposure may differ for the indications other than AD for which the candidate agents are used. The ultimate assessment that could make an agent available for human care is success in a clinical trial, and nominated agents must eventually be submitted to trials. If repurposed agents are not entered into trials because of intellectual property limitations or other challenges, the information from AlzGPS may be useful in identifying druggable disease pathways or in providing seed structures as a basis for creating related novel agents with similar MOAs.

Conclusions

AlzGPS contains rich and diverse information connecting genes, AD data sets, and drugs for AD target identification and drug repurposing. It utilizes multiple biological networks and omics data, such as genomics, transcriptomics, and proteomics, and provides network-based drug repurposing results with network visualizations. AlzGPS will be a valuable resource for the AD research community. We will continue to add more types of omics data and update AlzGPS annually or whenever a large amount of new data becomes available. In summary, AlzGPS is the first comprehensive in silico tool for human genome-informed precision medicine drug discovery for AD.
From a translational perspective, if broadly applied, AlzGPS will offer a powerful tool for prioritizing biologically relevant targets and clinically relevant repurposed drug candidates for multi-omics-informed therapeutic discovery in AD and other neurodegenerative diseases.

… show the gene page. On the gene page, we show a summary of several statistics of the gene in AlzGPS, including the number of drugs that can target it, the number of omics data sets in which the target/protein-coding gene is differentially expressed, the number of genetic records, and the brain-expression specificity. Detailed information can be loaded by clicking the corresponding buttons. Examples of detailed differential expression results and genetic records are shown for these four genes. In addition, a brain-specific neighborhood network centered on the gene-of-interest is available and shows the targetability of its neighborhood.