Drug repurposing for Alzheimer’s disease using a graph-of-thoughts based large language model to infer drug-disease relationships in a comprehensive knowledge graph
BioData Mining volume 18, Article number: 51 (2025)
Abstract
Drug repurposing (DR) offers a promising alternative to the high cost and low success rate of traditional drug development, especially for complex diseases like Alzheimer’s disease (AD). This study addressed DR for AD from three key angles: (1) demonstrating how disease-specific knowledge graphs can improve DR performance, (2) evaluating the role of large language models (LLMs) in enhancing the usability and efficiency of these graphs, and (3) assessing whether Graph-of-Thoughts (GoT)-enhanced LLMs, when integrated with AD knowledge graphs, can outperform traditional machine learning and LLM-based approaches. We tested five distinct DR strategies (DR1–DR5) for AD: DR1, a machine learning method using TxGNN; DR2, a machine learning model leveraging the Alzheimer’s KnowledgeBase (AlzKB); DR3, an LLM-based chatbot built on AlzKB; DR4, our ESCARGOT framework combining GoT-enhanced LLMs with AlzKB; and DR5, a general reasoning-driven LLM approach. Results showed that AlzKB significantly improved DR outcomes. ESCARGOT further enhanced performance while reducing the need for coding or advanced expertise in knowledge graph analysis. Because the architecture of AlzKB is easily adaptable to other diseases and ESCARGOT can integrate with various knowledge graph platforms, this framework offers a broadly applicable, innovative tool for accelerating drug discovery through repurposing.
Introduction
Drug development has long been recognized as a cost-intensive and resource-demanding endeavor. Recent studies [54, 56] estimate that the investment required to bring a new drug to market ranges from several hundred million to billions of dollars, depending on the disease type and the complexities of the development process. Among various diseases, Alzheimer’s Disease (AD) presents a particularly daunting challenge. With over 50 million patients globally, a number that continues to rise, AD has a profound impact on public health. However, drug development for AD remains one of the most difficult undertakings in pharmacology. This is largely due to the incomplete understanding of its etiology and pathological mechanisms. To date, fewer than 10 drugs for AD have been approved by the FDA over the past three decades [70]. Even these treatments have limited efficacy, and their clinical benefits are still debated and not widely accepted. Adding to this challenge is the extremely high failure rate in clinical trials. For instance, over 95 percent of AD drugs entering clinical trials ultimately fail [12]. This underscores the urgent need for innovative approaches to tackle the complexities of AD drug discovery and development.
To address the high costs and low success rates associated with AD drug development, significant research has focused on identifying suitable repurposed drugs among existing FDA-approved treatments [22]. Drug repurposing (DR) involves discovering new therapeutic uses for existing drugs, leveraging their established safety profiles and mechanisms of action. This approach significantly reduces costs and shortens development timelines. It also improves success rates in clinical trials by bypassing much of the early-stage research and safety testing required for novel compounds. Approximately 30 percent of new FDA-approved drugs and vaccines in recent years have originated from DR efforts [48]. A notable example is Zidovudine, initially developed as a cancer treatment, which became the first FDA-approved anti-HIV drug in 1987. Similarly, numerous drugs have been successfully repurposed using pharmacological analysis and in vitro compound screening [49]. These successes highlight the potential of DR as a cost-effective and time-efficient strategy for advancing AD drug development.
The strategy for DR in AD closely mirrors that used for other diseases, combining computational and experimental approaches to identify new therapeutic uses for existing drugs. Computational methods have become an essential first step to rapidly screen large datasets and prioritize high-quality drug candidates. Once promising candidates are identified, experimental approaches can be applied synergistically to further refine and confirm their therapeutic potential [26]. Among computational strategies, machine learning (ML) has emerged as a transformative tool for DR in analyzing complex biological data, uncovering intricate patterns, and predicting potential drug-disease interactions [29, 51, 62]. Additionally, the rapid development and versatile applications of knowledge graphs in the biomedical domain have demonstrated significant potential in this field. A knowledge graph (KG) is a structured representation of entities—such as drugs, diseases, genes, and pathways—and their relationships, organized in a graph-based format. This structure enables the integration of diverse knowledge from multiple sources, providing a comprehensive and interpretable view of complex relationships. The inherent interpretability of KGs is particularly advantageous in DR, where understanding the mechanisms underlying drug-disease interactions is crucial [5, 69]. Additionally, the graphical representation of relationships facilitates more explainable findings.
Drug discovery can be framed as a link prediction problem on a knowledge graph, where ML algorithms are employed to identify potential relationships between entities (e.g., Drug–Treat–Disease) [6, 47, 57]. Several solutions have been developed to leverage ML on knowledge graphs for DR. For instance, KG-Predict [20] integrates diverse entities and relationships from genotypic and phenotypic databases to construct a comprehensive KG. It employs an embedding module to generate low-dimensional representations of entities and relationships and a prediction module to score triples for link prediction. TxGNN [25] uses a graph neural network (GNN) pre-trained on a comprehensive KG to facilitate DR, showcasing the potential of deep learning in this domain. Another study [1] introduced a random-walk-based strategy to overcome limitations in link prediction caused by constrained search scopes and the “guilt-by-association” problem, broadening the discovery space. In the context of AD, similar strategies combining knowledge graphs and machine learning have been developed. In our previous work, we introduced AlzKB, an AD-specific graph knowledge base [52]. Using an embedding-based link prediction ML method, we demonstrated its efficacy in DR for AD.
Knowledge graphs enhanced by ML have demonstrated tremendous potential in driving advancements in DR. However, leveraging these tools effectively demands a high level of technical proficiency and data science expertise, which can present significant barriers for researchers lacking such skills. Most KGs are built on graph databases like Memgraph (https://memgraph.com) or Neo4j (https://neo4j.com), which often require proficiency in query languages such as Cypher to navigate and utilize their advanced features. Furthermore, analyzing KG data with ML methods demands advanced data science skills to extract meaningful insights. While pre-developed platforms like KG-Predict and TxGNN provide user-friendly interfaces for exploratory work, they are often constrained by the predefined datasets and ML algorithms integrated into the platform. This limits the scope of DR research, as users lack control over what data to examine and how to analyze it for discovery purposes. To address the gap between the high potential of KGs and the technical barriers faced by many researchers, emerging large language models (LLMs) present a promising solution. LLMs offer intuitive, natural language-based interactions and can streamline complex processes, making it easier for researchers to leverage KGs without requiring in-depth technical expertise.
LLMs, such as OpenAI ChatGPT and Google Gemini, represent state-of-the-art AI systems designed to understand and generate human-like text. Their ability to process and integrate extensive amounts of data has positioned LLMs as invaluable tools in advancing biomedical research and enhancing healthcare delivery [35, 53, 58, 72]. In the field of drug discovery, there has been growing interest in adopting artificial intelligence (AI) and LLMs. For instance, one article [44] summarized the potential of AI and LLMs in drug discovery, while another [67] investigated DR for AD using ChatGPT. Beyond these applications, the integration of LLMs with KGs has been increasingly discussed in various domains [31, 41, 45]. Despite this progress, the application of AI and LLMs in conjunction with KGs remains underexplored. This presents a significant opportunity to harness the strengths of LLMs for analyzing and interacting with KGs, potentially unlocking new insights and accelerating drug discovery efforts.
LLMs, with their enhanced usability and efficiency compared to traditional ML methods, appear to be well-suited for working with KGs. However, there has been a lack of direct comparisons between LLM-based methods and conventional ML approaches in the context of KG-driven DR. Furthermore, to improve the adaptability and performance of LLMs when applied to graph structures, several advanced strategies have been developed, including Retrieval-Augmented Generation (RAG) [59], Chain-of-Thoughts (CoT) [28, 66], Tree-of-Thoughts [68], and Graph-of-Thoughts (GoT) [3]. Notably, our previously published tool, Knowledge Retrieval Augmented Generation Engine (KRAGEN) [38], which has recently been upgraded to Enhanced Strategy and Cypher-driven Analysis and Reasoning using Graph-Of-Thoughts (ESCARGOT) [37], stands as the first GoT-enabled application indexed in PubMed designed for the biomedical domain and specifically optimized for KGs. In this study, we further evaluate the application of ESCARGOT in DR, using AD as a test case, while also comparing its performance with both a conventional ML approach and a baseline LLM-based method.
This study focuses on the following key objectives: (1) demonstrating how disease-specific KGs provide structured information that enhances DR performance; (2) investigating how LLMs can improve KG usability and efficiency; and (3) evaluating whether a GoT-supported LLM integrated with a KG can achieve performance comparable to or exceeding that of conventional ML or LLM approaches. By addressing these objectives, we aim to establish a robust framework for integrating LLMs and KGs in DR research, providing novel insights into their synergistic potential for tackling complex biomedical challenges.
Methods
In this study, we explored DR for AD using five distinct methods, as outlined in Fig. 1. Various data resources, including publicly available information, scientific and non-scientific articles, and biomedical or non-biomedical databases, can be processed by domain experts with DR expertise to construct knowledge graphs. These DR-focused KGs can be broadly designed to encompass all possible diseases, making them versatile for various applications. Machine learning algorithms can then leverage these comprehensive KGs to predict potential drug candidates for repurposing. In this study, we tested the publicly available TxGNN tool as the first method for this purpose (denoted as DR1 in Fig. 1).
Alternatively, KGs can be specifically tailored to a single disease, such as AlzKB, which is designed for AD. Building on AlzKB, we developed three additional methods to support DR and exploratory research: (1) an ML-based approach (DR2 in Fig. 1), (2) an LLM-based chatbot (DR3 in Fig. 1), and (3) ESCARGOT, a GoT-enabled LLM-based method (DR4 in Fig. 1). In addition to KGs, extensive biomedical datasets and other holistic data sources can be directly utilized to train general LLMs, which supports the development of DR workflows to identify candidate drugs. For this method, we referenced drugs proposed in a recent publication [67], which identified potential AD drugs by summarizing results from iterative LLM queries (denoted as DR5 in Fig. 1).
Fig. 2 (A) The architecture of AlzKB (Alzheimer’s Knowledge Base), highlighting entities and their relationships. The DR-by-LLM strategy focuses on analyzing entities immediately linked to AD, such as Genes and Drugs, as well as their extended connections, including Pathway, BodyPart, and DrugClass. (B) Examples of AD and drug connections. (B1) Disease-Gene-Drug and AD pathway: AD to gene IL1B to drug Mitoxantrone; IL1B is in the AD pathway “Alzheimer Disease”. (B2) Disease-Gene-Drug and AD bodypart: AD to gene PSEN1 to drug Thioridazine; PSEN1 is expressed in the AD-related body part “telencephalon”.
Drug Repurposing with a General Knowledge Graph and Machine Learning (DR1)
TxGNN [25] is a general DR knowledge graph that utilizes machine learning techniques, particularly graph neural networks (GNNs), to predict potential drug candidates for repurposing. It integrates general biomedical data and can be applied to various diseases, not limited to AD. TxGNN incorporates a metric learning module and was trained end-to-end, with both pretraining and fine-tuning focused on drug–disease relationships to improve embedding quality and prediction accuracy. By querying possible AD drugs on the TxGNN Explorer website (https://txgnn.org/), we obtained a list of drug candidates for AD, with associated probabilities or scores reflecting the likelihood that each drug is effective for the disease. These scores result from the graph neural network’s in-depth analysis of drug-disease relationships, molecular interactions, and other relevant biological data.
Drug Repurposing with Machine Learning Using AlzKB (DR2)
This approach integrates AlzKB, an AD-specific knowledge graph, with a machine learning algorithm RotatE [61]. AlzKB is an ontology-based knowledge graph developed by our team using Memgraph’s graph technology to advance research on AD. It encompasses entities such as genes, pathways, drugs, and diseases. Figure 2A illustrates the architecture of AlzKB, detailing its entity types (totaling over 234,000 entities) and their relationships. In our previous study [52], we compared RotatE with other machine learning methods, including TransE, ComplEx, MultDist, and ConvE, and found it to be the most effective for knowledge graph completion. RotatE is a lightweight model specifically designed for graph-structured data. It embeds entities as complex vectors and represents relationships as rotations in a complex space, efficiently capturing various relationship types (e.g., one-to-one, one-to-many, and many-to-many). It is significantly less resource-intensive than graph neural networks (GNNs). By applying RotatE to AlzKB, we showcased how a disease-specific KG can enhance DR performance, even when using a relatively simple ML algorithm.
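The rotation idea behind RotatE can be illustrated with a minimal sketch. It is not the AlzKB training code: the vectors are random toy values, the L2 norm is an assumed choice of distance, and `rotate_score` is a hypothetical helper name. A triple scores highly when rotating the head embedding by the relation's phases lands close to the tail embedding.

```python
import cmath
import math
import random


def rotate_score(head, phases, tail):
    """RotatE-style plausibility score for one triple (higher is more plausible)."""
    # A relation is a vector of phases, i.e. an element-wise rotation in the complex plane.
    rotated = [h * cmath.exp(1j * p) for h, p in zip(head, phases)]
    # Distance between the rotated head and the tail (L2 norm assumed here).
    distance = math.sqrt(sum(abs(r - t) ** 2 for r, t in zip(rotated, tail)))
    return -distance


rng = random.Random(0)
dim = 8  # toy size; the study reports an embedding dimension of 960


def random_complex_vector(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


head = random_complex_vector(dim)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(dim)]
matching_tail = [h * cmath.exp(1j * p) for h, p in zip(head, phases)]
unrelated_tail = random_complex_vector(dim)

score_match = rotate_score(head, phases, matching_tail)
score_unrelated = rotate_score(head, phases, unrelated_tail)
```

In link prediction, candidate tails (for example, drugs for a Drug–Treat–Disease query) would be ranked by this score, with the best-scoring entities proposed as new links.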
In this drug repurposing study, we performed extensive hyperparameter tuning to optimize RotatE’s performance. Using PyKEEN with Optuna, we tuned the loss function, regularizer, negative sampler, batch size, embedding dimension, learning rate, and optimizer settings. A total of 30 trials were conducted, each running up to 100 epochs with early stopping. The final RotatE model was trained for 40 epochs with a batch size of 256 and an embedding dimension of 960, while all other parameters remained at PyKEEN’s default values.
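As a rough illustration, the final settings reported above could be passed to PyKEEN's `pipeline` function as follows. This is a sketch under assumptions rather than the study's script: the helper names are hypothetical, the `training`, `testing`, and `validation` arguments stand for PyKEEN triples factories built from AlzKB (not shown), and the values chosen by the Optuna search for the remaining hyperparameters are not reproduced here.

```python
def final_rotate_settings():
    """Settings stated in the text; everything else is left at PyKEEN defaults."""
    return {
        "model": "RotatE",
        "model_kwargs": {"embedding_dim": 960},
        "training_kwargs": {"num_epochs": 40, "batch_size": 256},
    }


def train_final_model(training, testing, validation=None):
    """Train RotatE on pre-built PyKEEN triples factories (hypothetical inputs)."""
    # Imported lazily so the settings can be inspected without PyKEEN installed.
    from pykeen.pipeline import pipeline

    return pipeline(
        training=training,
        testing=testing,
        validation=validation,
        **final_rotate_settings(),
    )


settings = final_rotate_settings()
```

Keeping the settings in one dictionary makes it easy to record exactly which values differ from the library defaults.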
Drug Repurposing with AlzKB Chatbot (DR3)
With the rapid advancements in generative AI, particularly ChatGPT and other LLMs, we have developed a ChatGPT-like chatbot for AlzKB as shown in Suppl. Figure 1, accessible via its website at https://alzkb.ai/. Without the chatbot, users would need to rely on query languages like Cypher, which is used with Memgraph—the graph database powering AlzKB—to navigate the knowledge base. The chatbot streamlines this process by converting text-based questions into appropriate Cypher queries, enabling more efficient information retrieval and DR tasks. Additionally, integrating an LLM enables seamless interaction between AlzKB’s information and the broader data resources that trained the LLM, improving the utility and processing of knowledge graph data.
Our DR approach using the AlzKB chatbot systematically identifies potential therapeutic candidates for AD by exploring multi-hop relationships between diseases, pathways, body parts, genes, and drugs. Initially, it identifies pathways and body parts linked to AD, followed by extracting genes associated with both the identified pathways and body parts. The chatbot then retrieves drugs interacting with these genes. To refine the selection, the chatbot ranks drugs based on their number of unique gene connections, prioritizing compounds with broad genetic interactions relevant to AD pathology. By testing this DR approach, we can evaluate the performance of an LLM-based method against an ML-based method, both utilizing the disease-specific AlzKB.
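The ranking step can be sketched on a toy in-memory graph. The gene and drug names below are placeholders rather than AlzKB content, and `rank_drugs` is a hypothetical helper; in the chatbot, the equivalent work is done by generated Cypher queries.

```python
from collections import defaultdict

# Placeholder data standing in for query results (not real AlzKB content).
pathway_genes = {"GENE_A", "GENE_B", "GENE_C"}   # genes in AD-linked pathways
bodypart_genes = {"GENE_B", "GENE_C", "GENE_D"}  # genes expressed in AD-linked body parts
gene_drug_edges = [
    ("GENE_B", "drug_x"),
    ("GENE_C", "drug_x"),
    ("GENE_C", "drug_y"),
    ("GENE_A", "drug_z"),
    ("GENE_B", "drug_x"),  # duplicate edge: must not inflate the count
]


def rank_drugs(pathway_genes, bodypart_genes, gene_drug_edges):
    """Rank drugs by the number of unique selected genes they interact with."""
    # Keep genes associated with both the pathways and the body parts.
    selected = pathway_genes & bodypart_genes
    genes_per_drug = defaultdict(set)
    for gene, drug in gene_drug_edges:
        if gene in selected:
            genes_per_drug[drug].add(gene)
    # Most gene connections first; ties broken alphabetically for stable output.
    return sorted(
        ((drug, len(genes)) for drug, genes in genes_per_drug.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


ranking = rank_drugs(pathway_genes, bodypart_genes, gene_drug_edges)
```

Using a set per drug ensures that repeated edges to the same gene are counted once, which matches the "unique gene connections" criterion.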
Drug Repurposing with ESCARGOT and AlzKB (DR4)
The LLM-based AlzKB chatbot offers a user-friendly solution for quickly querying the graph database supporting AlzKB, excelling at handling 1-hop questions such as queries about Drug-Gene or Disease-Drug relationships. However, its efficiency diminishes when addressing more complex queries such as 2-hop relationship paths (e.g., Disease-Gene-Drug as shown in Fig. 2B). To address this challenge, we developed ESCARGOT [37], an advanced LLM-powered tool that dynamically generates Python-executable GoT workflows, enabling seamless integration with knowledge graphs. By supporting direct Python execution, ESCARGOT translates knowledge into executable code, ensuring accuracy and minimizing errors. This combination of structured knowledge retrieval and direct execution reduces hallucinations, significantly improving the accuracy and reliability of reasoning and computational outcomes. With ESCARGOT-supported AlzKB, we can demonstrate whether applying a GoT-enabled LLM to a knowledge graph leads to improvements in LLM performance.
Our strategy for identifying DR candidates with ESCARGOT involves thoroughly exploring all entities connected to AD in AlzKB, such as drugs and genes, along with their related entities, such as drug classes, pathways, body parts, and transcription factors [34, 36, 73, 74]. Other potential entities, such as GO terms, were also evaluated. However, they did not provide additional filtering effects on the drug candidates and were therefore excluded from this experimental design.
The 2-hop disease-to-drug relationships illustrated in Fig. 2B establish the criteria for selecting repurposed drug candidates. In the Disease-Gene-Drug scenario, a drug qualifies as a candidate if it connects to a gene associated with AD, resulting in a large pool of potential drugs. To refine this selection, we apply a series of filtering steps. The first filter considers gene pathways or body parts, applied separately to capture drugs targeting different mechanisms, with a possible final candidate list derived by combining the results. The second filter evaluates the number of genes associated with each candidate drug, using a threshold based on the minimal number of gene interactions observed in known AD drugs. This ensures that candidate drugs align with established interactions observed in known AD drugs, specifically those with therapeutic mechanisms associated with pathways or body parts defined by their connected genes. The final filter retains only drugs that have documented drug class information and are linked to genes regulated by a transcription factor, further enhancing the relevance of the selected candidates. Transcription factors play a significant regulatory role in disease pathways and can reveal new therapeutic applications for DR [15]. Additionally, it is crucial that repurposed drug candidates have sufficient pharmacological information. In AlzKB, this is defined by DrugClass data, which includes scientifically documented drug properties sourced from DrugCentral via Hetionet [23]. This information encompasses a drug’s mechanism of action, physiological effects, and chemical structure, ensuring a comprehensive understanding of its therapeutic potential.
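The three filters can be sketched as plain set operations. This is an illustrative reading of the description, not ESCARGOT's generated code: all identifiers and data are placeholders, gene counts are taken over the genes that pass the first filter, and the threshold comparison is assumed to be "at least the minimum seen in known AD drugs".

```python
def three_step_filter(drug_genes, ad_context_genes, tf_regulated_genes,
                      drugs_with_class, known_ad_drugs):
    """Return the surviving {drug: genes} mapping after each of the three filters."""
    # Filter 1: keep drugs linked to at least one gene in the AD pathway (or body part).
    step1 = {d: g & ad_context_genes for d, g in drug_genes.items() if g & ad_context_genes}
    # Filter 2: require at least as many genes as the least-connected known AD drug
    # (assumed reading of the threshold; defaults to 1 if no known drug survives).
    threshold = min((len(step1[d]) for d in known_ad_drugs if d in step1), default=1)
    step2 = {d: g for d, g in step1.items() if len(g) >= threshold}
    # Filter 3: require a documented drug class and a transcription-factor-regulated gene.
    step3 = {d: g for d, g in step2.items()
             if d in drugs_with_class and g & tf_regulated_genes}
    return step1, step2, step3


# Placeholder data (not real AlzKB content).
drug_genes = {
    "known_ad_drug": {"G1", "G2"},
    "drug_a": {"G1", "G2", "G3"},
    "drug_b": {"G1"},
    "drug_c": {"G2", "G4"},
    "drug_d": {"G5"},
}
step1, step2, step3 = three_step_filter(
    drug_genes,
    ad_context_genes={"G1", "G2", "G3", "G4"},
    tf_regulated_genes={"G1", "G3"},
    drugs_with_class={"known_ad_drug", "drug_a", "drug_b"},
    known_ad_drugs={"known_ad_drug"},
)
```

Running the pathway-based and body-part-based variants separately only requires swapping the `ad_context_genes` argument, after which the two final lists can be combined by union or intersection.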
Figure 3A outlines the step-by-step filtering process for candidate drugs using 2-hop disease-to-drug relationships. A similar filtering approach is applied in the 3-hop Disease-Drug-Gene-Drug scenario, with the key distinction that the genes are restricted to those associated with known AD drugs, such as Donepezil (see “Suppl. Figure3 drugs” for results from the 3-hop disease-to-drug relationships). This refinement narrows the search scope by focusing on genes that could potentially interact with established AD treatments. The complexity of this process exceeds the capabilities of the AlzKB chatbot. As a result, a Python-executable, GoT-based LLM application such as ESCARGOT becomes essential for handling complex filtering tasks within a graph database for DR.
Fig. 3 (A1) The ESCARGOT three-step filtering process for identifying potential drug candidates connected to genes associated with AD in Fig. 2B1 (Disease-Gene-Drug and AD pathway). The first filter ensures the candidate is connected to the AD pathway; the second filter requires that the gene-drug count exceeds that of known AD drugs; and the third filter confirms the gene has a link to a transcription factor and the drug belongs to a documented drug class. At each step, the figure shows the total number of drugs discovered, how many were previously studied, and the overlap coefficient (Eq. 1). (A2) The same three-step process applies to drug candidates linked to genes associated with AD in Fig. 2B2 (Disease-Gene-Drug and AD bodypart), except the first filter ensures the candidate is connected to a relevant AD body part. (B) Bar chart illustrating the number of novel drugs (blue) and previously studied drugs (orange), with the overlap coefficient (green line; Eq. 1) across the three-step processes in A1 and A2. (C) Precision and recall were calculated for ESCARGOT at each filtering step using four different strategies for selecting result drugs: A1 (AD pathway–based), A2 (AD bodypart–based), the union of A1 and A2, and the intersection of A1 and A2. For each strategy, the precision and recall were compared with those of TxGNN and RotatE, using the same number of top-ranked drugs based on their respective drug scores.
Drug Repurposing with Iterative Prompting (DR5)
A recent study [67] utilized ChatGPT directly for DR without constructing any knowledge graph or developing complex machine learning models, identifying potential AD drugs through iterative queries. The study followed a two-step prompt process using ChatGPT (GPT-4). First, ChatGPT was tasked with identifying the twenty most promising DR candidates for AD. In the second step, ChatGPT was asked to verify its previous output and provide a final list of drugs. This process was repeated ten times to account for variability, yielding 59 unique candidates. This approach demonstrates the capability of a general LLM as a quick and straightforward tool for DR and serves as a baseline for comparison with our ESCARGOT method, which integrates a GoT-based LLM with a disease-specific knowledge graph.
Results
We evaluated the drugs identified in our study by comparing them with a curated list from a previous study [22], which summarized published drug candidates proposed for potential repurposing for AD between 2012 and 2022. That study compiled a total of 573 unique drugs reported as potential candidates for repurposing in AD.
Performance Metrics
The metric used to evaluate the efficiency of the approaches in our study is the overlap coefficient (also known as the Szymkiewicz-Simpson coefficient), which quantifies the overlap between two sets relative to the smaller set (Eq. 1). To assess performance, the candidate repurposed drugs identified by the five different approaches in our study are compared against the 573 previously reported AD repurposing drugs. The overlap coefficient serves as an indicator of how effectively each approach can detect potential repurposed drug candidates in a comprehensive and accurate manner. A higher overlap coefficient signifies a more efficient approach.
\[\text{overlap}(A, B) = \frac{|A \cap B|}{\min (|A|, |B|)} \quad (1)\]
Where:
- \(|A \cap B|\) is the size (cardinality) of the intersection of sets A and B, i.e., the number of elements common to both sets.
- \(|A|\) is the size (cardinality) of set A.
- \(|B|\) is the size (cardinality) of set B.
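The overlap coefficient (Eq. 1) is straightforward to compute with sets. The sketch below uses synthetic identifiers, not real drug names, arranged to reproduce the counts reported for the pathway-first ESCARGOT route (20 candidates, 14 of them among the 573 reference drugs); `overlap_coefficient` is a hypothetical helper name.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


# Synthetic identifiers arranged to match the reported counts.
reference = {f"ref_{i}" for i in range(573)}  # previously reported AD repurposing drugs
candidates = {f"ref_{i}" for i in range(14)} | {f"new_{i}" for i in range(6)}

coefficient = overlap_coefficient(candidates, reference)  # 14 / 20
```

Because every candidate list in this study is smaller than the 573-drug reference set, the denominator is always the size of the candidate list.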
Performance Comparison of Five DR Approaches
Table 1 presents a performance comparison of the five DR approaches evaluated in this study. A detailed list of drugs for each method is provided in Suppl. Table1.
DR1 utilized TxGNN, a deep learning-based machine learning (ML) algorithm leveraging a general disease knowledge graph (KG) to predict potential drug candidates for repurposing. TxGNN identified 200 drugs total with AD repurposing potential, each assigned a drug score reflecting its likelihood of effectiveness. These 200 drugs yielded an overlap coefficient of 0.328. However, when considering only the top 50 and 100 ranked drugs, prioritized based on drug scores for repurposing, both subsets achieved a slightly higher overlap coefficient of 0.34.
In DR2, a lighter machine learning algorithm, RotatE, was applied to AlzKB. To enable a direct comparison with DR1, the same number of candidate drugs were selected, and their corresponding overlap coefficients were calculated. For all drug candidate sets (200, 100, and 50), the overlap coefficients consistently exceeded 0.4, with values of 0.450, 0.490, and 0.440, respectively. This underscores the advantage of utilizing a disease-specific knowledge graph like AlzKB for DR. Even without relying on a complex deep learning model, AlzKB demonstrated performance that is comparable to, if not superior to, TxGNN.
DR3 employed an LLM-based chatbot built for AlzKB using GPT-4 technology. This chatbot allows users to navigate AlzKB by generating and executing Cypher queries, eliminating the need for manual Cypher command development. The DR query strategy we developed for the chatbot encompasses relevant entities and relationships, achieving an overlap coefficient of 0.506 with 85 identified drug candidates. However, refining this list further is challenging due to the LLM chatbot’s limitations in handling complex logical conditions.
DR4 is based on ESCARGOT-supported AlzKB which provides advanced logical reasoning and customized DR searches. ESCARGOT enables additional processing to derive 20 and 39 drug candidates from the Disease-Gene-Drug 2-hop case (see Fig. 3AB). In this case, the AD pathway-first filtering route identified 20 candidate AD repurposing drugs after three levels of filtering, with 14 overlapping with previously reported AD repurposing drugs, yielding an overlap coefficient of 0.700. Meanwhile, the AD body part-first filtering route generated 39 AD repurposing drug candidates, 33 of which overlapped, resulting in a higher overlap coefficient of 0.846. The overlap coefficients steadily increased with each filtering step, suggesting that refining the candidate drug list with relevant entities or conditions improves overall performance. In addition, the Disease-Drug-Gene-Drug 3-hop case was also tested to determine whether a smaller, more manageable set of AD repurposing candidates could be generated with stricter filters. However, it ultimately identified only one drug, with no overlap with previously reported drugs (see “Suppl. Figure3 drugs”). Since the 3-hop case is derived as a subset of the 2-hop case by filtering genes with direct AD connections (“AD genes”), this reduction in candidate drugs is expected. Overall, DR4 successfully demonstrates the capability of ESCARGOT to query complex relationship paths and the feasibility of significantly reducing the number of candidate drugs. To further emphasize this observation, Table 1 presents the intersected and unioned sets of 20 and 39 drug candidates identified in DR4. The intersection results reveal 13 candidate drugs with a high overlap coefficient of 0.846, significantly reducing the final candidate list and making downstream drug evaluation more feasible. The union results include 46 drugs with an overlap coefficient of 0.783, maintaining a manageable set of drug candidates. 
Notably, ESCARGOT achieved the highest overlap coefficients among all DR approaches in both intersection and union analyses.
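The reported intersection and union figures can be cross-checked with inclusion-exclusion. In the sketch below, the list sizes and the 14 and 33 overlapping drugs are taken from the text, whereas the numbers of previously reported drugs in the intersection and union (11 and 36) are not stated and are inferred here from the reported coefficients.

```python
# Values stated in the text.
pathway_total, pathway_known = 20, 14
bodypart_total, bodypart_known = 39, 33
intersection_total, union_total = 13, 46

# Inclusion-exclusion on list sizes: |A ∪ B| = |A| + |B| - |A ∩ B|.
sizes_consistent = pathway_total + bodypart_total - intersection_total == union_total

# Inferred (not stated): known-drug counts implied by the reported coefficients.
intersection_known = round(0.846 * intersection_total)
union_known = round(0.783 * union_total)

# The same identity must hold for the previously reported drugs in each list.
known_consistent = pathway_known + bodypart_known - intersection_known == union_known
```

Both identities hold, so the figures for the two routes, their intersection, and their union are mutually consistent.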
DR5 referenced a recent publication [67] that identified potential AD drugs using iterative ChatGPT queries, without relying on KGs. This approach reported a total of 59 candidate drugs, resulting in an overlap coefficient of 0.661.
The performance of ESCARGOT was further evaluated using precision and recall metrics (Fig. 3C, details in “Suppl. Figure3 drugs”), and compared with TxGNN and RotatE using the same number of top-ranked drugs based on their respective drug score rankings. DR3 and DR5 were excluded from this comparison because they generate fixed final drug lists, making them unsuitable for step-by-step filtering comparisons with ESCARGOT. The results show that ESCARGOT consistently outperforms the other methods in both precision and recall. Notably, RotatE also performs slightly better than TxGNN, suggesting that incorporating a disease-specific knowledge graph can potentially enhance drug repurposing performance even with a relatively lightweight ML approach like RotatE. While precision and recall help assess the quality of retrieved results, the main goal of this study is to evaluate ESCARGOT as an efficient tool for generating a manageable list of highly relevant candidate drugs for Alzheimer’s disease. These candidates can be passed on to downstream screening. Therefore, the emphasis is not on maximizing recall by retrieving all known drugs, but rather on producing a concise, high-precision list that includes both previously reported drugs and promising novel candidates.
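The size-matched comparison described above can be sketched as follows, using toy identifiers and scores rather than study data; `precision_recall` and `top_k` are hypothetical helper names. A filtered candidate list is compared with the same number of top-ranked drugs from a score-based method.

```python
def precision_recall(candidates, reference):
    """Precision and recall of a candidate list against a reference set."""
    candidates, reference = set(candidates), set(reference)
    hits = len(candidates & reference)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def top_k(scored_drugs, k):
    """Return the k highest-scoring drug names from (name, score) pairs."""
    ranked = sorted(scored_drugs, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy data (not study results).
reference = {"d1", "d2", "d3", "d4"}
filtered_list = ["d1", "d2", "d3"]  # e.g. output of a filtering workflow
scored = [("d1", 0.9), ("d7", 0.8), ("d2", 0.7), ("d8", 0.6), ("d3", 0.5)]

baseline = top_k(scored, len(filtered_list))  # same list size for a fair comparison
filtered_metrics = precision_recall(filtered_list, reference)
baseline_metrics = precision_recall(baseline, reference)
```

When the candidate list is smaller than the reference set, as it is throughout this study, precision equals the overlap coefficient (Eq. 1), while recall is bounded by the list size divided by the size of the reference set.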
From a computational cost perspective in the drug discovery process, the approximate runtimes for each method vary depending on the underlying resources and implementation. TxGNN, trained on an NVIDIA Tesla V100 GPU, did not report exact training time in its original publication but is expected to be computationally intensive due to its deep learning architecture. RotatE, trained on an NVIDIA RTX A6000 GPU, completed training in approximately 2.5 hours. ESCARGOT, tested on a MacBook without GPU acceleration, required an average of 3 minutes per drug repurposing path. The AlzKB-based chatbot completed each query in under two minutes. The LLM-based DR5 approach is expected to be comparable in speed to ESCARGOT and the chatbot. Once the machine learning models (DR1 and DR2) are trained, or the LLM-based solutions (DR3 to DR5) are set up, the actual drug discovery process across all methods becomes rapid and computationally efficient.
Overall, this comparison highlights the effectiveness of AlzKB as a disease-specific knowledge base, outperforming the general DR knowledge base TxGNN in ML-driven DR. Moreover, LLM-based approaches demonstrated superior ability to recover candidate drugs from previously published studies, as evidenced by generally higher overlap coefficients. ESCARGOT further enhanced LLM query performance, significantly improving DR outcomes.
Rediscovery of candidate repurposable drugs
We found that ESCARGOT on AlzKB conformed with previous DR studies in re-discovering several candidate repurposable drugs whose promising therapeutic effects have led to extensive validation experiments in vivo, such as animal models, pre-clinical studies, and clinical trials.
First and foremost, ESCARGOT on AlzKB consistently prioritized donepezil, an acetylcholinesterase inhibitor. Donepezil selectively and reversibly prevents the enzyme acetylcholinesterase from breaking down the neurotransmitter acetylcholine and consequently enhances synaptic communication in the brain.
Another rediscovered candidate drug was minocycline, an anti-inflammatory tetracycline antibiotic widely used to treat bacterial infection [10]. Minocycline shows a good tolerance profile, especially in older people, and as a lipophilic tetracycline, minocycline can better penetrate the blood-brain barrier. Various studies report that minocycline reduces A \(\beta\) accumulation in vitro [17, 19] and in vivo [11], reduces tau deposition [43], and improves cognitive behavior outcomes in transgenic mice models for AD [18, 55]. Clinical trials of minocycline have yet to prove its therapeutic benefits for early AD and other neurodegenerative diseases [24, 42, 63]. Overall, as no clinical trials report significant safety concerns, minocycline and other tetracycline are justifiable candidate drugs that warrant further investigation as treatments for AD.
Our approach also identified several candidate drugs belonging to the class of retinoids. Multiple preclinical studies point to retinoic acids’ promising therapeutic effects of reducing AD neuropathology, inflammation, A \(\beta\) plagues, and tau phosphorylation in transgenic AD mice models [4, 16, 27] but also neuroprotective effects for other neurodegenerative diseases [46, 64]. Clinical trial data remains limited, and the safety of long-term retinoid therapy requires further investigation [2].
ESCARGOT was the only DR approach that identified Tamoxifen, a selective estrogen receptor modulator widely used to treat and prevent breast cancers in women, as a candidate drug for AD. Tamoxifen targets tau pathology [14]. While tau protein and a normal level of phosphorylation maintain the structural stability of microtubules in neurons, hyperphosphorylation of tau leads to neurofibrillary tangles and eventually cell death [40]. Tamoxifen has been shown to exert a microtubule-stabilizing effect like the tau protein [33] and regulate tau phosphorylation by inhibiting CDK5 [9]. Though whether Tamoxifen can restore cognitive function and treat AD symptoms remains a question, epidemiology studies have shown long-term Tamoxifen’s preventive benefit against dementia and AD [7, 60]. Furthermore, as the only FDA-approved hormonal agent for the prevention or treatment of specific types of breast cancers, research has delineated how genetic factors influence the metabolism of tamoxifen into active, therapeutic forms [21] and how those genetic variations are distributed in various populations [32, 39, 65]. The comprehensive pharmacological and pharmacogenomic profiles of tamoxifen will better guide future clinical trials and improve the chance of success for repurposing tamoxifen as an AD treatment [13, 50].
Novel drug candidates
ESCARGOT on AlzKB also identified novel candidate repurposable drugs that have not been previously proposed for AD. These candidates met the same stringent filtering criteria as known AD drugs and other extensively studied repurposed candidates, which supports their potential. Four such novel drugs were detected, with detailed information listed in Table 2 and Supplementary Table 2. Among them, three (Epirubicin, Vemurafenib, and Fulvestrant) are antineoplastic and immunomodulating agents. Given that chronic neuroinflammation plays a crucial role in AD progression, immunomodulators, including certain cancer drugs, may help suppress excessive inflammation by targeting microglial activation and cytokine release [71].
The identification of vitamin A as a potential AD drug was unexpected. While vitamin A and its derivatives (e.g., retinoic acids) contribute to neurogenesis, synaptic plasticity, and amyloid-beta metabolism [30], their therapeutic potential in AD remains uncertain and underexplored compared to conventional drug targets. One study [8] suggests that, in APP/PS1 transgenic mouse models, vitamin A helped reduce amyloid-beta accumulation, modulate neuroinflammation, and enhance cognitive function, which warrants investigation of vitamin A as a potential AD treatment.
Conclusions and Discussion
In this study, we explored the impact of KGs and LLMs on DR for AD using five distinct methods: TxGNN, a deep learning-based ML approach built on a general disease-oriented DR KG; RotatE, a lightweight ML approach utilizing AlzKB; an LLM-powered Cypher code generator chatbot for AlzKB; ESCARGOT, an LLM method leveraging the GoT strategy on AlzKB; and an iterative LLM query-based approach for identifying potential AD drugs. As shown in Table 2, a comparison of all methods by overlap coefficient reveals that ESCARGOT significantly outperforms the others, achieving values of 0.783 and 0.846. These values represent improvements of 18% and 28%, respectively, over the second-best method, the iterative LLM querying approach, which achieved an overlap coefficient of 0.661. Additionally, when compared with TxGNN and RotatE, two machine learning methods also built on knowledge graphs, ESCARGOT demonstrates superior performance in both precision and recall (Fig. 3C).
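To make the comparison above concrete, the following minimal sketch computes an overlap coefficient and the relative improvements quoted in the text. It assumes the standard Szymkiewicz-Simpson definition (size of the intersection divided by the size of the smaller set); the function names and the toy drug sets are hypothetical placeholders, not data from this study. Only the values 0.783, 0.846, and 0.661 are taken from the text.

```python
def overlap_coefficient(predicted, reference):
    """Overlap coefficient: |A intersect B| / min(|A|, |B|).

    Assumes the standard Szymkiewicz-Simpson definition; names are
    normalized to lower case so that 'Donepezil' matches 'donepezil'.
    """
    a = {name.strip().lower() for name in predicted}
    b = {name.strip().lower() for name in reference}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy placeholder sets (hypothetical, for illustration only).
predicted = {"drug_a", "drug_b", "drug_c", "drug_d"}
reference = {"drug_b", "drug_c", "drug_e"}
toy_overlap = overlap_coefficient(predicted, reference)  # 2 shared / 3 in smaller set

# Overlap coefficients reported in the text.
gain_low = relative_improvement(0.783, 0.661)
gain_high = relative_improvement(0.846, 0.661)
print(f"toy overlap: {toy_overlap:.3f}")
print(f"relative gains: {gain_low:.0%} and {gain_high:.0%}")
```

Because the denominator is the smaller of the two sets, a short candidate list that is fully contained in the reference set scores 1.0, which is why the coefficient is usually read together with precision and recall.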
KGs provide structured knowledge that can enhance ML algorithms for DR, as demonstrated in previous studies [1, 20, 25]. However, it remains an open question whether general-purpose KGs or disease-specific KGs yield better results. Our study directly addressed this by comparing ML approaches based on a general DR KG (TxGNN) vs. a disease-specific KG (AlzKB). The AlzKB-based approach outperformed its general counterpart, yielding a higher overlap coefficient with drugs from previously reported DR studies. This advantage likely arises from AlzKB’s specialized disease-relevant relationships and features, which improve computational efficiency and feature selection for AD-related drug predictions.
Our findings also highlight the superiority of LLM-based approaches over ML-based methods in DR, particularly in terms of overlap coefficient, usability, and reduced dependence on ML expertise. LLMs may outperform ML models because of their semantic awareness, flexible querying, and logical reasoning, whereas many ML algorithms, such as RotatE and TxGNN, rely heavily on graph structure embeddings and may therefore miss implicit relationships that are crucial for DR.
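As an illustration of what relying on graph structure embeddings means, the sketch below scores a (head, relation, tail) triple in the style of RotatE [61], where each relation is a rotation in the complex plane and a triple is plausible when the rotated head lands close to the tail. This is a toy sketch, not the training code used in this study: the two-dimensional embeddings are hypothetical, and summing the per-dimension moduli is an assumption about the distance aggregation.

```python
import cmath
import math


def rotate_distance(head, phases, tail):
    """RotatE-style distance: sum over dimensions of |h_i * e^(i*theta_i) - t_i|.

    Each relation dimension is a unit-modulus rotation e^(i*theta_i).
    A smaller distance means the triple is considered more plausible.
    """
    return sum(
        abs(h * cmath.exp(1j * theta) - t)
        for h, theta, t in zip(head, phases, tail)
    )


# Hypothetical 2-dimensional complex embeddings (illustration only).
head = [1 + 0j, 0 + 1j]          # e.g. a drug entity
phases = [math.pi / 2, math.pi]  # e.g. a "treats" relation as rotation angles

# A tail that is exactly the rotated head, and an unrelated tail.
tail_match = [h * cmath.exp(1j * p) for h, p in zip(head, phases)]
tail_other = [1 + 0j, 1 + 0j]

d_match = rotate_distance(head, phases, tail_match)
d_other = rotate_distance(head, phases, tail_other)
print(d_match < d_other)
```

The score depends only on learned vector geometry, so a relationship that is stated in text but not reflected in the graph topology has no way to influence it.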
Furthermore, we validated the advantage of GoT-enhanced LLMs for AlzKB. In our prior ESCARGOT study, GoT-based LLMs consistently outperformed baseline LLMs on various biomedical queries. Here, we extended that validation to DR and found that ESCARGOT significantly surpassed both Cypher-based LLM querying and iterative LLM prompting in overlap coefficient, stability, and logical reasoning. Its ability to handle complex logical operations makes it a powerful tool for refining candidate drug lists with precision.
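The logical refinement of candidate lists mentioned above can be pictured as set operations over the results of separate graph queries. The sketch below is illustrative only and does not reproduce ESCARGOT's implementation or the filtering criteria used in this study; the function name, the three input sets, and the drug identifiers are all hypothetical.

```python
def refine_candidates(binds_ad_gene, treats_related_disease, known_ad_drugs):
    """Keep drugs supported by both lines of evidence, minus known AD drugs.

    Hypothetical criteria: a candidate must appear in both query results
    (intersection) and must not already be an AD drug (difference).
    """
    supported = binds_ad_gene & treats_related_disease
    return sorted(supported - known_ad_drugs)


# Placeholder query results (hypothetical identifiers, not real findings).
binds_ad_gene = {"drug_a", "drug_b", "drug_c"}
treats_related_disease = {"drug_b", "drug_c", "drug_d"}
known_ad_drugs = {"drug_c"}

novel = refine_candidates(binds_ad_gene, treats_related_disease, known_ad_drugs)
print(novel)
```

Keeping each query result as an explicit set makes every intermediate step inspectable, which is the practical benefit of decomposing a complex question into smaller graph queries.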
By integrating LLMs and KGs, we established a robust framework for DR, demonstrating their synergistic potential in tackling complex biomedical challenges. Additionally, novel drug candidates (Epirubicin, Vemurafenib, Fulvestrant and Vitamin A) identified by our approach, which were absent in previous DR studies, further highlight its innovative contribution. These drugs have demonstrated significant relevance and hold promising potential as AD treatments, warranting further investigation through laboratory testing.
One potential limitation of our study is the reference dataset used for overlap coefficient calculations, which includes DR studies conducted between 2012 and 2022. While our dataset (AlzKB, published in early 2023) falls within a similar timeframe, the reference set does not account for the most recent advancements. However, given that our primary objective was to evaluate the efficiency and contributions of KGs and LLMs in DR, this minor gap should have minimal impact on our conclusions. Our results suggest that had our approach been used in place of previous decade-long DR efforts, it could have significantly reduced both cost and time in drug discovery.
Finally, our study raises new research questions. While AlzKB has proven effective for ML- and LLM-based DR, its performance with hybrid ML-LLM approaches—such as integrating ML-derived entity embeddings or weight scores into KG-driven LLM searches—remains unexplored. Additionally, AlzKB’s framework can be adapted to other diseases, and instructions for building disease-specific knowledge graphs are available in the AlzKB GitHub repository. Since ESCARGOT also integrates easily with various graph databases, our approach could be applied to other diseases for novel drug discovery, especially in drug repurposing. This potential warrants further investigation.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- AD: Alzheimer’s disease
- AI: Artificial Intelligence
- AlzKB: Alzheimer’s KnowledgeBase
- DR: Drug repurposing
- ESCARGOT: Enhanced Strategy and Cypher-driven Analysis and Reasoning using Graph-Of-Thoughts
- GoT: Graph-of-Thoughts
- KG: Knowledge graph
- LLM: Large language model
- ML: Machine Learning
References
Bang D, Lim S, Lee S, Kim S. Biomedical knowledge graph learning for drug repurposing by extending guilt-by-association to multiple layers. Nat Commun. 2023;14(1):3570. https://doi.org/10.1038/s41467-023-39301-y.
Behl T, Kaur D, Sehgal A, Singla RK, Makeen HA, Albratty M, et al. Therapeutic insights elaborating the potential of retinoids in Alzheimer’s disease. 13:976799. https://doi.org/10.3389/fphar.2022.976799.
Besta M, Blach N, Kubicek A, Gerstenberger R, Podstawski M, Gianinazzi L, et al. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence. 2024;38(16):17682–17690. https://doi.org/10.1609/aaai.v38i16.29720.
Biyong EF, Tremblay C, Leclerc M, Caron V, Alfos S, Helbling JC, et al. Role of Retinoid X Receptors (RXRs) and dietary vitamin A in Alzheimer’s disease: Evidence from clinicopathological and preclinical studies. 161:105542. https://doi.org/10.1016/j.nbd.2021.105542.
Bonner S, Barrett IP, Ye C, Swiers R, Engkvist O, Hoyt CT, et al. Understanding the performance of knowledge graph embeddings in drug discovery. Artif Intell Life Sci. 2022;2:100036. https://doi.org/10.1016/j.ailsci.2022.100036.
Bonner S, Barrett IP, Ye C, Swiers R, Engkvist O, Bender A, et al. A review of biomedical datasets relating to drug discovery: a knowledge graph perspective. Brief Bioinform. 2022;23(6):bbac404. https://doi.org/10.1093/bib/bbac404.
Branigan GL, Soto M, Neumayer L, Rodgers K, Brinton RD. Association Between Hormone-Modulating Breast Cancer Therapies and Incidence of Neurodegenerative Outcomes for Women With Breast Cancer. 3(3):e201541. https://doi.org/10.1001/jamanetworkopen.2020.1541.
Chen BW, Zhang KW, Chen SJ, Yang C, Li PG. Vitamin A Deficiency Exacerbates Gut Microbiota Dysbiosis and Cognitive Deficits in Amyloid Precursor Protein/Presenilin 1 Transgenic Mice. Front Aging Neurosci. 2021;13. https://doi.org/10.3389/fnagi.2021.753351.
Corbel C, Zhang B, Le Parc A, Baratte B, Colas P, Couturier C, et al. Tamoxifen inhibits CDK5 kinase activity by interacting with p35/p25 and modulates the pattern of tau phosphorylation. 22(4):472–482. https://doi.org/10.1016/j.chembiol.2015.03.009.
Corbett A, Pickett J, Burns A, Corcoran J, Dunnett SB, Edison P, et al. Drug repositioning for Alzheimer’s disease. 11(11):833–846. https://doi.org/10.1038/nrd3869.
Cuello AC, Ferretti MT, Leon WC, Iulita MF, Melis T, Ducatenzeiler A, et al. Early-stage inflammation and experimental therapy in transgenic models of the Alzheimer-like amyloid pathology. 7(1):96–98. https://doi.org/10.1159/000285514.
Cummings JL, Goldman DP, Simmons-Stern NR, Ponton E. The costs of developing treatments for Alzheimer’s disease: A retrospective exploration. Alzheimers Dement. 2022;18(3):469–77. https://doi.org/10.1002/alz.12450.
Cummings JL, Zhou Y, Van Stone A, Cammann D, Tonegawa-Kuji R, Fonseca J, et al. Drug repurposing for Alzheimer’s disease and other neurodegenerative disorders. 16(1):1755. https://doi.org/10.1038/s41467-025-56690-4.
Das V, Miller JH, Alladi CG, Annadurai N, De Sanctis JB, Hrubá L, et al. Antineoplastics for treating Alzheimer’s disease and dementia: Evidence from preclinical and observational studies. 44(5):2078–2111. https://doi.org/10.1002/med.22033.
De Bastiani MA, Pfaffenseller B, Klamt F. Master Regulators Connectivity Map: A Transcription Factors-Centered Approach to Drug Repositioning. Front Pharmacol. 2018;9. https://doi.org/10.3389/fphar.2018.00697.
Ding Y, Qiao A, Wang Z, Goodwin JS, Lee ES, Block ML, et al. Retinoic acid attenuates beta-amyloid deposition and rescues memory deficits in an Alzheimer’s disease transgenic mouse model. 28(45):11622–11634. https://doi.org/10.1523/JNEUROSCI.3153-08.2008.
Familian A, Boshuizen RS, Eikelenboom P, Veerhuis R. Inhibitory effect of minocycline on amyloid beta fibril formation and human microglial activation. 53(3):233–240. https://doi.org/10.1002/glia.20268.
Fan R, Xu F, Previti ML, Davis J, Grande AM, Robinson JK, et al. Minocycline reduces microglial activation and improves behavioral deficits in a transgenic model of cerebral microvascular amyloid. 27(12):3057–3063. https://doi.org/10.1523/JNEUROSCI.4371-06.2007.
Forloni G, Colombo L, Girola L, Tagliavini F, Salmona M. Anti-amyloidogenic activity of tetracyclines: studies in vitro. 487(3):404–407. https://doi.org/10.1016/s0014-5793(00)02380-2.
Gao Z, Ding P, Xu R. KG-Predict: A knowledge graph computational framework for drug repurposing. J Biomed Inform. 2022;132:104133. https://doi.org/10.1016/j.jbi.2022.104133.
Goetz MP, Sangkuhl K, Guchelaar HJ, Schwab M, Province M, Whirl-Carrillo M, et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) Guideline for CYP2D6 and Tamoxifen Therapy. 103(5):770–777. https://doi.org/10.1002/cpt.1007.
Grabowska ME, Huang A, Wen Z, Li B, Wei WQ. Drug repurposing for Alzheimer’s disease from 2012–2022: a 10-year literature review. Front Pharmacol. 2023;14. https://doi.org/10.3389/fphar.2023.1257700.
Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, et al. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife. 2017;6:e26726. https://doi.org/10.7554/eLife.26726.
Howard R, Zubko O, Bradley R, Harper E, Pank L, O’Brien J, et al. Minocycline at 2 Different Dosages vs Placebo for Patients With Mild Alzheimer Disease: A Randomized Clinical Trial. 77(2):164–174. https://doi.org/10.1001/jamaneurol.2019.3762.
Huang K, Chandak P, Wang Q, Havaldar S, Vaid A, Leskovec J, et al. A foundation model for clinician-centered drug repurposing. Nat Med. 2024;30(12):3601–3613. https://doi.org/10.1038/s41591-024-03233-x.
Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminformatics. 2020;12(1):46. https://doi.org/10.1186/s13321-020-00450-7.
Jarvis CI, Goncalves MB, Clarke E, Dogruel M, Kalindjian SB, Thomas SA, et al. Retinoic acid receptor-\(\alpha\) signalling antagonizes both intracellular and extracellular amyloid-\(\beta\) production and prevents neuronal cell death caused by amyloid-\(\beta\). 32(8):1246–1255. https://doi.org/10.1111/j.1460-9568.2010.07426.x.
Jin B, Xie C, Zhang J, Roy KK, Zhang Y, Li Z, et al. Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs. In: Ku LW, Martins A, Srikumar V, editors. Findings of the Association for Computational Linguistics: ACL 2024. Bangkok: Association for Computational Linguistics; 2024. pp. 163–184. https://aclanthology.org/2024.findings-acl.11/.
Lavecchia A. Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today. 2019;24(10):2017–32. https://doi.org/10.1016/j.drudis.2019.07.006.
Lenz M, Kruse P, Eichler A, Straehle J, Beck J, Deller T, et al. All-trans retinoic acid induces synaptic plasticity in human cortical neurons. eLife. 2021;10:e63026. https://doi.org/10.7554/eLife.63026.
Liang Y, Tan K, Xie T, Tao W, Wang S, Lan Y, et al. Aligning Large Language Models to a Domain-specific Graph Database for NL2GQL. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. CIKM ’24. New York: Association for Computing Machinery; 2024. pp. 1367–1377. https://doi.org/10.1145/3627673.3679713.
Li B, Sangkuhl K, Whaley R, Woon M, Keat K, Whirl-Carrillo M, et al. Frequencies of pharmacogenomic alleles across biogeographic groups in a large-scale biobank. 110(10):1628–1647. https://doi.org/10.1016/j.ajhg.2023.09.001.
Lo YC, Cormier O, Liu T, Nettles KW, Katzenellenbogen JA, Stearns T, et al. Pocket similarity identifies selective estrogen receptor modulators as microtubule modulators at the taxane site. 10(1):1033. https://doi.org/10.1038/s41467-019-08965-w.
Lorente JS, Sokolov AV, Ferguson G, Schiöth HB, Hauser AS, Gloriam DE. GPCR drug discovery: new agents, targets and indications. Nat Rev Drug Discov. 2025;24(6):458–479. https://doi.org/10.1038/s41573-025-01139-y.
Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23(6):bbac409. https://doi.org/10.1093/bib/bbac409.
Masumshah R, Eslahchi C. DPSP: a multimodal deep learning framework for polypharmacy side effects prediction. Bioinforma Adv. 2023;3(1):vbad110. https://doi.org/10.1093/bioadv/vbad110.
Matsumoto N, Choi H, Moran J, Hernandez ME, Venkatesan M, Li X, et al. ESCARGOT: an AI agent leveraging large language models, dynamic graph of thoughts, and biomedical knowledge graphs for enhanced reasoning. Bioinformatics. 2025;41(2):btaf031. https://doi.org/10.1093/bioinformatics/btaf031.
Matsumoto N, Moran J, Choi H, Hernandez ME, Venkatesan M, Wang P, et al. KRAGEN: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models. Bioinformatics. 2024;40(6):btae353. https://doi.org/10.1093/bioinformatics/btae353.
McInnes G, Lavertu A, Sangkuhl K, Klein TE, Whirl-Carrillo M, Altman RB. Pharmacogenetics at Scale: An Analysis of the UK Biobank. 109(6):1528–1537. https://doi.org/10.1002/cpt.2122.
Medeiros R, Baglietto-Vargas D, LaFerla FM. The Role of Tau in Alzheimer’s Disease and Related Disorders. 17(5):514–524. https://doi.org/10.1111/j.1755-5949.2010.00177.x.
Munir S, Aldini A. Towards Evaluating Large Language Models for Graph Query Generation. arXiv:2411.08449.
NINDS NET-PD Investigators. A pilot clinical trial of creatine and minocycline in early Parkinson disease: 18-month results. 31(3):141–150. https://doi.org/10.1097/WNF.0b013e3181342f32.
Noble W, Garwood C, Stephenson J, Kinsey AM, Hanger DP, Anderton BH. Minocycline reduces the development of abnormal tau species in models of Alzheimer’s disease. 23(3):739–750. https://doi.org/10.1096/fj.08-113795.
Pal S, Bhattacharya M, Islam MA, Chakraborty C. ChatGPT or LLM in next-generation drug discovery and development: pharmaceutical and biotechnology companies can make use of the artificial intelligence-based device for a faster way of drug discovery and development. Int J Surg. 2023;109(12):4382. https://doi.org/10.1097/JS9.0000000000000719.
Pan S, Luo L, Wang Y, Chen C, Wang J, Wu X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans Knowl Data Eng. 2024;36(7):3580–3599. https://doi.org/10.1109/TKDE.2024.3352100.
Pareek A, Singhal R, Pareek A, Ghazi T, Kapoor DU, Ratan Y, et al. Retinoic acid in Parkinson’s disease: Molecular insights, therapeutic advances, and future prospects. 355:123010. https://doi.org/10.1016/j.lfs.2024.123010.
Perdomo-Quinteiro P, Belmonte-Hernández A. Knowledge Graphs for drug repurposing: a review of databases and methods. Brief Bioinform. 2024;25(6):bbae461. https://doi.org/10.1093/bib/bbae461.
Pillaiyar T, Meenakshisundaram S, Manickam M, Sankaranarayanan M. A medicinal chemistry perspective of drug repositioning: Recent advances and challenges in drug discovery. Eur J Med Chem. 2020;195:112275. https://doi.org/10.1016/j.ejmech.2020.112275.
Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58. https://doi.org/10.1038/nrd.2018.168.
Razuvayevskaya O, Lopez I, Dunham I, Ochoa D. Genetic factors associated with reasons for clinical trial stoppage. 56(9):1862–1867. https://doi.org/10.1038/s41588-024-01854-z.
Rodriguez S, Hug C, Todorov P, Moret N, Boswell SA, Evans K, et al. Machine learning identifies candidates for drug repurposing in Alzheimer’s disease. Nat Commun. 2021;12(1):1033. https://doi.org/10.1038/s41467-021-21330-0.
Romano JD, Truong V, Kumar R, Venkatesan M, Graham BE, Hao Y, et al. The Alzheimer’s Knowledge Base: A Knowledge Graph for Alzheimer Disease Research. J Med Internet Res. 2024;26(1):e46777. https://doi.org/10.2196/46777.
Sahoo SS, Plasek JM, Xu H, Uzuner Ö, Cohen T, Yetisgen M, et al. Large language models for biomedicine: foundations, opportunities, challenges, and best practices. J Am Med Inform Assoc. 2024;31(9):2114–2124. https://doi.org/10.1093/jamia/ocae074.
Schlander M, Hernandez-Villafuerte K, Cheng CY, Mestre-Ferrandiz J, Baumann M. How Much Does It Cost to Research and Develop a New Drug? A Systematic Review and Assessment. PharmacoEconomics. 2021;39(11):1243–69. https://doi.org/10.1007/s40273-021-01065-y.
Seabrook TJ, Jiang L, Maier M, Lemere CA. Minocycline affects microglia activation, Abeta deposition, and behavior in APP-tg mice. 53(7):776–782. https://doi.org/10.1002/glia.20338.
Sertkaya A, Beleche T, Jessup A, Sommers BD. Costs of Drug Development and Research and Development Intensity in the US, 2000–2018. JAMA Netw Open. 2024;7(6):e2415445. https://doi.org/10.1001/jamanetworkopen.2024.15445.
Shao S, Henrique Ribeiro P, Ramirez CM, Moore JH. A review of feature selection strategies utilizing graph data structures and Knowledge Graphs. Brief Bioinform. 2024;25(6):bbae521. https://doi.org/10.1093/bib/bbae521.
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–180. https://doi.org/10.1038/s41586-023-06291-2.
Soman K, Rose PW, Morris JH, Akbas RE, Smith B, Peetoom B, et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics. 2024;40(9):btae560. https://doi.org/10.1093/bioinformatics/btae560.
Sun LM, Chen HJ, Liang JA, Kao CH. Long-term use of tamoxifen reduces the risk of dementia: a nationwide population-based cohort study. 109(2):103–109. https://doi.org/10.1093/qjmed/hcv072.
Sun Z, Deng ZH, Nie JY, Tang J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. arXiv:1902.10197.
Urbina F, Puhl AC, Ekins S. Recent advances in drug repurposing using machine learning. Curr Opin Chem Biol. 2021;65:74–84. https://doi.org/10.1016/j.cbpa.2021.06.001.
Van Eldik LJ, Carrillo MC, Cole PE, Feuerbach D, Greenberg BD, Hendrix JA, et al. The roles of inflammation and immune mechanisms in Alzheimer’s disease. 2(2):99–109. https://doi.org/10.1016/j.trci.2016.05.001.
Vassal M, Martins F, Monteiro B, Tambaro S, Martinez-Murillo R, Rebelo S. Emerging Pro-neurogenic Therapeutic Strategies for Neurodegenerative Diseases: A Review of Pre-clinical and Clinical Research. 62(1):46–76. https://doi.org/10.1007/s12035-024-04246-w.
Verma SS, Keat K, Li B, Hoffecker G, Risman M, Regeneron Genetics Center, et al. Evaluating the frequency and the impact of pharmacogenetic alleles in an ancestrally diverse Biobank population. 20(1):550. https://doi.org/10.1186/s12967-022-03745-5.
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Adv Neural Inf Process Syst. 2022;35:24824–37.
Yan C, Grabowska ME, Dickson AL, Li B, Wen Z, Roden DM, et al. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer’s disease with real-world clinical validation. NPJ Digit Med. 2024;7(1):1–6. https://doi.org/10.1038/s41746-024-01038-3.
Yao S, Yu D, Zhao J, Shafran I, Griffiths T, Cao Y, et al. Tree of thoughts: deliberate problem solving with large language models. Adv Neural Inf Process Syst. 2023;36:11809–22.
Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H. Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inform. 2021;115:103696. https://doi.org/10.1016/j.jbi.2021.103696.
Zhang J, Zhang Y, Wang J, Xia Y, Zhang J, Chen L. Recent advances in Alzheimer’s disease: mechanisms, clinical trials and new drug development strategies. Signal Transduct Target Ther. 2024;9:211. https://doi.org/10.1038/s41392-024-01911-3.
Zhang Q, Yang G, Luo Y, Jiang L, Chi H, Tian G. Neuroinflammation in Alzheimer’s disease: insights from peripheral immune cells. Immun Ageing. 2024;21(1):38. https://doi.org/10.1186/s12979-024-00445-0.
Zhang S, Fan R, Liu Y, Chen S, Liu Q, Zeng W. Applications of transformer-based language models in bioinformatics: a survey. Bioinforma Adv. 2023;3(1):vbad001. https://doi.org/10.1093/bioadv/vbad001.
Zhou Z, Liao Q, Wei J, Zhuo L, Wu X, Fu X, et al. Revisiting drug–protein interaction prediction: a novel global–local perspective. Bioinformatics. 2024;40(5):btae271. https://doi.org/10.1093/bioinformatics/btae271.
Zhou Z, Wei J, Liu M, Zhuo L, Fu X, Zou Q. AnomalGRN: deciphering single-cell gene regulation network with graph anomaly detection. BMC Biol. 2025;23(1):73. https://doi.org/10.1186/s12915-025-02177-z.
Acknowledgements
The authors would like to thank Cedars-Sinai Medical Center for providing computing resources and funding support. We also appreciate the valuable feedback provided by anonymous reviewers during the peer review process.
Funding
This work is supported in part by funds from the Center for AI Research and Education at Cedars-Sinai Medical Center and grants from the National Institutes of Health USA (U01 AG066833 and R01 LM010098).
Author information
Contributions
Z.P.W. and B.L. wrote the main manuscript text. X.L., M.V., J.H.C., and Y.M. prepared the figures and tables. N.M., J.M., H.C., and M.E.H. developed the LLM solutions. J.H.M. provided scientific guidance and support. All authors reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
This research used publicly available datasets and did not require additional ethical approval or consent to participate.
Consent for publication
All authors have reviewed the final version of the manuscript and have given their full consent for its publication.
Competing interests
Dr. Jason H. Moore is the Editor-in-Chief of BioData Mining and a co-editor of the topical collection “Advances in Data Mining for Biomedical Informatics and Healthcare”.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Z.P., Li, X., Matsumoto, N. et al. Drug repurposing for Alzheimer’s disease using a graph-of-thoughts based large language model to infer drug-disease relationships in a comprehensive knowledge graph. BioData Mining 18, 51 (2025). https://doi.org/10.1186/s13040-025-00466-5
DOI: https://doi.org/10.1186/s13040-025-00466-5