Abstract
Clinical decision making in gastroenterology and hepatology has become increasingly complex and challenging for physicians. This growing complexity can be addressed by computational tools that support clinical decisions. Although numerous clinical decision support systems (CDSS) have emerged, they have struggled with real-world performance and generalizability, resulting in limited clinical adoption. Generative artificial intelligence (AI), particularly large language models (LLMs), is introducing new possibilities for CDSS by offering more flexible and adaptable support that better reflects complex clinical scenarios. LLMs can process unstructured text, including patient data and medical guidelines, and integrate diverse information sources with high accuracy, especially when combined with retrieval-augmented generation. Thus, LLMs can provide dynamic, context-specific support by generating personalized treatment recommendations, identifying potential complications from the patient history, and enabling natural language interactions with health-care providers. However, important challenges persist, particularly regarding biases, hallucinations, interoperability barriers, and adequate training of health-care providers. We examine the parallel evolution of the growing complexity of clinical management in gastroenterology and hepatology and the technical developments that led to current generative AI models. We discuss how these advances are converging to create effective CDSS, providing a conceptual basis for the further development and clinical adoption of these systems.
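To make the retrieval-augmented generation (RAG) pattern mentioned above concrete, the following minimal Python sketch shows the general idea of grounding an LLM prompt in retrieved guideline text. Everything in it is hypothetical and for illustration only: the example guideline passages, the keyword-overlap retrieval and the omitted LLM call are placeholders, and a deployed CDSS would use embedding-based retrieval over validated guideline corpora together with a privacy-preserving, regulator-compliant model.

```python
# Minimal, illustrative sketch of retrieval-augmented generation (RAG) for
# guideline-grounded clinical question answering. All passages, names and the
# retrieval strategy are hypothetical placeholders, not the authors' system.

from dataclasses import dataclass


@dataclass
class GuidelinePassage:
    source: str  # e.g. society guideline and section (placeholder text)
    text: str    # passage quoted verbatim from the guideline corpus


# Toy "knowledge base" of guideline passages (illustrative placeholders only).
KNOWLEDGE_BASE = [
    GuidelinePassage(
        "Example society guideline, HCC surveillance section",
        "Patients with cirrhosis should undergo surveillance for hepatocellular "
        "carcinoma with ultrasound every 6 months.",
    ),
    GuidelinePassage(
        "Example society guideline, post-polypectomy intervals",
        "After removal of 1-2 small tubular adenomas, repeat colonoscopy in "
        "7-10 years is suggested.",
    ),
]


def retrieve(question: str, k: int = 1) -> list[GuidelinePassage]:
    """Rank passages by naive keyword overlap; real systems use embeddings."""
    q_tokens = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda p: len(q_tokens & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str, passages: list[GuidelinePassage]) -> str:
    """Assemble a prompt that grounds the answer in the retrieved excerpts."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer the clinical question using ONLY the guideline excerpts below, "
        "and cite the source of every statement.\n\n"
        f"Guideline excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What surveillance is recommended for a patient with cirrhosis?"
    prompt = build_prompt(question, retrieve(question))
    print(prompt)  # the prompt would then be sent to an LLM (call omitted here)
```

The essential design choice is that the model is instructed to answer only from the retrieved guideline excerpts and to cite them, which is the mechanism by which retrieval-augmented generation can reduce, although not eliminate, hallucinations.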
Acknowledgements
J.N.K. discloses support for the research and publication of this work from the EU's Horizon Europe Research and Innovation programme (GENIAL, 101096312) and the European Research Council (ERC; NADIR, grant number 101114631). J.C. discloses support for the research for this work from the Mildred-Scheel-Postdoktorandenprogramm of the German Cancer Aid (grant number 70115730).
Author information
Contributions
All the authors contributed equally to all aspects of the article.
Ethics declarations
Competing interests
J.N.K. declares consulting services for Bioptimus, Owkin, DoMore Diagnostics, Panakeia, AstraZeneca, Mindpeak and MultiplexDx; holds shares in StratifAI and Synagen; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer and Fresenius. I.C.W. has received honoraria from AstraZeneca. All other authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Gastroenterology & Hepatology thanks Dennis Shung, who co-reviewed with Sunny Chung; Arsela Prelaj; and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
FDA-approved AI-enabled medical devices: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-enabled-medical-devices
STAT AI Tracker: https://apps.statnews.com/ai-tracker/public/index.html
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wiest, I.C., Bhat, M., Clusmann, J. et al. Large language models for clinical decision support in gastroenterology and hepatology. Nat Rev Gastroenterol Hepatol (2025). https://doi.org/10.1038/s41575-025-01108-1
DOI: https://doi.org/10.1038/s41575-025-01108-1