  • Perspective

Large language models for clinical decision support in gastroenterology and hepatology

Abstract

Clinical decision making in gastroenterology and hepatology has become increasingly complex and challenging for physicians. This growing complexity can be addressed by computational tools that support clinical decisions. Although numerous clinical decision support systems (CDSS) have emerged, they have faced difficulties with real-world performance and generalizability, resulting in limited clinical adoption. Generative artificial intelligence (AI), particularly in the form of large language models (LLMs), is introducing new possibilities for CDSS by offering more flexible and adaptable support that better reflects complex clinical scenarios. LLMs can process unstructured text, including patient data and medical guidelines, and integrate various information sources with high accuracy, especially when combined with retrieval-augmented generation. Thus, LLMs can provide dynamic, context-specific support by generating personalized treatment recommendations, identifying potential complications based on patient history, and enabling natural language interactions with health-care providers. However, important challenges persist, particularly regarding biases, hallucinations, interoperability barriers, and proper training of health-care providers. We examine the parallel evolution of the complexity in clinical management in gastroenterology and hepatology, and the technical developments leading to current generative AI models. We discuss how these advances are converging to create effective CDSS, providing a conceptual basis for further development and clinical adoption of these systems.

Fig. 1: Evolution of clinical decision support systems and cases of use in gastroenterology and hepatology.
Fig. 2: Stages of clinical decision making and opportunities for LLM-based decision support.
Fig. 3: Successful implementation of LLMs in clinical decision support systems.

References

  1. Densen, P. Challenges and opportunities facing medical education. Trans. Am. Clin. Climatol. Assoc. 122, 48–58 (2011).

    PubMed  PubMed Central  Google Scholar 

  2. Morris, Z. S., Wooding, S. & Grant, J. The answer is 17 years, what is the question: understanding time lags in translational research. J. R. Soc. Med. 104, 510–520 (2011).

    Article  PubMed  PubMed Central  Google Scholar��

  3. Porter, J., Boyd, C., Skandari, M. R. & Laiteerapong, N. Revisiting the time needed to provide adult primary care. J. Gen. Intern. Med. 38, 147–155 (2023).

    Article  PubMed  Google Scholar 

  4. Macaron, M. M. et al. A systematic review and meta analysis on burnout in physicians during the COVID-19 pandemic: a hidden healthcare crisis. Front. Psychiatry 13, 1071397 (2022).

    Article  PubMed  Google Scholar 

  5. Ferrucci, L. & Kohanski, R. Better care for older patients with complex multimorbidity and frailty: a call to action. Lancet Healthy Longev. 3, e581–e583 (2022).

    Article  PubMed  Google Scholar 

  6. Osheroff, J. A. et al. Improving Outcomes with Clinical Decision Support: An Implementer’s Guide (HIMSS, 2012).

  7. Osheroff, J. A. et al. A roadmap for national action on clinical decision support. J. Am. Med. Inform. Assoc. 14, 141–145 (2007).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Moxey, A. et al. Computerized clinical decision support for prescribing: provision does not guarantee uptake. J. Am. Med. Inform. Assoc. 17, 25–33 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Kortteisto, T., Komulainen, J., Mäkelä, M., Kunnamo, I. & Kaila, M. Clinical decision support must be useful, functional is not enough: a qualitative study of computer-based clinical decision support in primary care. BMC Health Serv. Res. 12, 349 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  10. Patterson, E. S. et al. Identifying barriers to the effective use of clinical reminders: bootstrapping multiple methods. J. Biomed. Inform. 38, 189–199 (2005).

    Article  PubMed  Google Scholar 

  11. Liberati, E. G. et al. What hinders the uptake of computerized decision support systems in hospitals? A qualitative study and framework for implementation. Implement. Sci. 12, 113 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  12. de Dombal, F. T. Computers, diagnoses and patients with acute abdominal pain. Arch. Emerg. Med. 9, 267–270 (1992).

    Article  PubMed  PubMed Central  Google Scholar 

  13. Sutton, R. T. et al. An overview of clinical decision support systems: benefits, risks, and strategies for success. npj Digit. Med. 3, 17 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  14. Dang, A. Real-world evidence: a primer. Pharm. Med. 37, 25–36 (2023).

    Article  Google Scholar 

  15. Zhang, K. et al. A generalist vision-language foundation model for diverse biomedical tasks. Nat. Med. 30, 3129–3141 (2024).

    Article  CAS  PubMed  Google Scholar 

  16. Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature https://doi.org/10.1038/s41586-024-07618-3 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  17. Bubeck, S. et al. Sparks of artificial general intelligence: early experiments with GPT-4. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.12712 (2023).

  18. Gemini Team Google. Gemini 1.5: unlocking multimodal understanding across millions of tokens of context. Preprint at arXiv https://doi.org/10.48550/arXiv.2403.05530 (2024).

  19. Blease, C. R., Locher, C., Gaab, J., Hägglund, M. & Mandl, K. D. Generative artificial intelligence in primary care: an online survey of UK general practitioners. BMJ Health Care Inf. 31, e101102 (2024).

    Article  Google Scholar 

  20. Laohawetwanit, T., Pinto, D. G. & Bychkov, A. A survey analysis of the adoption of large language models among pathologists. Am. J. Clin. Pathol. https://doi.org/10.1093/ajcp/aqae093 (2024).

    Article  PubMed  Google Scholar 

  21. Spotnitz, M. et al. A survey of clinicians’ views of the utility of large language models. Appl. Clin. Inform. 15, 306–312 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  22. Ferber, D. et al. GPT-4 for information retrieval and comparison of medical oncology guidelines. NEJM AI 1, AIcs2300235 (2024).

    Article  Google Scholar 

  23. Wiest, I. C. et al. Privacy-preserving large language models for structured medical information retrieval. npj Digit. Med. 7, 257 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  24. Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. https://doi.org/10.1038/s41591-024-02855-5 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Wornow, M. et al. Zero-shot clinical trial patient matching with LLMs. NEJM AI 2, AIcs2400360 (2025).

    Article  Google Scholar 

  26. Weissman, G. E., Mankowitz, T. & Kanter, G. P. Unregulated large language models produce medical device-like output. npj Digit. Med. 8, 148 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  27. US Department of Health and Human Services. Artificial intelligence-enabled device software functions: lifecycle management and marketing submission recommendations. Draft guidance for industry and Food and Drug Administration staff. FDA www.fda.gov/media/184856/download (2025).

  28. EUR-Lex. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC. EUR-Lex eur-lex.europa.eu/eli/reg/2017/745/oj/eng (2025).

  29. Vieujean, S. et al. Understanding the therapeutic toolkit for inflammatory bowel disease. Nat. Rev. Gastroenterol. Hepatol. https://doi.org/10.1038/s41575-024-01035-7 (2025).

    Article  PubMed  Google Scholar 

  30. Colombel, J.-F., Narula, N. & Peyrin-Biroulet, L. Management strategies to improve outcomes of patients with inflammatory bowel diseases. Gastroenterology 152, 351–361.e5 (2017).

    Article  PubMed  Google Scholar 

  31. Bruner, L. P., White, A. M. & Proksell, S. Inflammatory bowel disease. Prim. Care 50, 411–427 (2023).

    Article  PubMed  Google Scholar 

  32. Rogler, G., Singh, A., Kavanaugh, A. & Rubin, D. T. Extraintestinal manifestations of inflammatory bowel disease: current concepts, treatment, and implications for disease management. Gastroenterology 161, 1118–1132 (2021).

    Article  CAS  PubMed  Google Scholar 

  33. Ui-Haq, Z. et al. Health-care resource use and costs associated with inflammatory bowel disease in northwest London: a retrospective linked database study. BMC Gastroenterol. 24, 480 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  34. Ewais, T. et al. A systematic review and meta-analysis of mindfulness based interventions and yoga in inflammatory bowel disease. J. Psychosom. Res. 116, 44–53 (2019).

    Article  PubMed  Google Scholar 

  35. Allen, A. M., Younossi, Z. M., Diehl, A. M., Charlton, M. R. & Lazarus, J. V. Envisioning how to advance the MASH field. Nat. Rev. Gastroenterol. Hepatol. 21, 726–738 (2024).

    Article  PubMed  Google Scholar 

  36. Zelber-Sagi, S. et al. Food inequity and insecurity and MASLD: burden, challenges, and interventions. Nat. Rev. Gastroenterol. Hepatol. 21, 668–686 (2024).

    Article  PubMed  Google Scholar 

  37. Younossi, Z. M. et al. The global epidemiology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): a systematic review. Hepatology 77, 1335–1347 (2023).

    Article  PubMed  Google Scholar 

  38. Riazi, K. et al. The prevalence and incidence of NAFLD worldwide: a systematic review and meta-analysis. Lancet Gastroenterol. Hepatol. 7, 851–861 (2022).

    Article  CAS  PubMed  Google Scholar 

  39. Yang, Z. et al. Global burden of metabolic dysfunction-associated steatotic liver disease attributable to high fasting plasma glucose in 204 countries and territories from 1990 to 2021. Sci. Rep. 14, 22232 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Stefan, N., Yki-Järvinen, H. & Neuschwander-Tetri, B. A. Metabolic dysfunction-associated steatotic liver disease: heterogeneous pathomechanisms and effectiveness of metabolism-based treatment. Lancet Diabetes Endocrinol. 13, 134–148 (2025).

    Article  CAS  PubMed  Google Scholar 

  41. Rinella, M. E. et al. A multisociety Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 78, 1966–1986 (2023).

    Article  PubMed  Google Scholar 

  42. Soto-Catalán, M. et al. Semaglutide improves liver steatosis and de novo lipogenesis markers in obese and type-2-diabetic mice with metabolic-dysfunction-associated steatotic liver disease. Int. J. Mol. Sci. 25, 2961 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  43. Wattacheril, J. J. The role of noninvasive biomarkers: evaluation and management of MASLD. GI & Hepatology News (5 June 2024).

  44. Schneider, C. V. et al. Large-scale identification of undiagnosed hepatic steatosis using natural language processing. EClinicalMedicine 62, 102149 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Eskridge, W. et al. Metabolic dysfunction-associated steatotic liver disease and metabolic dysfunction-associated steatohepatitis: the patient and physician perspective. J. Clin. Med. 12, 6216 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Konyn, P., Ahmed, A. & Kim, D. Current epidemiology in hepatocellular carcinoma. Expert. Rev. Gastroenterol. Hepatol. 15, 1295–1307 (2021).

    Article  CAS  PubMed  Google Scholar 

  47. Samant, H., Amiri, H. S. & Zibari, G. B. Addressing the worldwide hepatocellular carcinoma: epidemiology, prevention and management. J. Gastrointest. Oncol. 12, S361–S373 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  48. European Association for the Study of the Liver. EASL clinical practice guidelines on the management of hepatocellular carcinoma. J. Hepatol. 82, 315–374 (2025).

    Article  Google Scholar 

  49. Reig, M. et al. BCLC strategy for prognosis prediction and treatment recommendation: the 2022 update. J. Hepatol. 76, 681–693 (2022).

    Article  PubMed  Google Scholar 

  50. Yip, T. C.-F. & Wong, G. L.-H. Transforming the landscape of liver cancer detection and care. Nat. Rev. Gastroenterol. Hepatol. https://doi.org/10.1038/s41575-024-01018-8 (2024).

    Article  Google Scholar 

  51. Alawyia, B. & Constantinou, C. Hepatocellular carcinoma: a narrative review on current knowledge and future prospects. Curr. Treat. Options Oncol. 24, 711–724 (2023).

    Article  PubMed  Google Scholar 

  52. Cherradi, S. et al. Modelling hepatocellular carcinoma microenvironment phenotype to evaluate drug efficacy. Sci. Rep. 15, 1179 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Jing, F., Li, X., Jiang, H., Sun, J. & Guo, Q. Combating drug resistance in hepatocellular carcinoma: no awareness today, no action tomorrow. Biomed. Pharmacother. 167, 115561 (2023).

    Article  CAS  PubMed  Google Scholar 

  54. Ducreux, M. et al. The management of hepatocellular carcinoma. Current expert opinion and recommendations derived from the 24th ESMO/World Congress on Gastrointestinal Cancer, Barcelona, 2022. ESMO Open. 8, 101567 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Clusmann, J. et al. Machine learning predicts liver cancer risk from routine clinical data: a large population-based multicentric study. Preprint at medRxiv https://doi.org/10.1101/2024.11.03.24316662 (2024).

  56. Sung, H. et al. Colorectal cancer incidence trends in younger versus older adults: an analysis of population-based cancer registry data. Lancet Oncol. 26, 51–63 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  57. US National Library of Medicine. ClinicalTrials.gov clinicaltrials.gov/study/NCT05080673?cond=colonoscopy%20screening%20intervals&intr=NCT05080673&rank=1 (2024).

  58. Kuipers, E. J. & Spaander, M. C. Personalized screening for colorectal cancer. Nat. Rev. Gastroenterol. Hepatol. 15, 391–392 (2018).

    Article  CAS  PubMed  Google Scholar 

  59. Kuipers, E. J. et al. Colorectal cancer. Nat. Rev. Dis. Primers 1, 15065 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Kahn, C. L. et al. Circulating tumor DNA in addition to fecal immunochemical test in a dual-test colorectal cancer screening approach. Clin. Colorectal Cancer 24, 310–319.e1 (2025).

    PubMed  Google Scholar 

  61. Brenne, S. S. et al. Colorectal cancer detected by liquid biopsy 2 years prior to clinical diagnosis in the HUNT study. Br. J. Cancer 129, 861–868 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Mateo, J. et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO scale for clinical actionability of molecular targets (ESCAT). Ann. Oncol. 29, 1895–1902 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Kasi, P. M. et al. BESPOKE study protocol: a multicentre, prospective observational study to evaluate the impact of circulating tumour DNA guided therapy on patients with colorectal cancer. BMJ Open. 11, e047831 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  64. Kasi, P. M. et al. Circulating tumor DNA (ctDNA) for informing adjuvant chemotherapy (ACT) in stage II/III colorectal cancer (CRC): interim analysis of BESPOKE CRC study [abstract]. J. Clin. Oncol. 42, 9 (2024).

    Article  Google Scholar 

  65. Tie, J. et al. Circulating tumor DNA analysis guiding adjuvant therapy in stage II colon cancer. N. Engl. J. Med. 386, 2261–2272 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Yukami, H. et al. Circulating tumor DNA (ctDNA) dynamics in patients with colorectal cancer (CRC) with molecular residual disease: updated analysis from GALAXY study in the CIRCULATE-JAPAN [abstract]. J. Clin. Oncol. 42, 6 (2024).

    Article  Google Scholar 

  67. Kramer, A. et al. Early evaluation of the effectiveness and cost-effectiveness of ctDNA-guided selection for adjuvant chemotherapy in stage II colon cancer. Ther. Adv. Med. Oncol. 16, 17588359241266164 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Krell, M., Llera, B. & Brown, Z. J. Circulating tumor DNA and management of colorectal cancer. Cancers 16, 21 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  69. Li, W. et al. Analytical evaluation of circulating tumor DNA sequencing assays. Sci. Rep. 14, 4973 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Pascual, J. et al. ESMO recommendations on the use of circulating tumour DNA assays for patients with cancer: a report from the ESMO Precision Medicine Working Group. Ann. Oncol. 33, 750–768 (2022).

    Article  CAS  PubMed  Google Scholar 

  71. Stetson, D. et al. Next-generation molecular residual disease assays: do we have the tools to evaluate them properly? J. Clin. Oncol. 42, 2736–2740 (2024).

    Article  PubMed  Google Scholar 

  72. Denlinger, C. S. & Barsevick, A. M. The challenges of colorectal cancer survivorship. J. Natl Compr. Canc. Netw. 7, 883–893; quiz 894 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  73. Yu, M., Ren, L., Zheng, M., Hong, M. & Wei, Z. Delayed diagnosis of Wilson’s Disease report from 179 newly diagnosed cases in China. Front. Neurol. 13, 884840 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  74. Melas, N., Amin, R., Gyllemark, P., Younes, A. H. & Almer, S. Whipple’s disease: the great masquerader — a high level of suspicion is the key to diagnosis. BMC Gastroenterol. 21, 128 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  75. Ronicke, S. et al. Can a decision support system accelerate rare disease diagnosis? Evaluating the potential impact of Ada DX in a retrospective study. Orphanet J. Rare Dis. 14, 69 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  76. Schaaf, J., Sedlmayr, M., Sedlmayr, B. & Storf, H. User-centred development of a diagnosis support system for rare diseases. Stud. Health Technol. Inform. 293, 11–18 (2022).

    PubMed  Google Scholar 

  77. Yakubovich, K. Clinical decision-making theories. Ann. Innov. Med., https://doi.org/10.59652/aim.v1i2.51 (2023).

    Article  Google Scholar 

  78. Vorisek, C. N. et al. Fast healthcare interoperability resources (FHIR) for interoperability in health research: systematic review. JMIR Med. Inform. 10, e35724 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  79. Goodrum, H., Roberts, K. & Bernstam, E. V. Automatic classification of scanned electronic health record documents. Int. J. Med. Inform. 144, 104302 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  80. Poissant, L., Pereira, J., Tamblyn, R. & Kawasumi, Y. The impact of electronic health records on time efficiency of physicians and nurses: a systematic review. J. Am. Med. Inform. Assoc. 12, 505–516 (2005).

    Article  PubMed  PubMed Central  Google Scholar 

  81. Gold, R. et al. Using electronic health record-based clinical decision support to provide social risk-informed care in community health centers: protocol for the design and assessment of a clinical decision support tool. JMIR Res. Protoc. 10, e31733 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  82. Quan, P. L. et al. Usefulness of drug allergy alert systems: present and future. Curr. Treat. Options Allergy https://doi.org/10.1007/s40521-023-00351-8 (2023).

    Article  Google Scholar 

  83. Elkin, P. L. et al. The introduction of a diagnostic decision support system (DXplainTM) into the workflow of a teaching hospital service can decrease the cost of service for diagnostically challenging diagnostic related groups (DRGs). Int. J. Med. Inform. 79, 772–777 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  84. Colabianchi, S., Costantino, F. & Sabetta, N. Assessment of a large language model based digital intelligent assistant in assembly manufacturing. Comput. Ind. 162, 104129 (2024).

    Article  Google Scholar 

  85. Wekenborg, M. K., Gilbert, S. & Kather, J. N. Examining human-AI interaction in real-world healthcare beyond the laboratory. npj Digit. Med. 8, 169 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  86. Vaccaro, M., Almaatouq, A. & Malone, T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 8, 2293–2303 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  87. Groopman, J. E. How Doctors Think (Houghton Mifflin, 2007).

  88. Kawamoto, K., Houlihan, C. A., Balas, E. A. & Lobach, D. F. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ 330, 765 (2005).

    Article  PubMed  PubMed Central  Google Scholar 

  89. Webster, C. S., Taylor, S. & Weller, J. M. Cognitive biases in diagnosis and decision making during anaesthesia and intensive care. BJA Educ. 21, 420–425 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Chen, Z. et al. Harnessing the power of clinical decision support systems: challenges and opportunities. Open. Heart 10, e002432 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Miller, R. A., McNeil, M. A., Challinor, S. M., Masarie, F. E. Jr & Myers, J. D. The INTERNIST-1/QUICK MEDICAL REFERENCE project — status report. West. J. Med. 145, 816–822 (1986).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Shortliffe, E. H. Mycin: a knowledge-based computer program applied to infectious diseases. Proc. Annu. Symp. Comput. Appl. Med. Care 1977, 66–69 (1977).

    Google Scholar 

  93. Wright, A. & Sittig, D. F. A four-phase model of the evolution of clinical decision support architectures. Int. J. Med. Inform. 77, 641–649 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  94. Darmoni, S. J. & Poynard, T. Computer-aided decision support in hepatology. Scand. J. Gastroenterol. 27, 889–896 (1992).

    Article  CAS  PubMed  Google Scholar 

  95. Bates, D. W. et al. The impact of computerized physician order entry on medication error prevention. J. Am. Med. Inform. Assoc. 6, 313–321 (1999).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Malchow-Møller, A., Bjerregaard, B. & Hilden, J. Computer-assisted diagnosis in gastroenterology. Scand. J. Gastroenterol. Suppl. 216, 225–233 (1996).

    Article  PubMed  Google Scholar 

  97. Babic, A., Mathiesen, U., Hedin, K., Bodemar, G. & Wigertz, O. Assessing an AI knowledge-base for asymptomatic liver diseases. Proc. AMIA Symp. 1998, 513–517 (1998).

    Google Scholar 

  98. Quinn, J. An HL7 (Health Level Seven) overview. J. AHIMA 70, 32–34 (1999). quiz 35–6.

    CAS  PubMed  Google Scholar 

  99. Ferrucci, D. Introduction to “This is Watson”. IBM J. Res. Dev. 56, 235–249 (2012).

    Article  Google Scholar 

  100. Papadopoulos, P., Soflano, M., Chaudy, Y., Adejo, W. & Connolly, T. M. A systematic review of technologies and standards used in the development of rule-based clinical decision support systems. Health Technol. 12, 713–727 (2022).

    Article  Google Scholar 

  101. Dugas, M., Schauer, R., Volk, A. & Rau, H. Interactive decision support in hepatic surgery. BMC Med. Inform. Decis. Mak. 2, 5 (2002).

    Article  PubMed  PubMed Central  Google Scholar 

  102. Ash, J. S., Sittig, D. F., Campbell, E. M., Guappone, K. P. & Dykstra, R. H. Some unintended consequences of clinical decision support systems. AMIA Annu. Symp. Proc. 2007, 26–30 (2007).

    PubMed  PubMed Central  Google Scholar 

  103. Castaneda, C. et al. Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J. Clin. Bioinforma. 5, 4 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  104. Eberhardt, J., Bilchik, A. & Stojadinovic, A. Clinical decision support systems: potential with pitfalls. J. Surg. Oncol. 105, 502–510 (2012).

    Article  PubMed  Google Scholar 

  105. Trivedi, M. H. et al. Development and implementation of computerized clinical guidelines: barriers and solutions. Methods Inf. Med. 41, 435–442 (2002).

    Article  CAS  PubMed  Google Scholar 

  106. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. AI in health and medicine. Nat. Med. 28, 31–38 (2022).

    Article  CAS  PubMed  Google Scholar 

  107. Gliadkovskaya, A. Some doctors are using public AI chatbots like ChatGPT in clinical decisions. Is it safe? FIERCE Healthcare www.fiercehealthcare.com/special-reports/some-doctors-are-using-public-generative-ai-tools-chatgpt-clinical-decisions-it (2024).

  108. Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Rosen, J. M. Generative AI in pediatric gastroenterology. Curr. Gastroenterol. Rep. 26, 342–348 (2024).

    Article  PubMed  Google Scholar 

  110. Truhn, D., Reis-Filho, J. S. & Kather, J. N. Large language models should be used as scientific reasoning engines, not knowledge databases. Nat. Med. https://doi.org/10.1038/s41591-023-02594-z (2023).

    Article  PubMed  Google Scholar 

  111. Chui, M., Hazan, E., Roberts, R., Singla, A. & Smaje, K. The economic potential of generative AI: the next productivity frontier (McKinsey & Company, 2023).

  112. Strachan, J. W. A. et al. Testing theory of mind in large language models and humans. Nat. Hum. Behav. 8, 1285–1295 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  113. Odabashian, R. et al. Assessment of ChatGPT-3f.5’s knowledge in oncology: comparative study with ASCO-SEP benchmarks. JMIR AI 3, e50442 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  114. Kaiser, K. N. et al. Use of large language models as clinical decision support tools for management pancreatic adenocarcinoma using National Comprehensive Cancer Network guidelines. Surgery 182, 109267 (2025).

    Article  PubMed  Google Scholar 

  115. Zhou, S. et al. The performance of large language model-powered chatbots compared to oncology physicians on colorectal cancer queries. Int. J. Surg. 110, 6509–6517 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  116. Gong, E. J. et al. The potential clinical utility of the customized large language model in gastroenterology: a pilot study. Bioengineering 12, 1 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  117. Zhou, Q. et al. GastroBot: a Chinese gastrointestinal disease chatbot based on the retrieval-augmented generation. Front. Med. 11, 1392555 (2024).

    Article  Google Scholar 

  118. Kainz, J. et al. Fine-tuning an existing large language model with knowledge from the medical expert system Hepaxpert. Stud. Health Technol. Inform. 327, 143–147 (2025).

    PubMed  Google Scholar 

  119. Lim, D. Y. Z. et al. ChatGPT on guidelines: providing contextual knowledge to GPT allows it to provide advice on appropriate colonoscopy intervals. J. Gastroenterol. Hepatol. 39, 81–106 (2024).

    Article  PubMed  Google Scholar 

  120. Gorelik, Y., Ghersin, I., Maza, I. & Klein, A. Harnessing language models for streamlined postcolonoscopy patient management: a novel approach. Gastrointest. Endosc. 98, 639–641.e4 (2023).

    Article  PubMed  Google Scholar 

  121. Mukherjee, S. et al. Assessing ChatGPT’s ability to reply to queries regarding colon cancer screening based on multisociety guidelines. Gastro Hep Adv. 2, 1040–1043 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Patil, N. S., Huang, R. S., van der Pol, C. B. & Larocque, N. Using artificial intelligence chatbots as a radiologic decision-making tool for liver imaging: do ChatGPT and Bard communicate information consistent with the ACR appropriateness criteria? J. Am. Coll. Radiol. 20, 1010–1013 (2023).

    Article  PubMed  Google Scholar 

  123. Pugliese, N. et al. Accuracy, reliability, and comprehensibility of ChatGPT-generated medical responses for patients with nonalcoholic fatty liver disease. Clin. Gastroenterol. Hepatol. 22, 886–889.e5 (2024).

    Article  PubMed  Google Scholar 

  124. Endo, Y. et al. Quality of ChatGPT responses to questions related to liver transplantation. J. Gastrointest. Surg. 27, 1716–1719 (2023).

    Article  PubMed  Google Scholar 

  125. Zheng, N. S. et al. Detection of gastrointestinal bleeding with large language models to aid quality improvement and appropriate reimbursement. Gastroenterology 168, 111–120.e4 (2025).

    Article  PubMed  Google Scholar 

  126. Wiest, I. C. et al. Deep sight: enhancing periprocedural adverse event recording in endoscopy by structuring text documentation with privacy-preserving large language models. iGIE 3, 447–452.e5 (2024).

    Article  Google Scholar 

  127. Scherbakov, D. et al. Using large language models for extracting stressful life events to assess their impact on preventive colon cancer screening adherence. BMC Public. Health 25, 12 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  128. Gu, K. et al. Using GPT-4 for LI-RADS feature extraction and categorization with multilingual free-text reports. Liver Int. 44, 1578–1587 (2024).

    Article  PubMed  Google Scholar 

  129. Matute-González, M. et al. Utilizing a domain-specific large language model for LI-RADS v2018 categorization of free-text MRI reports: a feasibility study. Insights Imaging 15, 280 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  130. Spitzl, D. et al. Leveraging large language models for accurate classification of liver lesions from MRI reports. Comput. Struct. Biotechnol. J. 27, 2139–2146 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  131. Kim, H. et al. Conversion of mixed-language free-text CT reports of pancreatic cancer to National Comprehensive Cancer Network structured reporting templates by using GPT-4. Korean J. Radiol. 26, 557–568 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  132. Wang, A. et al. Large language model answers medical questions about standard pathology reports. Front. Med. 11, 1402457 (2024).

    Article  Google Scholar 

  133. Pereyra, L., Schlottmann, F., Steinberg, L. & Lasa, J. Colorectal cancer prevention: is chat generative pretrained transformer (Chat GPT) ready to assist physicians in determining appropriate screening and surveillance recommendations? J. Clin. Gastroenterol. 58, 1022–1027 (2024).

    Article  PubMed  Google Scholar 

  134. Chatziisaak, D. et al. Concordance of ChatGPT artificial intelligence decision-making in colorectal cancer multidisciplinary meetings: retrospective study. BJS Open. 9, zraf040 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  135. Cao, J. J. et al. Large language models’ responses to liver cancer surveillance, diagnosis, and management questions: accuracy, reliability, readability. Abdom. Radiol. 49, 4286–4294 (2024).

    Article  Google Scholar 

  136. Abou Chaar, M. K., Grigsby-Rocca, G., Huang, M. & Blackmon, S. H. ChatGPT vs expert-guided care pathways for postesophagectomy symptom management. Ann. Thorac. Surg. Short Rep. 2, 674–679 (2024).

  137. Huo, B. et al. Clinical artificial intelligence: teaching a large language model to generate recommendations that align with guidelines for the surgical management of GERD. Surg. Endosc. 38, 5668–5677 (2024).

  138. Ye, Y. et al. Comparative evaluation of the accuracy and reliability of ChatGPT versions in providing information on Helicobacter pylori infection. Front. Public Health 13, 1566982 (2025).

  139. Malik, S. et al. Evaluating artificial intelligence-driven responses to acute liver failure queries: a comparative analysis across accuracy, clarity, and relevance. Am. J. Gastroenterol. https://doi.org/10.14309/ajg.0000000000003255 (2024).

  140. Ghersin, I. et al. Comparative evaluation of a language model and human specialists in the application of European guidelines for the management of inflammatory bowel diseases and malignancies. Endoscopy 56, 706–709 (2024).

  141. Samaan, J. S. et al. Examining the accuracy and reproducibility of responses to nutrition questions related to inflammatory bowel disease by generative pre-trained transformer-4. Crohns Colitis 360 7, otae077 (2025).

  142. Sato, M. et al. Efficacy of a large language model in classifying branch-duct intraductal papillary mucinous neoplasms. Abdom. Radiol. https://doi.org/10.1007/s00261-025-05062-z (2025).

  143. Gui, X. et al. Enhancing hepatopathy clinical trial efficiency: a secure, large language model-powered pre-screening pipeline. BioData Min. 18, 42 (2025).

  144. Kresevic, S. et al. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. npj Digit. Med. 7, 102 (2024).

  145. Zhu, M. et al. Large language model trained on clinical oncology data predicts cancer progression. npj Digit. Med. 8, 397 (2025).

  146. Lusetti, F. et al. Applications of generative artificial intelligence in inflammatory bowel disease: a systematic review. Dig. Liver Dis. https://doi.org/10.1016/j.dld.2025.04.026 (2025).

  147. Gravina, A. G. et al. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients’ questions? An evidence-controlled analysis. World J. Gastroenterol. 30, 17–33 (2024).

  148. Yeo, Y. H. et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin. Mol. Hepatol. 29, 721–732 (2023).

  149. Lee, T.-C. et al. ChatGPT answers common patient questions about colonoscopy. Gastroenterology 165, 509–511.e7 (2023).

  150. Kral, J., Hradis, M., Buzga, M. & Kunovsky, L. Exploring the benefits and challenges of AI-driven large language models in gastroenterology: think out of the box. Biomed. Pap. Med. Fac. Univ. Palacky. Olomouc Czech. Repub. 168, 277–283 (2024).

  151. Gong, E. J. et al. Large language models in gastroenterology: systematic review. J. Med. Internet Res. 26, e66648 (2024).

  152. Yim, D., Khuntia, J., Parameswaran, V. & Meyers, A. Preliminary evidence of the use of generative AI in health care clinical services: systematic narrative review. JMIR Med. Inform. 12, e52073 (2024).

  153. Loh, E. ChatGPT and generative AI chatbots: challenges and opportunities for science, medicine and medical leaders. BMJ Lead. 8, 51–54 (2023).

  154. Ramjee, P. et al. CataractBot: an LLM-powered expert-in-the-loop chatbot for cataract patients. In Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (eds Mamykina, L. & Ploetz, T.) 45 (ACM, 2025).

  155. Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med. 3, 141 (2023).

  156. Ullah, E., Parwani, A., Baig, M. M. & Singh, R. Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology — a recent scoping review. Diagn. Pathol. 19, 43 (2024).

  157. Zuo, K., Jiang, Y., Mo, F. & Lio, P. KG4Diagnosis: a hierarchical multi-agent LLM framework with knowledge graph enhancement for medical diagnosis. Preprint at arXiv https://doi.org/10.48550/arXiv.2412.16833 (2025).

  158. Masanneck, L., Meuth, S. G. & Pawlitzki, M. Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology. npj Digit. Med. 8, 137 (2025).

  159. Ge, J. et al. Development of a liver disease-specific large language model chat interface using retrieval-augmented generation. Hepatology 80, 1158–1168 (2024).

  160. Griot, M., Hemptinne, C., Vanderdonckt, J. & Yuksel, D. Large language models lack essential metacognition for reliable medical reasoning. Nat. Commun. 16, 642 (2025).

  161. Lindsey, J. et al. On the biology of a large language model. Transformer Circuits transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-hallucinations (2025).

  162. Farquhar, S., Kossen, J., Kuhn, L. & Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625–630 (2024).

  163. Wei, J. et al. Measuring short-form factuality in large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2411.04368 (2024).

  164. Yu, H., Cheng, T., Cheng, Y. & Feng, R. FineMedLM-o1: enhancing the medical reasoning ability of LLM from supervised fine-tuning to test-time training. Preprint at arXiv https://doi.org/10.48550/arXiv.2501.09213 (2025).

  165. Li, R., Wang, X. & Yu, H. LlamaCare: an instruction fine-tuned large language model for clinical NLP. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (eds Calzolari, N. et al.) 10632–10641 (ELRA and ICCL, 2024).

  166. Laka, M., Carter, D., Milazzo, A. & Merlin, T. Challenges and opportunities in implementing clinical decision support systems (CDSS) at scale: interviews with Australian policymakers. Health Policy Technol. 11, 100652 (2022).

  167. Abell, B. et al. Identifying barriers and facilitators to successful implementation of computerized clinical decision support systems in hospitals: a NASSS framework-informed scoping review. Implement. Sci. 18, 32 (2023).

  168. Zakka, C. et al. Almanac – retrieval-augmented language models for clinical medicine. NEJM AI 1, aioa2300068 (2024).

  169. Jiang, Y. et al. MedAgentBench: a virtual EHR environment to benchmark medical LLM agents. NEJM AI https://doi.org/10.1056/AIdbp2500144 (2025).

  170. Callahan, A. et al. Using aggregate patient data at the bedside via an on-demand consultation service. NEJM Catal. Innov. Care Deliv. https://doi.org/10.1056/CAT.21.0224 (2021).

  171. Bedi, S. et al. MedHELM: holistic evaluation of large language models for medical tasks. Preprint at arXiv https://doi.org/10.48550/arXiv.2505.23802 (2025).

  172. Armitage, H. Clinicians can ‘chat’ with medical records through new AI software, ChatEHR. Stanford Medicine News Center med.stanford.edu/news/all-news/2025/06/chatehr.html (2025).

  173. Yao, Y. et al. A survey on large language model (LLM) security and privacy: the good, the bad, and the ugly. High Confid. Comput. 4, 100211 (2024).

  174. OWASP. OWASP top 10 for LLM applications 2025. OWASP genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/ (2025).

  175. Clusmann, J. et al. Prompt injection attacks on vision language models in oncology. Nat. Commun. 16, 1239 (2025).

  176. Clusmann, J. et al. Incidental prompt injections on vision–language models in real-life histopathology. NEJM AI 2, aics2500078 (2025).

  177. Freyer, O., Wiest, I. C., Kather, J. N. & Gilbert, S. A future role for health applications of large language models depends on regulators enforcing safety standards. Lancet Digit. Health 6, e662–e672 (2024).

  178. Hubinger, E. et al. Sleeper agents: training deceptive LLMs that persist through safety training. Preprint at arXiv https://doi.org/10.48550/arXiv.2401.05566 (2024).

  179. Greenblatt, R. et al. Alignment faking in large language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2412.14093 (2024).

  180. Derraz, B. et al. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision oncology. npj Precis. Oncol. 8, 23 (2024).

  181. Guevara, M. et al. Large language models to identify social determinants of health in electronic health records. npj Digit. Med. 7, 6 (2024).

  182. Yu, K.-H., Healey, E., Leong, T.-Y., Kohane, I. S. & Manrai, A. K. Medical artificial intelligence and human values. N. Engl. J. Med. 390, 1895–1904 (2024).

  183. Omar, M. et al. Sociodemographic biases in medical decision making by large language models. Nat. Med. 31, 1873–1881 (2025).

  184. Zack, T. et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit. Health 6, e12–e22 (2024).

  185. Yang, J., Soltan, A. A. S., Eyre, D. W., Yang, Y. & Clifton, D. A. An adversarial training framework for mitigating algorithmic biases in clinical machine learning. npj Digit. Med. 6, 55 (2023).

  186. Goh, E. et al. Physician clinical decision modification and bias assessment in a randomized controlled trial of AI assistance. Commun. Med. 5, 59 (2025).

  187. Savage, T., Nayak, A., Gallo, R., Rangan, E. & Chen, J. H. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine. npj Digit. Med. 7, 20 (2024).

  188. How to edit anthropomorphic language about artificial intelligence. Nat. Rev. Phys. 5, 263 (2023).

  189. Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat. Med. 31, 1223–1238 (2025).

  190. Wu, C. et al. Towards evaluating and building versatile large language models for medicine. npj Digit. Med. 8, 58 (2025).

  191. Bedi, S. et al. Testing and evaluation of health care applications of large language models: a systematic review. JAMA https://doi.org/10.1001/jama.2024.21700 (2024).

  192. Gallifant, J. et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nat. Med. 31, 60–69 (2025).

  193. Lekadir, K. et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 388, e081554 (2025).

  194. Moons, K. G. M. et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 388, e082505 (2025).

  195. Zou, J. & Topol, E. J. The rise of agentic AI teammates in medicine. Lancet 405, 457 (2025).

  196. Anthropic. Building effective agents. Anthropic www.anthropic.com/engineering/building-effective-agents (2024).

  197. Wu, S. et al. A comparative study on reasoning patterns of OpenAI’s o1 model. Preprint at arXiv https://doi.org/10.48550/arXiv.2410.13639 (2024).

  198. DeepSeek-AI. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. Preprint at arXiv https://doi.org/10.48550/arXiv.2501.12948 (2025).

  199. Schmidgall, S. et al. Agent laboratory: using LLM agents as research assistants. Preprint at arXiv https://doi.org/10.48550/arXiv.2501.04227 (2025).

  200. Gottweis, J. et al. Towards an AI co-scientist. Preprint at arXiv https://doi.org/10.48550/arXiv.2502.18864 (2025).

  201. Gao, S. et al. Empowering biomedical discovery with AI agents. Cell 187, 6125–6151 (2024).

  202. Penadés, J. R. et al. AI mirrors experimental science to uncover a novel mechanism of gene transfer crucial to bacterial evolution. Preprint at bioRxiv https://doi.org/10.1101/2025.02.19.639094 (2025).

  203. Ferber, D. et al. Development and validation of autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer https://doi.org/10.1038/s43018-025-00991-6 (2025).

  204. Busch, F. et al. Current applications and challenges in large language models for patient care: a systematic review. Commun. Med. 5, 26 (2025).

  205. Hasjim, B. J. et al. The AI agent in the room: informing objective decision making at the transplant selection committee. Preprint at medRxiv https://doi.org/10.1101/2024.12.06.24318575 (2024).

  206. Lobentanzer, S. et al. A platform for the biomedical application of large language models. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02534-3 (2025).

  207. Chen, J. Bringing generative AI to healthcare perspective. Sequoia www.sequoiacap.com/article/generative-ai-for-healthcare-perspective/ (2023).

  208. Bhimani, M. et al. Real-world evaluation of large language models in healthcare (RWE-LLM): a new realm of AI safety & validation. Preprint at medRxiv https://doi.org/10.1101/2025.03.17.25324157 (2025).

  209. Jung, D., Butler, A., Park, J. & Saperstein, Y. Evaluating the impact of a specialized LLM on physician experience in clinical decision support: a comparison of ask Avo and ChatGPT-4. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.15326 (2024).

  210. Harvey, H. World’s first regulatory clearance for a large language model medical device. Hardian Health https://www.hardianhealth.com/insights/valmed-ai-medical-device-regulatory-clearance (2025).

  211. McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).

  212. Ifargan, T., Hafner, L., Kern, M., Alcalay, O. & Kishony, R. Autonomous LLM-driven research — from data to human-verifiable research papers. NEJM AI https://doi.org/10.1056/AIoa2400555 (2025).

  213. Ke, Y. et al. Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study. J. Med. Internet Res. 26, e59439 (2024).

  214. Lee, D. Prompt infection: LLM-to-LLM prompt injection within multi-agent systems. Preprint at arXiv https://arxiv.org/html/2410.07283v1 (2024).

  215. Abbasian, M. et al. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. npj Digit. Med. 7, 82 (2024).

  216. Li, Y. et al. LKAN: LLM-based knowledge-aware attention network for clinical staging of liver cancer. IEEE J. Biomed. Health Inf. 29, 3007–3020 (2025).

  217. Berry, P., Dhanakshirur, R. R. & Khanna, S. Utilizing large language models for gastroenterology research: a conceptual framework. Ther. Adv. Gastroenterol. 18, 17562848251328577 (2025).

  218. Qin, Y., Chang, J., Li, L. & Wu, M. Enhancing gastroenterology with multimodal learning: the role of large language model chatbots in digestive endoscopy. Front. Med. 12, 1583514 (2025).

  219. Berner, E. S. (ed.) Clinical Decision Support Systems: Theory and Practice (Springer, 2016).

  220. Meta AI. Introducing Meta Llama 3: the most capable openly available LLM to date. Meta AI ai.meta.com/blog/meta-llama-3/ (2024).

  221. Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).

  222. Tunstall, L., Von Werra, L. & Wolf, T. Natural Language Processing with Transformers: Building Language Applications with Hugging Face (O’Reilly Media, 2022).

  223. Friedman, C. & Hripcsak, G. Natural language processing and its future in medicine. Acad. Med. 74, 890–895 (1999).

  224. Tunstall, L., Von Werra, L. & Wolf, T. Natural language processing with transformers: building language applications with hugging face. O’Reilly Media https://www.oreilly.com/library/view/natural-language-processing/9781098136789/ (2022).

  225. Chapman, W. W. et al. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J. Am. Med. Inform. Assoc. 18, 540–543 (2011).

  226. Nadkarni, P. M., Ohno-Machado, L. & Chapman, W. W. Natural language processing: an introduction. J. Am. Med. Inform. Assoc. 18, 544–551 (2011).

  227. Leaman, R., Khare, R. & Lu, Z. Challenges in clinical natural language processing for automated disorder normalization. J. Biomed. Inform. 57, 28–37 (2015).

Acknowledgements

J.N.K. discloses support for the research and publication of this work from the EU’s Horizon Europe Research and Innovation programme (GENIAL, 101096312) and the European Research Council ERC (NADIR, grant number 101114631). J.C. discloses support for the research for this work from the Mildred-Scheel-Postdoktorandenprogramm of the German Cancer Aid (grant number 70115730).

Author information

Contributions

All the authors contributed equally to all aspects of the article.

Corresponding author

Correspondence to Jakob Nikolas Kather.

Ethics declarations

Competing interests

J.N.K. declares consulting services for Bioptimus, Owkin, DoMore Diagnostics, Panakeia, AstraZeneca, Mindpeak and MultiplexDx; holds shares in StratifAI and Synagen; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer and Fresenius. I.C.W. has received honoraria from AstraZeneca. All other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Gastroenterology & Hepatology thanks Dennis Shung, who co-reviewed with Sunny Chung; Arsela Prelaj; and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

FDA-approved AI-enabled medical devices: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-enabled-medical-devices

STAT AI Tracker: https://apps.statnews.com/ai-tracker/public/index.html

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wiest, I.C., Bhat, M., Clusmann, J. et al. Large language models for clinical decision support in gastroenterology and hepatology. Nat Rev Gastroenterol Hepatol (2025). https://doi.org/10.1038/s41575-025-01108-1

  • DOI: https://doi.org/10.1038/s41575-025-01108-1
