Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Aug;12(4):808-815.
doi: 10.1055/s-0041-1735184. Epub 2021 Sep 1.

Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Affiliations

Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Lin Lawrence Guo et al. Appl Clin Inform. 2021 Aug.

Abstract

Objective: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts.

Methods: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.

Results: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.

Conclusion: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.

PubMed Disclaimer

Conflict of interest statement

None declared.

References

    1. Challener D W, Prokop L J, Abu-Saleh O. The proliferation of reports on clinical scoring systems: issues about uptake and clinical utility. JAMA. 2019;321(24):2405–2406. - PubMed
    1. Rajkomar A, Oren E, Chen K. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1:18. - PMC - PubMed
    1. Harutyunyan H, Khachatrian H, Kale D C, Ver Steeg G, Galstyan A. Multitask learning and benchmarking with clinical time series data. Sci Data. 2019;6(01):96. - PMC - PubMed
    1. Sendak M P, Balu S, Schulman K A. Barriers to Achieving Economies of Scale in Analysis of EHR Data. A Cautionary Tale. Appl Clin Inform. 2017;8(03):826–831. - PMC - PubMed
    1. MI in Healthcare Workshop Working Group . Cutillo C M, Sharma K R, Foschini L, Kundu S, Mackintosh M, Mandl K D. Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency. NPJ Digit Med. 2020;3:47. - PMC - PubMed

Publication types