Table 1 Hierarchical model of efficacy to assess the contribution of AI software to the diagnostic imaging process, adapted from Fryback and Thornbury (1991) [13]
Level | Explanation | Typical measures |
---|---|---|
Level 1t: Technical efficacy | Article demonstrates the technical feasibility of the software | Reproducibility, inter-software agreement, error rate |
Level 1c: Potential clinical efficacy | Article demonstrates the feasibility of the software to be clinically applied | Correlation to alternative methods, potential predictive value, biomarker studies |
Level 2: Diagnostic accuracy efficacy | Article demonstrates the standalone performance of the software | Standalone sensitivity, specificity, area under the ROC curve, or Dice score |
Level 3: Diagnostic thinking efficacy | Article demonstrates the added value to the diagnosis | Radiologist performance with/without AI, change in radiological judgement |
Level 4: Therapeutic efficacy | Article demonstrates the impact of the software on patient management decisions | Effect on treatment or follow-up examinations |
Level 5: Patient outcome efficacy | Article demonstrates the impact of the software on patient outcomes | Effect on quality of life, morbidity, or survival |
Level 6: Societal efficacy | Article demonstrates the impact of the software on society by performing an economic analysis | Effect on costs and quality-adjusted life years, incremental costs per quality-adjusted life year |
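
As an illustration of some of the typical measures named in the table, the following minimal sketch computes standalone sensitivity and specificity and a Dice score (Level 2), and the incremental cost per quality-adjusted life year (Level 6). The function names, inputs, and example numbers are hypothetical and chosen only for illustration; they are not taken from the reviewed articles, and a real evaluation would normally rely on an established statistics library and report confidence intervals.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = finding present).

    Assumes at least one positive and one negative case in `truth`;
    otherwise a ZeroDivisionError is raised.
    """
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def dice_score(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap of two flattened binary masks: 2|A and B| / (|A| + |B|).

    Assumes at least one of the masks is non-empty.
    """
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    return 2 * overlap / (sum(mask_a) + sum(mask_b))


def incremental_cost_per_qaly(cost_new: float, cost_old: float,
                              qaly_new: float, qaly_old: float) -> float:
    """Incremental cost per quality-adjusted life year gained.

    Assumes the two strategies differ in QALYs (non-zero denominator).
    """
    return (cost_new - cost_old) / (qaly_new - qaly_old)


# Hypothetical example data, for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]   # reference standard
pred = [1, 1, 0, 0, 0, 1, 0, 1]    # software output
sens, spec = sensitivity_specificity(truth, pred)
dice = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
icer = incremental_cost_per_qaly(12000.0, 10000.0, 8.5, 8.0)
print(sens, spec, dice, icer)
```

The measures at Levels 3 to 5 compare reader performance, management decisions, or patient outcomes with and without the software, so they depend on study design rather than on a single formula and are not sketched here.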