Real-World Prospective Validation and Economic Evaluation of Deep Learning-based Diabetic Retinopathy Detection from Fundus Photographs: A Systematic Review and Meta-Analysis
Posted on 2025-11-20, 13:17; authored by An Ran Ran, Jennifer Li Ding, Ziqi Tang, Ching Lam, Truong X. Nguyen, Jiaying Zhou, Shuyi Zhang, Danqi Fang, Dawei Yang, Vincent Ng, Duoru Lin, Haotian Lin, Clement C. Tham, Carmen KM Chan, Simon K.H. Szeto, Tien Y. Wong, Sobha Sivaprasad, Carol Y. Cheung
<p dir="ltr">Background</p><p dir="ltr">Deep learning (DL) has shown promise in delivering diagnostic and economic benefits for detecting diabetic retinopathy (DR) from fundus photographs (FPs). However, evidence synthesis of model validation in prospective, real-world settings remains limited.</p><p><br></p><p dir="ltr">Purpose</p><p dir="ltr">To assess the feasibility of implementing DL-based DR (DL-DR) systems using FPs across different countries by synthesizing prospective validation and economic evidence.</p><p><br></p><p dir="ltr">Data Sources</p><p dir="ltr">We searched five databases through August 13, 2025.</p><p><br></p><p dir="ltr">Study Selection</p><p dir="ltr">Studies that prospectively assessed the performance of DL-DR systems using FPs and/or conducted economic analyses of such systems.</p><p><br></p><p dir="ltr">Data Extraction</p><p dir="ltr">Characteristics of all studies, performance parameters of prospective validation studies, and economic outcomes of economic analysis studies were extracted.</p><p><br></p><p dir="ltr">Data Synthesis</p><p dir="ltr">Forty-seven studies were included in the meta-analysis. Pooled performance, measured as the area under the receiver operating characteristic curve (AUROC), was highest for detecting vision-threatening DR (AUROC 0.974), followed by any DR (AUROC 0.965) and referable DR (RDR; AUROC 0.959). Study region, clinical pathway, mydriasis, image quality control, sample size, grading criteria, reference standard, and model architecture significantly affected model performance in RDR detection. Fifteen studies were included in the economic commentary; they showed that DL-based DR screening was cost-effective in high-income countries, whereas results in middle-income countries were mixed and depended on compliance rates, glycemic control, and initial costs.</p><p><br></p><p dir="ltr">Limitations</p><p dir="ltr">A paucity of studies assessing multiple severities of DR or diabetic macular edema (DME) restricted our ability to perform subgroup analyses.
Insights into low-income countries were limited by a lack of studies in these regions.</p><p><br></p><p dir="ltr">Conclusions</p><p dir="ltr">DL-DR systems using FPs showed high discriminative performance in prospective real-world settings and hold promise for improving the cost-effectiveness of DR screening, especially in high-income countries.</p>
Funding
This study was funded by the Research Grants Council (General Research Fund), Hong Kong (Ref: 14101324).