Posted on 2021-01-05, 19:06. Authored by Aaron Y. Lee, Ryan T. Yanagihara, Cecilia S. Lee, Marian Blazes, Hoon C. Jung, Yewlin E. Chee, Michael D. Gencarella, Harry Gee, April Y. Maa, Glenn C. Cockerham, Mary Lynch, Edward J. Boyko
<b>Objective: </b>With the rising global prevalence of diabetic retinopathy (DR), automated DR screening is needed for primary care settings. Two automated artificial intelligence (AI)-based DR screening algorithms have FDA approval. Several others are under consideration while already in clinical use in other countries, but their real-world performance has not been evaluated systematically. We compared the performance of seven automated AI-based DR screening algorithms (including one FDA-approved algorithm) against human graders when analyzing real-world retinal imaging data.
<p><b>Research Design and Methods: </b>This was a multicenter, non-interventional device validation study evaluating a total of 311,604 retinal images from 23,724 veterans who presented for teleretinal DR screening at the Veterans Affairs (VA) Puget Sound Health Care System (HCS) or Atlanta VA HCS from 2006 to 2018. Five companies provided seven algorithms, including one with FDA approval, that independently analyzed all scans, regardless of image quality. The sensitivity and specificity of each algorithm in classifying images as referable DR or not were compared to the original VA teleretinal grades and to a regraded, arbitrated dataset. Value per encounter was estimated.</p>
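As an illustrative aside (not part of the study's actual analysis pipeline), the minimal Python sketch below shows how per-algorithm screening metrics of the kind reported here, such as sensitivity, specificity, and negative predictive value, could be computed from binary referable-DR calls against a reference grading. The function name and toy labels are hypothetical.

```python
# Illustrative sketch: sensitivity, specificity, and negative predictive value (NPV)
# for binary "referable DR" calls compared against a reference (human-graded) label set.
from typing import Sequence, Tuple

def screening_metrics(reference: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, NPV); labels are 1 = referable DR, 0 = not referable."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return sensitivity, specificity, npv

# Toy example (hypothetical labels, for illustration only):
reference_grades = [1, 1, 0, 0, 1, 0, 0, 1]
algorithm_calls  = [1, 0, 0, 0, 1, 0, 1, 1]
print(screening_metrics(reference_grades, algorithm_calls))  # (0.75, 0.75, 0.75)
```

Note that negative predictive value depends on the prevalence of referable DR in the screened population, so toy numbers like these are not comparable to the study's reported figures.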
<p><b>Results: </b>Although high negative predictive values (82.72%-93.69%) were observed, sensitivities varied widely (50.98%-85.90%). Most algorithms performed no better than humans against the arbitrated dataset, but two achieved higher sensitivities, and one yielded comparable sensitivity (80.47%, p = 0.441) and specificity (81.28%, p = 0.195). Notably, one had lower sensitivity (74.42%) for proliferative DR (p = 9.77x10<sup>-4</sup>) than the VA teleretinal graders. Value per encounter ranged from $15.14 to $18.06 for ophthalmologists and from $7.74 to $9.24 for optometrists.</p>
<b>Conclusions: </b>The DR screening algorithms showed significant performance differences. These results argue for rigorous testing of all such algorithms on real-world data before clinical implementation.
<b>Funding: </b>This study was supported by NIH/NEI K23EY029246, R01AG060942, and an unrestricted grant from Research to Prevent Blindness. The sponsors/funding organizations had no role in the design or conduct of this research.