American Diabetes Association

The Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Antidiabetic Medication Under Conditions of Clinical Uncertainty

Figure posted on 2024-09-09, 15:26, authored by James H. Flory, Jessica S. Ancker, Scott Y. H. Kim, Gilad Kuperman, Aleksandr Petrov, Andrew Vickers

Objective: To explore how the commercially available large language model (LLM) GPT-4 compares to endocrinologists in answering medical questions for which the best answer is uncertain.

Research Design and Methods: This study compared responses from GPT-4 to responses from 31 endocrinologists using hypothetical clinical vignettes focused on diabetes, specifically examining the prescription of metformin versus alternative treatments. The primary outcome was the choice between metformin and other treatments.

Results: With a simple prompt, GPT-4 chose metformin in 12% (95% CI 7.9%–17%) of responses, compared to 31% (95% CI 23%–39%) of endocrinologist responses. After the prompt was modified to encourage metformin use, the selection of metformin by GPT-4 increased to 25% (95% CI 22%–28%). GPT-4 rarely selected metformin in patients with impaired kidney function or a history of gastrointestinal distress (2.9% of responses, 95% CI 1.4%–5.5%). In contrast, endocrinologists often prescribed metformin even in patients with a history of gastrointestinal distress (21% of responses, 95% CI 12%–36%). GPT-4 responses showed low variability on repeated runs except at intermediate levels of kidney function.

Conclusions: In clinical scenarios with no single ‘right’ answer, GPT-4’s responses were reasonable, but differed from endocrinologists’ responses in clinically important ways. Value judgments are needed to determine when these differences should be addressed by adjusting the model. We recommend against reliance on LLM output until it is shown to align not just with clinical guidelines but also with patient and clinician preferences, or until it demonstrates improvement in clinical outcomes over standard of care.

Funding

This work was supported in part by the National Institutes of Health/National Cancer Institute (NIH/NCI) with a Cancer Center Support Grant to Memorial Sloan Kettering Cancer Center [P30 CA008748], and by the Patient Centered Outcomes Research Institute [ME-2022C1-26378 and CER-2017C3-9230].

    Diabetes Care
