Physician- and Large Language Model-Generated Hospital Discharge Summaries.

Potential for LLMs to Aid Clinicians by Drafting Discharge Summary Narratives

Background: Large language models (LLMs) are being explored for clinical documentation, yet their ability to generate safe, accurate discharge summaries is uncertain. Objective: To compare the quality and safety of discharge summaries generated by ChatGPT-4 with those written by hospitalists. Design: Cross-sectional observational study. Methods: Adult hospital medicine encounters at the University of California, San Francisco (2019 to 2022) with a length of stay of 3 to 6 days and live discharge were screened, and 100 cases meeting the inclusion criteria were randomly selected. Standardized prompts containing daily progress notes were given to ChatGPT-4. Blinded reviewers rated LLM-generated and clinician-generated summaries for errors (inaccuracies, hallucinations, omissions), comprehensiveness, coherence, conciseness, harmfulness, and overall preference. Results: LLM summaries had more errors than physician summaries (mean, 2.9 vs 1.8; P ...). Coherence and conciseness were similar, but physician notes were more...
