Potential for LLMs to Aid Clinicians by Drafting Discharge Summary Narratives
Background: Large language models (LLMs) are being explored for clinical documentation, yet their ability to generate safe, accurate discharge summaries is uncertain.
Objective: To compare the quality and safety of ChatGPT-4 discharge summaries with those written by hospitalists.
Design: Cross-sectional observational study.
Methods: Adult hospital medicine encounters at the University of California, San Francisco (2019 to 2022) with a length of stay of 3 to 6 days and live discharge were screened. A total of 100 randomly selected cases met the inclusion criteria. Standardized prompts containing daily progress notes were given to ChatGPT-4. Blinded reviewers rated LLM-generated and clinician-generated summaries for errors (inaccuracies, hallucinations, omissions), comprehensiveness, coherence, conciseness, harmfulness, and overall preference.
Results: LLM summaries had more errors than physician summaries (mean, 2.9 vs 1.8; P ...). Coherence and conciseness were similar, but physician notes were
more...
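
The screening step described in the Methods can be illustrated with a short sketch. This is not the study's actual pipeline, which the abstract does not describe. The `Encounter` fields, the age threshold of 18 for "adult", the length-of-stay definition (discharge date minus admission date), keying the study period on admission year, and screening before sampling are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Encounter:
    # All field names are hypothetical; the abstract gives no data schema.
    encounter_id: str
    age_years: int
    service: str
    admit_date: date
    discharge_date: date
    discharged_alive: bool


def is_eligible(e: Encounter) -> bool:
    """Apply the screening criteria listed in the Methods."""
    # Assumed definition: calendar days between admission and discharge.
    length_of_stay = (e.discharge_date - e.admit_date).days
    return (
        e.age_years >= 18  # assumed threshold for "adult"
        and e.service == "hospital medicine"
        and 2019 <= e.admit_date.year <= 2022  # assumed to key on admission year
        and 3 <= length_of_stay <= 6
        and e.discharged_alive
    )


def sample_cases(encounters, n=100, seed=0):
    """Draw up to n eligible encounters at random, without replacement."""
    eligible = [e for e in encounters if is_eligible(e)]
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return rng.sample(eligible, min(n, len(eligible)))
```

Screening first and sampling second is one reading of "100 randomly selected cases met the inclusion criteria"; the reverse order (sample, then screen) is equally consistent with the abstract.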