Document Type
Article
Publication Title
PEC Innovation
Abstract
Objective
Large language models (LLMs) are increasingly applied in medicine, but their role in peri-operative education is underexplored. This pilot feasibility study compared four LLMs in producing post-operative care instructions for total knee arthroplasty (TKA).
Methods
OpenAI GPT-4o, Claude 3.7 Sonnet, DeepSeek R1, and Gemini 2.0 Flash each generated instructions from the same standardized prompt. Outputs were scored on a three-point scale (0 = does not meet, 1 = partially meets, 2 = fully meets) for accuracy, clarity, relevance, consistency, and readability. Accuracy was benchmarked against Enhanced Recovery After Surgery (ERAS) guidelines, American Society of Anesthesiologists (ASA) guidelines, and UpToDate. Readability was assessed using Flesch-Kincaid indices.
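As an illustration of the readability step, the sketch below computes the two standard Flesch-Kincaid indices (Flesch Reading Ease and Flesch-Kincaid Grade Level) from their published formulas. The abstract does not say which tool the authors used, so this is not their implementation: the function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic that will differ from dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' ("ice"), but keep endings like "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        # Higher reading ease means easier text.
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Grade level approximates the US school grade needed to follow the text.
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


if __name__ == "__main__":
    sample = "Keep the cut clean and dry. Call us if the pain gets worse."
    print(readability(sample))
```

Short sentences built from short words raise the reading-ease score and lower the grade level, which is why plain-language prompting can shift these indices.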
Results
Within this limited sample, Claude, GPT-4o, and DeepSeek R1 demonstrated higher observed accuracy than Gemini, with Claude and GPT-4o showing full alignment with the reference standards. Clarity scores were comparable across models, and all four models achieved high relevance and internal consistency. Readability varied: Gemini generated less readable text, whereas GPT-4o and DeepSeek R1 produced more accessible content.
Conclusion
LLMs can generate accurate, relevant, and consistent instructions, supporting their potential use in anesthesia education. Attention to readability and plain-language prompting may further enhance clinical utility.
Innovation
This study provides one of the first anesthesia-specific evaluations of multiple LLMs, showing feasibility and opportunities for AI-driven patient communication.
DOI
10.1016/j.pecinn.2025.100444
Publication Date
6-2026
Keywords
Large language model, Anesthesia, Post-operative, Instructions, Innovation, Arthroplasty, Patient education technology
ISSN
2772-6282
Recommended Citation
Nagesh D, Keating III D, Divakaruni RV, Beutel BG. Feasibility Evaluation of Large Language Models in Anesthesia-specific Post-operative Care Instructions for Total Knee Arthroplasty. PEC Innovation. 2026;8. doi: 10.1016/j.pecinn.2025.100444.
