Document Type

Article

Publication Title

Intelligence-Based Medicine

Abstract

Overlapping clinical symptoms between people with multiple sclerosis (PwMS) and those with neuromyelitis optica spectrum disorder (PwNMOSD) can result in misdiagnosis. Large language models, such as ChatGPT, offer accessible tools for preliminary health guidance. We assessed the accuracy of open-access (GPT-3.5) and subscription-based (GPT-4) models in diagnosing MS and NMOSD, and the influence of key diagnostic inflection points (initial MRI findings and aquaporin-4 (AQP4) antibody testing) and subject demographics on model performance. PwMS and PwNMOSD were retrospectively identified within a single academic center, and structured clinical timelines were processed through GPT-3.5 and GPT-4. Seven digital derivatives per subject, varying race, ethnicity, and sex, were also created to assess demographic influences. ChatGPT provided one diagnosis after each timepoint, and diagnostic accuracy was determined using mixed-effects logistic regression. A total of 98 PwMS and 157 PwNMOSD were included, generating 4080 ChatGPT conversations across models and digital derivatives. GPT-4 demonstrated higher diagnostic accuracy than GPT-3.5 for both MS (OR=2.67) and NMOSD (OR=1.31). Accuracy improved as the clinical timeline progressed, although GPT-4 paradoxically performed worse after the initial MRI report for MS cases (OR=0.56). Among PwMS, diagnostic accuracy was lower in males (OR=0.81) and older individuals (OR=0.56 per 10-year age increase). Conversely, among PwNMOSD, accuracy was higher for African American (OR=1.30) and Asian (OR=1.38) individuals. GPT-4 demonstrated higher diagnostic accuracy for both diseases, but superior performance was not uniform across demographic groups. Further, the paradoxical decline in accuracy after MRI interpretation in MS cases suggests context-dependent performance, and responsible interpretation remains necessary.
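The conversation count and the odds ratios in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state outright: that each subject's original timeline was run alongside its seven digital derivatives (eight variants per subject), and that a baseline accuracy of 50% is a reasonable placeholder for showing what an odds ratio implies. The function names and the 0.5 baseline are hypothetical and are not taken from the study.

```python
# Counts reported in the abstract.
N_MS = 98
N_NMOSD = 157
N_DERIVATIVES = 7   # digital derivatives per subject
N_MODELS = 2        # GPT-3.5 and GPT-4

# Assumption: the original timeline is run in addition to the derivatives.
VARIANTS_PER_SUBJECT = 1 + N_DERIVATIVES


def total_conversations(n_subjects: int, variants: int, models: int) -> int:
    """Number of conversations if every variant is run through every model."""
    return n_subjects * variants * models


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability implied by scaling the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


total = total_conversations(N_MS + N_NMOSD, VARIANTS_PER_SUBJECT, N_MODELS)
print(total)  # 255 subjects * 8 variants * 2 models = 4080

# Hypothetical baseline accuracy of 0.5, chosen only for illustration.
gpt4_ms = apply_odds_ratio(0.5, 2.67)
print(round(gpt4_ms, 3))
```

Under these assumptions the product reproduces the 4080 conversations reported. An odds ratio scales odds rather than probabilities, so the change in accuracy it implies depends on the baseline, which the abstract does not give.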

DOI

10.1016/j.ibmed.2025.100314

Publication Date

11-14-2025

Keywords

Multiple sclerosis, Neuromyelitis optica spectrum disorder, ChatGPT

ISSN

2666-5212
