Document Type
Article
Publication Title
Open Forum Infectious Diseases
Abstract
Background: People who use drugs (PWUD) often lack access to optimal harm reduction and substance use disorder treatment tools. Tracking the epidemiology of acute care utilization by PWUD is crucial to improving systems of care. Chart reviews and International Classification of Diseases (ICD) codes are the most common systems of identifying hospitalizations of PWUD but are limited by high labor costs and inaccuracy. This study evaluates whether natural language processing (NLP) enhances the sensitivity and specificity of ICD-10 codes in identifying hospitalizations of PWUD.
Methods: We analyzed admissions at Tufts Medical Center between 2018 and 2023. Two NLP tools (Regular Expression and Open Health NLP Toolkit) were developed to identify PWUD and were compared with ICD-10 algorithms. The NLP and ICD-10 algorithms were applied to all admissions, and demographic and hospitalization-related data were extracted. The research team manually reviewed notes written during 790 hospitalizations of PWUD as the gold standard. We calculated sensitivity, specificity, and net reclassification indices.
Results: ICD-10 codes alone demonstrated low sensitivity (43%) but high specificity (99%). Adding NLP systems improved sensitivity up to 94%, though specificity decreased to 46%. Threshold adjustments (eg, notes flagged ≥50%) revealed a trade-off between sensitivity (47%) and specificity (96%). The most practical model (Regular Expression or ICD-10 codes) resulted in a sensitivity of 74% and specificity of 87%.
Conclusions: NLP is an innovative tool that can create functional, cost-effective, and accurate systems of identifying hospitalized PWUD. These findings support further development of NLP technologies to improve health care equity for PWUD.
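Illustrative sketch (not the study's code): the Methods and Results describe flagging an admission when either a regular expression or an ICD-10 algorithm fires, optionally requiring a minimum share of notes to be flagged, and then scoring the rule against manual chart review. A minimal Python version of that evaluation might look like the following. The pattern, the `Admission` fields, the function names, and the toy records are all assumptions made for illustration; the abstract does not specify the actual expressions, thresholds logic, or data layout.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical pattern for illustration only; the study's expressions are not given.
DRUG_USE_PATTERN = re.compile(
    r"\b(injection drug use|heroin|fentanyl)\b", re.IGNORECASE
)


@dataclass
class Admission:
    notes: List[str]      # free-text clinical notes for one hospitalization
    icd_flag: bool        # True if an ICD-10 algorithm flagged the admission
    reviewed_pwud: bool   # gold standard from manual chart review


def fraction_flagged(notes: List[str]) -> float:
    """Share of notes in which the pattern matches at least once."""
    if not notes:
        return 0.0
    hits = sum(1 for note in notes if DRUG_USE_PATTERN.search(note))
    return hits / len(notes)


def regex_flag(adm: Admission, min_fraction: float = 0.0) -> bool:
    """Flag when at least one note matches and the matched share meets the threshold."""
    frac = fraction_flagged(adm.notes)
    return frac > 0 and frac >= min_fraction


def regex_or_icd(adm: Admission, min_fraction: float = 0.0) -> bool:
    """Combined rule: flagged by the ICD-10 algorithm OR by the regular expression."""
    return adm.icd_flag or regex_flag(adm, min_fraction)


def sensitivity_specificity(
    admissions: List[Admission], predict: Callable[[Admission], bool]
) -> Tuple[float, float]:
    """Score a flagging rule against the chart-review gold standard."""
    tp = fn = tn = fp = 0
    for adm in admissions:
        predicted = predict(adm)
        if adm.reviewed_pwud:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy records, invented for this sketch; they are not study data.
TOY_ADMISSIONS = [
    Admission(["History of injection drug use.", "Afebrile."], True, True),
    Admission(["Patient reports heroin use."], False, True),
    Admission(["No acute events."], False, True),
    Admission(["Denies fentanyl use.", "Stable.", "Stable.", "Stable."], False, False),
    Admission(["Hip fracture."], False, False),
]

if __name__ == "__main__":
    rules = {
        "ICD-10 only": lambda a: a.icd_flag,
        "Regex (any note) or ICD-10": lambda a: regex_or_icd(a),
        "Regex (>=50% of notes) or ICD-10": lambda a: regex_or_icd(a, 0.5),
    }
    for name, rule in rules.items():
        sens, spec = sensitivity_specificity(TOY_ADMISSIONS, rule)
        print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In the toy records, the negated mention ("Denies fentanyl use.") is flagged by the any-note rule but not by the 50% rule, which shows in miniature how a broader text match can raise sensitivity while lowering specificity.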
DOI
10.1093/ofid/ofaf370
Publication Date
6-23-2025
Keywords
harm reduction, infectious diseases, injection drug use, natural language processing, people who use drugs
ISSN
2328-8957
Recommended Citation
Benrubi L, Sato T, Westgard LK, Zollo-Venecek K, Socrates B, Sweigart B, Ridgway JP, Suzuki J, Morales Y, Goodman-Meza D, Wurcel AG. Sensitivity and Specificity of Natural Language Processing Systems for Identification of Hospitalized People Who Use Drugs. Open Forum Infectious Diseases. 2025; 12(7). doi: 10.1093/ofid/ofaf370.