A new study suggests that artificial intelligence (AI) could open up new avenues for sepsis research, researchers reported late last week in JAMA Network Open.
The study, conducted by researchers and clinicians from Harvard Medical School, Massachusetts General Hospital, and Brigham and Women's Hospital, found that a large language model (LLM) extracted presenting signs and symptoms of sepsis from the admission notes of more than 93,000 patients with accuracy equal to that of manual physician medical review. The LLM also identified symptom-based syndromes that correlated with infection sources, risk for antibiotic-resistant organisms, and in-hospital mortality.
The authors of the study say the findings demonstrate the feasibility of using LLMs to extract complex data from unstructured clinical text at a speed and scale not previously possible.
Extracting structured data from clinical notes
More than 1.7 million Americans are treated each year for sepsis, which occurs when the immune system overreacts to an infection, triggering a chain of events that can lead to tissue damage, organ failure, and death. And because treatment guidelines and quality metrics recommend quick initiation of antibiotics for patients with suspected sepsis, it's a significant driver of broad-spectrum antibiotic use.
The authors note that while clinicians often rely on signs and symptoms to make their initial antibiotic treatment decisions, most studies assessing the association between antibiotic choice, timing, and outcomes haven't used signs and symptoms as variables because extracting that information requires "laborious and subjective medical reviews."
The development of AI tools that can rapidly extract and evaluate large swaths of data provided the opportunity to test whether such a tool could be used to help improve strategies for optimal sepsis detection and treatment.
"Large language models (LLMs) can adeptly manipulate unstructured text with minimal pretraining and offer a promising new strategy to extract structured data, such as presenting signs and symptoms, from clinical notes at scale," the authors wrote. "Our goal was to develop and validate a scalable approach to capturing symptom data that serves as the groundwork for improved predictive models of pathogen categories, antimicrobial resistance profiles, patient prognoses, and tailored empiric antibiotic prescribing for sepsis."
For the study, the researchers used an LLM they developed to extract up to 10 presenting signs and symptoms from the history-and-physical admission notes of 104,248 patients with possible infection who were treated at five Massachusetts hospitals from June 2015 through August 2022. They then validated the LLM labels by comparing the results with a manual review of a random sample of 303 admission notes by an infectious disease physician, who used the same instructions as the LLM prompt.
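The study's prompt isn't reproduced in the article, but the general extraction pattern it describes can be sketched in a few lines. In this illustration, `call_llm` is a placeholder for whatever model client the researchers actually used, and the prompt wording and JSON output format are assumptions, not the published protocol:

```python
import json

# Hypothetical stand-in for the study's LLM client; any real use would
# swap in an actual API or local-model call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

EXTRACTION_PROMPT = """\
You are reviewing a hospital admission (history-and-physical) note.
List up to 10 presenting signs and symptoms documented at admission.
Return ONLY a JSON array of short, lowercase symptom strings, for
example ["fever", "dyspnea", "confusion"]. If none are documented,
return [].

Note:
{note}
"""

def extract_symptoms(note_text: str) -> list[str]:
    """Ask the LLM for presenting signs/symptoms and parse its JSON reply."""
    reply = call_llm(EXTRACTION_PROMPT.format(note=note_text))
    try:
        symptoms = json.loads(reply)
    except json.JSONDecodeError:
        return []  # an unparseable reply is treated as "no labels"
    # Cap at 10 items, mirroring the study's limit.
    return [s.strip().lower() for s in symptoms if isinstance(s, str)][:10]
```

Giving the physician reviewer the same instructions as the prompt, as the study did, means both "raters" are graded against an identical task definition.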
Signs, symptoms, and superbugs
The researchers also examined the associations between presenting signs and symptoms and isolation of methicillin-resistant Staphylococcus aureus (MRSA) from a clinical culture within 72 hours of emergency department (ED) arrival, isolation of a multidrug-resistant gram-negative (MDRGN) organism, and in-hospital mortality.
Among the 104,248 patients included in the study (24.9% of whom had septic shock and 22.7% sepsis without shock), the LLM labeled the notes of 93,674 patients. Compared with the physician's manual review, the LLM achieved an accuracy of 99.3%, a balanced accuracy of 84.6%, a positive predictive value of 68.4%, a sensitivity of 69.7%, and a specificity of 99.6%.
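To make these figures concrete: if each (note, symptom) pair is treated as one binary decision, all five metrics fall out of the standard confusion-matrix counts. A minimal sketch, assuming that per-pair framing:

```python
def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Confusion-matrix metrics for binary symptom labels.

    Each (note, symptom) pair is one decision: did the LLM flag the
    symptom, and did the physician reviewer flag it?
    """
    sensitivity = tp / (tp + fn)          # true flags found among all true symptoms
    specificity = tn / (tn + fp)          # correct rejections among all absences
    ppv = tp / (tp + fp)                  # precision: true among everything flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }
```

The gap between the 99.3% raw accuracy and the 84.6% balanced accuracy follows from this arithmetic: most (note, symptom) pairs are true negatives, so raw accuracy is dominated by the model's near-perfect specificity, while balanced accuracy weights the 69.7% sensitivity equally.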
Analysis of the 30 most common sepsis signs and symptoms identified by the LLM produced seven syndromes corresponding to four sites of infection (skin and other soft tissue, cardiopulmonary, gastrointestinal, and urinary tract) that correlated directly with ICD-10-CM discharge diagnosis codes for infections at those sites. Skin and other soft-tissue symptoms (adjusted odds ratio [AOR], 1.73) were directly associated with MRSA culture positivity, while gastrointestinal (AOR, 0.63) and urinary tract symptoms (AOR, 0.34) were inversely associated with it.
In contrast, urinary tract (AOR, 1.26) and gastrointestinal symptoms (AOR, 1.14) were directly associated with MDRGN organisms, while skin and other soft-tissue symptoms (AOR, 0.85) were inversely associated. Cardiopulmonary symptoms were associated with increased mortality (AOR, 1.30).
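The article doesn't specify the exact models behind these AORs, but the standard way to produce them is multivariable logistic regression, with each exponentiated coefficient giving a syndrome's odds ratio adjusted for the other terms. A minimal sketch with statsmodels, in which the column names and the `age` covariate are illustrative assumptions rather than the study's actual adjustment set:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def adjusted_odds_ratios(df: pd.DataFrame) -> pd.Series:
    """Fit a logistic model of MRSA positivity on syndrome indicators.

    df is assumed to have one row per patient, with binary 0/1 columns
    for each syndrome, a binary outcome (MRSA culture positivity within
    72 hours of ED arrival), and any adjustment covariates.
    """
    result = smf.logit(
        "mrsa_positive ~ skin_soft_tissue + cardiopulmonary"
        " + gastrointestinal + urinary_tract + age",
        data=df,
    ).fit(disp=False)
    # Exponentiated coefficients are AORs: above 1, the symptom's
    # presence raises the odds of the outcome; below 1, it lowers them.
    return np.exp(result.params)
```

Read this way, the reported AORs say that presenting syndromes shift the pretest odds of specific resistant organisms in opposite directions, which is exactly the kind of signal an empiric-antibiotic model could exploit.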
"By enabling new population-scale analyses of clinically significant patient-level details in clinical notes, including symptoms, temporality, outside hospital courses, and other health care exposures, LLMs can advance the scope and quality of clinical epidemiologic research," the authors wrote.
They add that further research is warranted to evaluate the value of large-scale sign-and-symptom data in models of antibiotic choice, effectiveness, and outcomes in sepsis patients.
Clinical value unclear
The clinical value of the LLM is less clear, however.
In a commentary published in the same journal, Jonathan Baghdadi, MD, PhD, of the University of Maryland School of Medicine, and Cristina Vazquez-Guillamet, MD, of Washington University School of Medicine in St. Louis, say that although a tool that discerns the signs and symptoms of sepsis "could be useful for understanding optimal approaches to early sepsis care," the LLM validated in the study is, at the moment, probably "better suited to automating simple tasks, such as the extraction of signs and symptoms, than participating in clinical decision-making."
But they add that as the technology improves, use of LLMs for clinical decision support is potentially within sight. "Based on tools currently available or in development, a nonhuman entity could guide or automate history taking, generation of a differential diagnosis, transcription of the encounter in a note, and subsequent clinical decision-making," they wrote.
And that could create problems.
"At every step, we anticipate that AI could have a flattening effect, making patients and their stories appear more homogeneous than they really are," they wrote. "Clinicians and researchers want to believe that AI will reveal deep truths that would be otherwise imperceptible to the human mind, but a tool that summarizes and distills a nuanced patient narrative into a set of 5 to 10 symptoms is strictly focused on the surface level."