A new artificial intelligence (AI) model shows high accuracy for diagnosing pediatric conditions compared with initial diagnosis by an examining physician.
The model was also able to distinguish between common, less urgent conditions and potentially life-threatening ones, according to a report by Huiying Liang, of Guangzhou Women and Children’s Medical Center at Guangzhou Medical University in China, and colleagues.
The study, published online today in Nature Medicine, shows this AI framework can mimic the clinical reasoning of human physicians and use machine learning to pull clinically relevant text from electronic health records (EHRs) to accurately predict a patient’s diagnosis, Liang and colleagues conclude.
Rod Tarrago, MD, chief medical information officer and medical director of medication safety at Seattle Children’s Hospital in Washington, told Medscape Medical News he sees this model as an exciting step forward. But, he said, it would be a tool to assist physicians rather than eventually replace them.
He said it’s heartening to see such a model in pediatrics, a field with fewer patients and more difficulty populating clinical trials, which could therefore benefit from machine-learning models trained on large data sets. This model also differs from some previous ones in that it relies on text rather than imaging to make diagnoses, he noted.
The system uses automated natural language processing and one of the authors, Kang Zhang, MD, PhD, told Medscape Medical News the model can be adapted to any language. Zhang is affiliated with both the Guangzhou Women and Children’s Medical Center and the University of California, San Diego.
The researchers compared the AI system’s and physicians’ ability to diagnose a list of conditions such as asthma, encephalitis, sinusitis, and pneumonia. Physicians manually graded 11,926 EHRs from an independent cohort of pediatric patients.
Twenty physicians were grouped by experience and proficiency into two junior and three senior groups. A physician in each group read a subset of the clinical notes from the data and made a diagnosis for each patient.
Researchers measured performance with an F1 score, the harmonic mean of precision and recall, and found that the AI system outperformed the two junior groups but scored slightly lower than the three senior physician groups.
Specifically, the average F1 score for the AI model was 0.885. The two junior physician groups scored 0.841 and 0.839, and the three senior physician groups scored 0.907, 0.915, and 0.923.
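For readers unfamiliar with the metric, the F1 score combines precision (how many of the model’s diagnoses were correct) and recall (how many true diagnoses the model found) into a single number. A minimal illustration, not code from the study:

```python
# Toy illustration (not the study's code): F1 is the harmonic
# mean of precision and recall, ranging from 0 to 1.
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 90% precision and 87% recall
print(round(f1_score(0.90, 0.87), 3))  # 0.885
```

Because it is a harmonic mean, F1 is pulled down by whichever of precision or recall is weaker, so a model cannot score well by excelling at only one.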
“This result suggests that this AI model may potentially assist junior physicians in diagnoses but may not necessarily outperform experienced physicians,” the authors write.
The authors also highlight the system’s accuracy in diagnosing dangerous conditions.
“Our system was able to achieve this in several disease categories, as illustrated by its performance for acute asthma exacerbations (0.97), bacterial meningitis (0.93), and across multiple diagnoses related to systemic generalized conditions, such as varicella (0.93), influenza (0.94), mononucleosis (0.90), and roseola (0.93). These are all conditions that can have potentially serious and sometimes life-threatening sequelae, so accurate diagnosis is of utmost importance,” they write.
The authors see several uses for the framework in clinical practice. When patients enter an emergency department or urgent care center, for instance, the algorithm, using basic information, vital signs, and physical exam notes, could prioritize which patients need to see a doctor first. This could cut wait times and improve access to care.
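That triage idea could be sketched as follows. This is a hypothetical illustration only; the diagnoses, urgency weights, probabilities, and function names are invented and do not come from the study:

```python
# Hypothetical triage sketch: rank waiting patients by the urgency of
# their most probable predicted diagnosis. All names, weights, and
# probabilities here are invented for illustration.
URGENCY = {
    "bacterial meningitis": 3,       # potentially life-threatening
    "acute asthma exacerbation": 3,
    "influenza": 2,
    "sinusitis": 1,                  # common, less urgent
}

def triage_order(patients):
    """Sort patients so the most urgent likely diagnosis is seen first."""
    def score(patient):
        # Take the single most probable diagnosis for this patient...
        dx, prob = max(patient["predictions"].items(), key=lambda kv: kv[1])
        # ...and weight its probability by how dangerous the condition is.
        return URGENCY.get(dx, 1) * prob
    return sorted(patients, key=score, reverse=True)

queue = [
    {"name": "A", "predictions": {"sinusitis": 0.8, "influenza": 0.2}},
    {"name": "B", "predictions": {"bacterial meningitis": 0.6, "influenza": 0.4}},
]
print([p["name"] for p in triage_order(queue)])  # ['B', 'A']
```

In this toy queue, the patient with a likely meningitis diagnosis is moved ahead of the patient with likely sinusitis, which is the kind of reordering the authors envision at intake.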
Another application for the AI system is in helping diagnose complex or rare conditions. The system was “trained” using 101.6 million data points from 1.4 million pediatric patient visits from January 2016 to July 2017 to a major referral center in Guangzhou, China. The size of the dataset may help counter the bias of physicians toward diagnosing the conditions they have seen in their own experience.
Tarrago said diagnosing more complex conditions with less consistent presentations will be the real test of the model.
“The possibilities are certainly there,” he said, but added that the diagnoses in this study are not necessarily the areas where human doctors struggle significantly.
The authors conclude that although this system may have the largest impact in countries such as China where the ratio of healthcare providers to population is low, “the benefits of such a system are likely to be universal.”
“I think this is a great step in pediatrics and in artificial intelligence,” Tarrago said. “They have the benefit of having such a huge sample size. Now the question is how does this work across different types of healthcare models? This was focused more on ambulatory settings.”
Tarrago said he would like to see models like this eventually become a parallel partnership with the physician and not just a last resort when the physician is having trouble diagnosing. He would like to see interaction in real time where the physician could “discuss” with the AI system reasons for making a diagnosis.
In such a scenario, he said, “Not only does it learn from us, but we continue to learn from it.”
This study was funded by the National Key Research and Development Program of China, the National Natural Science Foundation of China, the Guangzhou Women and Children’s Medical Center, and the Guangzhou Regenerative Medicine and Health Guangdong Laboratory. The study authors and Tarrago have disclosed no relevant financial relationships.
Nat Med. Published online February 11, 2019. Abstract