Researchers have described new machine learning methods that enable automated surveillance of electronic health records in a paper published in npj Digital Medicine last week.
The Stanford University researchers showed that the system extracted the vast majority of reports of complications and pain from a sample of EHRs for patients who had undergone hip replacements.
The publication of the paper comes shortly after FDA made the expansion of its postmarket safety program to EHRs a focal point of its strategy for the next five years.
EHRs are a potentially valuable source of real-world evidence. However, some of the information of interest is contained in clinical notes, an unstructured source of data that is difficult for machines to analyze. Deep learning methods have shown promise, but primarily when built on large, expensive, hand-labeled training sets.
The npj Digital Medicine paper describes the creation of a deep learning method designed to identify patient outcomes in clinical notes without the support of hand-labeled training sets. The researchers used "weak supervision," an approach developed to eliminate the labeled training data bottleneck that affects the whole machine learning field.
That approach generates "large amounts of imperfectly labeled training data," potentially yielding better results than hand labeling with less manual effort. The paper provides data to validate that idea.
"By focusing on creating labeling functions, instead of manually labeling training data, we achieved state-of-the-art performance with reusable code that, unlike labeled data, can be easily updated and shared across different healthcare systems," the authors wrote.
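To make the idea of a labeling function concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the keyword patterns, and the single "adverse outcome" label are all hypothetical, and a simple majority vote stands in for the statistical label model that weak supervision frameworks typically use to weigh noisy votes.

```python
import re
from collections import Counter

# Hypothetical label values; ABSTAIN means "this rule has no opinion".
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_pain(sentence):
    # Crude rule: any mention of pain suggests an adverse outcome.
    return POSITIVE if re.search(r"\bpain(ful)?\b", sentence, re.I) else ABSTAIN

def lf_negated_pain(sentence):
    # Crude rule: an explicitly negated pain mention suggests no adverse outcome.
    pattern = r"\b(no|denies|without)\s+(hip\s+)?pain\b"
    return NEGATIVE if re.search(pattern, sentence, re.I) else ABSTAIN

def lf_complication_terms(sentence):
    # Illustrative complication keywords, not a clinical vocabulary.
    pattern = r"\b(dislocation|infection|loosening)\b"
    return POSITIVE if re.search(pattern, sentence, re.I) else ABSTAIN

def lf_healing_well(sentence):
    return NEGATIVE if re.search(r"\bhealing well\b", sentence, re.I) else ABSTAIN

LABELING_FUNCTIONS = [
    lf_mentions_pain,
    lf_negated_pain,
    lf_complication_terms,
    lf_healing_well,
]

def weak_label(sentence):
    """Combine the rules' votes by majority; abstain on ties or no votes."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the rules disagree evenly, so emit no label
    return ranked[0][0]

if __name__ == "__main__":
    for note in ["Patient reports severe hip pain on walking.",
                 "No pain, wound healing well."]:
        print(note, "->", weak_label(note))
```

Labels produced this way are noisy, which is the trade-off the approach accepts: the rules can be run over an entire corpus of notes and revised in code, rather than relabeling examples by hand.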
Applied to EHRs from 6,583 hip replacement patients, the automated method extracted reports of complications and pain with up to 96.3% precision and 98.5% recall. Analyzing the clinical notes detected six times more complication events than an assessment that used only structured data. The researchers also predicted the method would capture more events than current surveillance systems.
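For readers less familiar with the two metrics: precision is the share of extracted reports that are correct, and recall is the share of true reports that were extracted. The short Python sketch below computes both on a toy example; the six predictions and reference labels are invented for illustration and are not data from the study.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Invented toy data: did the system flag a complication, and was one present?
predicted = [True, True, True, False, False, True]
actual = [True, True, False, True, False, True]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```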
"The ability to quantify pain and complication rates automatically over a large patient population offers an advantage over surveillance systems that rely on individual reports from patients or surgeons," the authors wrote.
One limitation of the study is that the analysis only covered patients treated at Stanford. However, that shortcoming may be only temporary. The study authors have made their code publicly available so it can be applied to data held at other institutions.