Unstructured Data in Healthcare
By Hon S. Pak, MD MBA, Chief Medical Officer, 3M HIS
Nearly 80 percent of clinical information in electronic health records (EHRs) is “unstructured” and in a format that health information technology systems cannot use. As a result, unstructured information is either ignored or, when feasible, converted into a precise “structured” format to make it accessible and available for analysis. When structured, however, the information is relatively limited, preventing physicians from capturing the nuances of care.
Today, unstructured data is still largely untapped. Most healthcare organizations use manual processes to extract needed information from unstructured data in the EHR, primarily for registries, quality reporting, chronic disease management, documentation review, and some research applications. At a time when healthcare organizations face increasing economic pressure, this manual extraction effort is both time-consuming and resource-intensive. Increased reporting requirements associated with MIPS/MACRA and the shift to value-based care have multiplied the time spent manually extracting information from unstructured data, resulting in a growing administrative burden for frontline caregivers.
What if we could tap the full depth and breadth of clinical information residing in the EHR to provide meaningful context and nuanced clinical details of the patient’s condition? What if this data were available for a broad range of applications, from clinical patient management and more efficient health resource utilization within a community to large-scale population health initiatives? What if we could automatically, and in real time, access the clinical information hidden in unstructured data to advance clinical trials, quickly complete prior authorizations, enable better care coordination and management, automate registries, quality reporting, and chart audits, speed claims adjudication and review, and enhance clinical decision support?
The promise of NLP
Finding answers to these questions is more essential than ever as we attempt to measure value and improve patient outcomes in the shift to precision medicine and value-based care. One possible answer is natural language processing (NLP), a technology that converts unstructured data into structured codes, making the data accessible and actionable. NLP works with the most valuable form of clinical communication: the clinical narrative. By processing unstructured text directly with computer applications, NLP leverages the wealth of available patient information in the EHR to improve communication between caregivers, reduce the cost of working with clinical documentation, and automate repetitive data capture and reporting requirements. Until recently, its use has been limited to niche use cases or academic research, with little application at enterprise scale. In our work with more than 3,000 client sites, however, we can attest to the promise of NLP: the technology is already demonstrating its potential to free physicians to focus on patient care, rather than requiring them to change their existing, proven processes to accommodate technology.
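To make the idea of converting narrative text into structured codes concrete, here is a minimal sketch in Python. It is a toy, rule-based illustration and not a description of how any production NLP platform works. The lexicon, the concept identifiers (placeholders, not real SNOMED CT codes), the negation cue list, and the function name `extract_concepts` are all assumptions made for the example.

```python
import re

# Hypothetical lexicon mapping narrative phrases to placeholder concept IDs.
# These IDs are illustrative only; they are NOT real SNOMED CT codes.
LEXICON = {
    "shortness of breath": "CONCEPT_DYSPNEA",
    "chest pain": "CONCEPT_CHEST_PAIN",
    "type 2 diabetes": "CONCEPT_T2DM",
}

# A finding is treated as negated if one of these cues appears before it
# in the same clause (no comma, semicolon, or period in between).
NEGATION_CUES = re.compile(
    r"\b(no|denies|without|negative for)\b[^.;,]*$", re.IGNORECASE
)


def extract_concepts(note):
    """Return a list of structured records found in a free-text note."""
    results = []
    # Split the note into sentences at periods and semicolons.
    for sentence in re.split(r"(?<=[.;])\s+", note):
        lowered = sentence.lower()
        for phrase, concept in LEXICON.items():
            idx = lowered.find(phrase)
            if idx == -1:
                continue
            # Look only at the text that precedes the matched phrase.
            negated = bool(NEGATION_CUES.search(lowered[:idx]))
            results.append(
                {"concept": concept, "text": phrase, "negated": negated}
            )
    return results


if __name__ == "__main__":
    note = (
        "Patient has type 2 diabetes. "
        "Denies chest pain, reports shortness of breath on exertion."
    )
    for record in extract_concepts(note):
        print(record)
```

Even this small example hints at why clinical NLP is hard: a statement such as "denies chest pain" must be recorded as the absence of a finding, not its presence. Real systems have to handle far more, including abbreviations, misspellings, temporality, and family history, which simple phrase matching does not address.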
3M™ currently supports client sites with an NLP platform that processes more than 2.5 million clinical documents daily in the completion of computer-assisted diagnostic/procedural coding and clinical documentation improvement. The next chapter will see this NLP platform expand into mining for clinical concepts through standard terminologies such as SNOMED CT and by leveraging clinical knowledge representations within the 3M™ Healthcare Data Dictionary, all combined with data science advancements and a robust architecture in the cloud. This expansion of the NLP platform will allow new applications such as data abstraction for registry submission. Applying NLP to the registry abstraction process is a complex undertaking, as it must align with or improve upon the current workflow and, at the same time, provide assurances that the output of the NLP engine is correct. Ultimately, the goal is to streamline data collection, reduce time spent on data mining, and allow more time for data interpretation, which translates into targeted improvements in patient care.
Addressing incomplete EHR data
As a physician and a former CIO, I am humbled by the statistic that clinical care accounts for only 20 percent of patient outcomes. Other influential factors include determinants of health such as socio-economic variables, behavioral factors, and genetics. Except for genetic data, which tends to be structured, the data that contribute most significantly to patient outcomes are uncollected or unstructured and infrequently used in clinical care today.
Social and behavioral determinants of health, such as smoking status or depression, are significant contributors to risk and functional outcomes. Despite the importance of these factors, EHRs do not capture this patient information. A select committee of leading researchers and population health experts, funded by the Institute of Medicine (IOM), was recently tasked with identifying these core domains to facilitate future use of EHR data for both care delivery and clinical research. A report of the initiative’s first phase, Capturing Social and Behavioral Domains in Electronic Health Records: Phase 1, identified criteria for the selection of domains with high priority for EHR inclusion. Ultimately, our ability to apply new technologies such as NLP requires a much broader set of data to be available in the EHR (claims, clinical, patient-reported outcomes, determinants of health, genetics). Only then will we have a complete, holistic view of the patient and their environment, which is critical information for outcomes improvement.
Disruption is coming
The challenge of unstructured data offers an opportunity for new thinking. Rather than respond to ever-expanding data requirements such as MIPS/MACRA with more structured data entry and EHR pick lists for physicians, we should look for new approaches. Today, patients are the most under-utilized resource in healthcare, and personal technologies such as Amazon Alexa or Google Home may allow patients to actively participate not only in their care but also in how information is captured. In the next 10 years, we will see real-time engagement where artificial intelligence (AI) begins to interact with both physicians and patients to enable the accurate capture not only of basic concepts but also of the context and insights necessary to provide a holistic understanding of the patient.
These are still early days in terms of research and development, but NLP, speech recognition, and artificial intelligence (AI) are all technologies worth watching. New technologies like NLP may not be the complete answer, but they can simplify the process of information capture and reduce the need for labor-intensive structured data entry, thereby easing friction and frustration for the healthcare workforce.