Current EHR and HIT thinking places significant value on immediate and downstream use of EHR data. The expected benefits of interoperability, clinical decision support, and data analytics all depend on accurate EHR data. Yet, somehow, data quality has not received the attention it deserves. While clinical researchers are increasingly focused on improving phenotyping algorithms for EHR data extractions (1), there is much less focus on how EHR data collection and validation practices can improve data quality.
Data validation can occur in a number of ways. Basic validation techniques (e.g., missing data, spelling checks, correct formatting) are easy to do, and are simply good software engineering. For clinical systems, the next level up is range-checking for standard data elements. At this level, unreasonable values, such as temperatures of 200° F or blood pressures of 1200/80, are prevented. The highest and hardest level of checking for EHR data is that of “truth” – that is, assuring that the information in the chart that makes it past the first two levels of validation is factually correct. Diagnosis accuracy – the correspondence between the coded diagnosis (ICD or SNOMED) and the remaining chart data – is an example of the challenges inherent in assuring accurate EHR data. In their paper, Nathan, et al. (2) address this discrepancy in an excellent analysis that compares chart review findings to case-finding algorithms.
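The second level of validation can be sketched in a few lines of code. The limits below are illustrative placeholders, not clinical reference ranges, but they show how the temperature and blood pressure examples above would be caught:

```python
# A minimal sketch of range-checking for vital signs.
# The plausible ranges below are illustrative only, not clinical standards.

PLAUSIBLE_RANGES = {
    "temperature_f": (90.0, 110.0),
    "systolic_bp": (50, 260),
    "diastolic_bp": (20, 150),
}

def check_range(field: str, value: float) -> bool:
    """Return True if the value falls within the plausible range for the field."""
    low, high = PLAUSIBLE_RANGES[field]
    return low <= value <= high

check_range("temperature_f", 200)  # False -- reject before it reaches the chart
check_range("systolic_bp", 1200)   # False
check_range("systolic_bp", 120)    # True
```

Checks like these are cheap and purely local to a single data element, which is exactly why they stop at "unreasonable" and cannot reach "untrue."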
Noting that data quality issues exist in EHR systems, they set out to quantify the level of inaccuracy. They state:
Previously identified challenges affecting data quality include missing data, variation in terminology and misclassification of coding as well as significant variation between diseases. However, the rate at which these occur and the degree to which each factor contributes to overall data quality and validity has not been reported.
Chart reviews were used to assess case-finding algorithms as follows.
We conducted a retrospective chart review of EMR data from one PBRN, the Manitoba Primary Care Research Network (MaPCReN), to characterize the various factors contributing to the validity of diagnoses in the database. Validation of the CPCSSN 2011 diagnostic algorithms was performed using the chart review as the gold standard. Chart review was selected as it has the advantages of ensuring completeness and does not depend on the cooperation of physicians or office staff and is used widely in research therefore fulfilling the fitness for use criteria.
As might be expected, chart reviews found diagnoses that the algorithms did not.
Among the 403 charts sampled, there were a total of 249 non-matching diagnoses of all five conditions. Of these, 82 were diagnoses that were not identified by chart review but were inaccurately detected by case finding algorithms and 167 were diagnoses that were identified by chart review but undetected by algorithms.
These findings hold important cautions for everyday clinical care and CDS reliability, and they also point to potential interoperability mishaps.
From the CDS perspective, case-finding algorithms found too many non-cases while missing real ones. CDS algorithms that rely on potentially inaccurate problem list contents could either raise many useless alerts for non-existent problems or overlook problems that require interventions.
Interoperability actually passes data quality issues on to someone else. Even worse, downstream users of poor quality data may not have access to sufficient information to spot or rectify the inaccuracies. Clinicians who have access to the original chart might be able to piece together the truth. Downstream clinicians who do not have access to the full original are stuck. What happens when recipients of bad data sets eventually pass them on? The discrepancies become harder to rectify with each new exchange.
Knowing that data quality issues exist, what can be done about them? The authors state that most errors grow out of clinician/practice factors.
Practice based factors have been found to contribute the largest variation in data quality. In our study, 45% of all diagnoses missed by the algorithms were due to diagnoses being recorded solely in the free text fields and without use of an ICD-9 code or listing in the problem list. The variation between practices is evident in the example of COPD, where the largest source of discrepancy of inappropriate ICD-9 codes occurred at a single site. Physician ICD-9 coding accuracy has been previously shown to be adversely affected by higher workload, clinician uncertainty, patient complexity, and the possible stigma associated with certain conditions.
The authors used data from a primary care-based research network, so one might conclude that primary care practices, often being resource challenged, might be more susceptible to error introduction than larger entities. However, data quality research does not support this conclusion (see Wrestling with EHR Data Quality). Anecdotally, one does not have to look very hard to find examples of inaccurate diagnoses. Mr. HIStalk relates that after viewing his records from an EMRAM Stage 7 hospital that had successfully passed MU Stage 2, he found several incorrect entries (and not minor ones – MI, asthma, diabetes, yikes!!!).
Clinicians and practice factors are a constant – humans deliver care. The key question is: How can software designs address such a pervasive problem? A good first step would be recognizing that diagnoses can never be reliably accurate when controlled terminologies are the only validation mechanism applied. Controlled terminologies do solve spelling problems and assure the use of approved terms. However, such selections are only “beliefs” unless/until they are validated against additional chart information. Adding the equivalent of case-finding and similar algorithms as an integral part of validation processes could improve accuracy.
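As a rough illustration of what building such a cross-check into validation might look like, the sketch below flags a coded diagnosis when nothing else in the chart corroborates it. The evidence rules, codes, and field names are all hypothetical; a real implementation would draw on the kinds of case-finding logic the paper evaluates:

```python
# Hypothetical sketch: a coded problem-list entry is only a "belief" until
# something else in the chart (labs, medications) corroborates it.
# EVIDENCE_RULES and the chart structure are invented for this example.

EVIDENCE_RULES = {
    "E11": {  # illustrative code family for type 2 diabetes
        "labs": {"HbA1c"},
        "medications": {"metformin", "insulin"},
    },
}

def flag_uncorroborated(problem_list, chart):
    """Return coded diagnoses with no supporting lab or medication evidence."""
    flagged = []
    for code in problem_list:
        rules = EVIDENCE_RULES.get(code)
        if rules is None:
            continue  # no rule defined for this code; nothing to check
        has_lab = any(lab in rules["labs"] for lab in chart.get("labs", []))
        has_med = any(med in rules["medications"]
                      for med in chart.get("medications", []))
        if not (has_lab or has_med):
            flagged.append(code)  # queue for human review, don't auto-delete
    return flagged

chart = {"labs": ["CBC"], "medications": ["lisinopril"]}
flag_uncorroborated(["E11"], chart)  # ["E11"] -- no corroborating evidence
```

The point is not the specific rules but where they run: at data entry, as part of validation, rather than years later inside a research extraction.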
Many diagnoses missing from problem lists are embedded in narratives in free-text fields or in various reports linked to the chart. These items are not readily analyzed or searched, but such limitations are not insurmountable. One approach to solving this problem is the use of natural language processing tools. NLP is a worthwhile endeavor, but a simpler approach seems possible. Why not try a markup language for narrative content that renders to viewers as free text, but is searchable as data. A markup language would be a good fit for form-based input and could be incorporated into speech recognition tools.
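To make the markup idea concrete, here is a toy sketch. The `[dx:...]` tag syntax is invented purely for illustration; the point is that one stored narrative can serve two consumers, a renderer that shows plain prose and a query tool that extracts coded data:

```python
import re

# Toy sketch of narrative markup: tags are stripped for display but
# extracted as data for search. The [dx:code]text[/dx] syntax is invented.

TAG = re.compile(r"\[dx:(?P<code>[^\]]+)\](?P<text>.*?)\[/dx\]")

note = "Patient reports wheezing; [dx:J45]asthma[/dx] well controlled on inhaler."

def render(text: str) -> str:
    """Render for the reader: show only the narrative words."""
    return TAG.sub(lambda m: m.group("text"), text)

def extract(text: str) -> list:
    """Query as data: pull out the coded diagnoses embedded in the narrative."""
    return [(m.group("code"), m.group("text")) for m in TAG.finditer(text)]

render(note)   # "Patient reports wheezing; asthma well controlled on inhaler."
extract(note)  # [("J45", "asthma")]
```

A form-based input screen or a speech recognition tool could emit the tags automatically, so clinicians would never type them by hand.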
Inaccurate data have an immediate, negative impact on care quality, and they cost productivity when clinicians have to validate chart contents in the middle of patient encounters (something I haven’t experienced in more than 10 years, but still dread).
In the national dialog on EHR systems, so much effort and mindshare has been granted to CDS, interoperability, and analytics that one would think data quality to be only a minor issue. Perhaps, the lack of attention is because data quality is a hard problem. Even so, it must be addressed. Since every vendor has a proprietary data schema, every vendor is reinventing the wheel for data quality management, which does not bode well for global improvement in the current state of things.
CDS, interoperability, and data analytics are important goals, but inaccurate data could actually make them more dangerous than helpful. Garbage in, garbage out is still a bad thing – right?
- Newton KM, Peissig PL, Kho AN, Bielinski SJ, Berg RL, Choudhary V, Basford M, Chute CG, Kullo IJ, Li R, Pacheco JA, Rasmussen LV, Spangler L, Denny JC. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc. 2013 Jun;20(e1):e147-54.
- Coleman N, Halas G, Peeler W, Casaclang N, Williamson T, Katz A. From patient care to research: a validation study examining the factors contributing to data quality in a primary care electronic medical record database. BMC Fam Pract. 2015 Feb 5;16:11.