Missing data values are always a pain in the neck. Any measure of data quality and completeness has to contend with missing values. In production systems such as EHRs that are used during the care process, missing values tend to occur more frequently than in research databases, where quality checks are routinely performed (at least this has been my experience). In fact, my first up-close-and-personal encounter with missing data values occurred as part of the CNICS project nearly 12 years ago.
Initial attempts at pooling data from five EHR systems proved to be problematic for all of the usual reasons (missing values, terminology, data schema, etc.). When we looked at missing values, we discovered some had been overlooked or otherwise not collected, but were obtainable with additional effort (e.g. re-interviewing a patient). However, others did not exist because they were never collected at the originating site. In order to avoid searching for or trying to obtain data that were never likely to be available, we created a system of annotating missing values so that it was easy to tell why they were missing and whether they might ever be available. I came to refer to the annotations we used as null codes.
Everyone has made use of some type of annotation for missing data values, the ubiquitous “not applicable” being an old standby. While reading one of Thomas Beale’s interesting posts on openEHR, I discovered that HL7 V3 has a null flavor concept. The table below, taken from the CDC PHIN website, lists the Null Flavors included in HL7 V3.
|Concept Code|Concept Name|
|ASKU|asked but unknown|
While these are undoubtedly useful, they are not very extensive or expressive.
Annotating missing data values in EHRs is a good idea. Depending on the specificity of the annotation, one is able to gather information that might improve future data collection efforts. For example, data values that are missing because a specimen is awaiting analysis (pending) represent a different data quality problem than values that are absent because they have been deleted or were never collected. Further, test results that are pending because the specimen has not yet been collected are different from those that are pending after the specimen has been sent to the lab. By specifying why a result is pending, missing data values can be used to uncover process inefficiencies or irregularities.
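To make the idea concrete, here is a minimal sketch of how annotated missing values might be tallied by reason. The record layout and the code names (`PENDING_SPECIMEN_COLLECTION`, `PENDING_AT_LAB`, `NEVER_COLLECTED`) are invented for illustration and are not the codes used at UAB or for CNICS.

```python
from collections import Counter

# Hypothetical lab result records; "value" is None when the result is missing,
# and "null_code" then says why. All names here are illustrative assumptions.
results = [
    {"test": "HbA1c", "value": 6.9, "null_code": None},
    {"test": "HbA1c", "value": None, "null_code": "PENDING_SPECIMEN_COLLECTION"},
    {"test": "CD4", "value": None, "null_code": "PENDING_AT_LAB"},
    {"test": "CD4", "value": None, "null_code": "PENDING_SPECIMEN_COLLECTION"},
    {"test": "LDL", "value": None, "null_code": "NEVER_COLLECTED"},
]


def missing_reasons(records):
    """Count missing values by the reason recorded for their absence."""
    return Counter(r["null_code"] for r in records if r["value"] is None)


reasons = missing_reasons(results)
for code, count in reasons.most_common():
    print(f"{code}: {count}")
```

A cluster of results stuck at specimen collection would point to a different bottleneck than a cluster waiting at the lab, which is the kind of distinction a bare null cannot make.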
Patient attitudes and inclinations may also be captured using annotations. For example, in capturing drug or sexual histories, in addition to “patient refused,” other reasons might be recorded to explain the missing information such as “patient embarrassed by question,” “patient does not remember,” or “patient considers question inappropriate.”
Another potential use of null codes is preventing the entry of uncontrolled placeholder or nonsense data into an EHR due to poor system design. An example of this is an EHR that will not allow users to proceed to the next step/screen unless a value is entered into one or more fields. When this occurs, EHR users may get around this obstacle by entering dummy data, which may be misleading and possibly dangerous. In clinical situations, there may be times when entering all the data is not possible for valid reasons. In such cases, providing null codes reduces frustration for users while giving EHR designers information that could help in improving the system’s design.
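One way this could look in practice is a required-field check that accepts a recognised null code in place of a value, so users are never forced to type dummy data. This is only a sketch: the function name `validate_required` and the codes in `ALLOWED_NULL_CODES` are hypothetical, not taken from any particular EHR.

```python
# Hypothetical null codes a form might accept instead of a real value.
ALLOWED_NULL_CODES = {"PATIENT_REFUSED", "PATIENT_DOES_NOT_REMEMBER", "NOT_ASSESSED"}


def validate_required(value, null_code=None):
    """Return (ok, message) for a required field.

    The field passes if it holds a real value, or if it is empty and
    carries a recognised null code explaining why it is empty.
    """
    has_value = value is not None and str(value).strip() != ""
    if has_value and null_code is None:
        return True, "value recorded"
    if not has_value and null_code in ALLOWED_NULL_CODES:
        return True, f"missing, reason: {null_code}"
    if has_value and null_code is not None:
        return False, "give either a value or a null code, not both"
    return False, "enter a value or choose a reason it is missing"


print(validate_required("120/80"))
print(validate_required("", "PATIENT_REFUSED"))
print(validate_required(""))
```

The user can always move on, but only by stating a reason, and those reasons can later be analyzed.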
Databases contain a variety of data types, so null codes should be adjusted to match the underlying data type. In the UAB EHR, text-based codes were the predominant type used. However, codes were also adapted to work in numerical and date fields. I don’t remember the specific codes used at UAB or for CNICS, but they were created using the following rules. In numeric fields, negative values were used to encode the text-based null codes. Dates were encoded using month/day/year groupings that were unlikely to occur in an EHR. Here are a few examples of how numeric and date null codes can be created:
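The sketch below illustrates the rules just described. The specific values are assumptions made up for this example (the original UAB and CNICS codes are not reproduced here): each text code is paired with a negative number and with a date in the year 1800. Negative sentinels are only safe in fields whose legitimate values are never negative.

```python
from datetime import date
from enum import Enum
from typing import Optional


class NullCode(Enum):
    # Hypothetical codes: (text code, negative numeric sentinel, sentinel date).
    NOT_COLLECTED = ("NOT_COLLECTED", -9001, date(1800, 1, 1))
    PENDING = ("PENDING", -9002, date(1800, 1, 2))
    PATIENT_REFUSED = ("PATIENT_REFUSED", -9003, date(1800, 1, 3))

    def __init__(self, text, number, sentinel_date):
        self.text = text                    # stored in text fields
        self.number = number                # stored in numeric fields
        self.sentinel_date = sentinel_date  # stored in date fields


# Reverse lookups from a stored sentinel back to its null code.
_BY_NUMBER = {code.number: code for code in NullCode}
_BY_DATE = {code.sentinel_date: code for code in NullCode}


def decode_number(value: float) -> Optional[NullCode]:
    """Return the null code a numeric value stands for, or None if it is real data."""
    return _BY_NUMBER.get(value)


def decode_date(value: date) -> Optional[NullCode]:
    """Return the null code a date stands for, or None if it is a real date."""
    return _BY_DATE.get(value)


print(decode_number(-9002))
print(decode_date(date(1800, 1, 3)))
print(decode_number(7.2))
```

Keeping all three encodings on one definition means a numeric or date sentinel can always be traced back to the same text-based reason.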
Null codes are a simple way of gaining valuable information about workarounds/workflow glitches, patient attitudes, software design flaws, and aspects of any clinical process during which data collection occurs. Having dealt first-hand with the issues of missing data values, I have come to appreciate having a coding scheme that provides analyzable data when trying to figure out why values are missing. As it turns out, using null codes is one situation in which nothing can actually mean something.