Studies of what goes wrong behind the scenes in clinical software are rare. More commonly, reports address issues as they affect end users, not how those issues arise from programming errors or architectural missteps. "Analysis of Clinical Decision Support System Malfunctions: A Case Series and Survey," by Wright and colleagues, provides information about how software design choices and development practices play out in the real world (1). I am heartened less by the specifics of what these authors disclose than by the hope that they are the vanguard of a developing trend in clinical informatics research.
Decision support is one of the most widely touted benefits of health information technology (HIT), and clinical decision support (CDS) is considered a cornerstone of patient safety. A quick search of the literature turns up numerous articles extolling the benefits of CDS, lamenting the absence of the desired effect, or complaining about alert fatigue, but few that delve into CDS malfunctions. Why? Because the scariest malfunctions are likely invisible.
Quite by accident, the lead author discovered that a CDS rule was not firing as expected, leading to a search for the cause. In the process, additional malfunctions were noted. Of particular interest is the way in which the investigation was conducted.
The first case was identified through happenstance, and the remaining three were found by carefully examining alert firing data. In each case, our team conducted an extensive investigation of the CDSS malfunction in order to identify key factors that contributed to the issue. These investigations included a review of alert firing logs, alert rule logic, alert system configuration, audit data, interviews with system developers and users, and audits of source code and system design specifications.
To estimate the size of each anomaly, we extracted the alert firing data for each of the four alerts that exhibited an anomaly, and divided them into data from the period when the malfunction occurred and data from before and after the malfunction occurred. We fit a linear model to the nonanomalous data, adjusting for the date (to account for typical increases in alert volume over time) and whether each date was a weekday or weekend day (because, typically, many fewer alerts fire on weekend days). We then used this model to estimate the expected number of alert firings per day during the period when the malfunction occurred, assuming the alert was working correctly, and subtracted the actual number of firings during the period when the malfunction occurred to estimate the number of excess or missed alert firings while the malfunction was happening.
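The estimation procedure the authors describe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the ordinary least squares fit via normal equations, and the synthetic daily counts are all mine. It fits a linear model (intercept, day index, weekend indicator) to the non-anomalous days, predicts the expected firings during the malfunction window, and subtracts the actual firings.

```python
from datetime import date, timedelta

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def features(day, origin):
    """Intercept, days since origin (trend), and a weekend indicator."""
    is_weekend = 1.0 if day.weekday() >= 5 else 0.0
    return [1.0, float((day - origin).days), is_weekend]

def fit_expected_firings(normal_counts, origin):
    """Least-squares fit on non-anomalous days (dict of date -> firings)."""
    rows = [features(d, origin) for d in normal_counts]
    ys = [float(v) for v in normal_counts.values()]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def estimate_missed_firings(coef, anomalous_counts, origin):
    """Expected minus actual firings; positive = missed, negative = excess."""
    expected = sum(
        sum(c * f for c, f in zip(coef, features(d, origin)))
        for d in anomalous_counts
    )
    return expected - sum(anomalous_counts.values())

# Synthetic data (an assumption for illustration): a rising weekday baseline,
# fewer firings on weekends, and a 10-day window in which the alert never fired.
origin = date(2024, 1, 1)  # a Monday

def synthetic_count(t):
    day = origin + timedelta(days=t)
    return 100 + 0.5 * t - (40 if day.weekday() >= 5 else 0)

normal = {origin + timedelta(days=t): synthetic_count(t)
          for t in range(60) if not 20 <= t < 30}
outage = {origin + timedelta(days=t): 0 for t in range(20, 30)}

coef = fit_expected_firings(normal, origin)
missed = estimate_missed_firings(coef, outage, origin)
print(round(missed, 1))
```

A positive result estimates missed firings (Cases 1 and 2 in the paper), while a negative result would indicate excess firings of the kind seen after the software upgrade in Case 3.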
Alerts that fire too frequently will very likely be noticed. However, those that never fire or that under-fire will not. Finding events that should happen, yet do not, is difficult. Access to all required data (e.g., system logs, patient data), as well as to the source code and the developers, made this study feasible for this group of authors; the lack of such access would likely make replication difficult at sites with commercial systems.
Ultimately, the researchers identified four malfunctions.
We identified four CDSS malfunctions at Brigham and Women’s Hospital: (1) an alert for monitoring thyroid function in patients receiving amiodarone stopped working when an internal identifier for amiodarone was changed in another system; (2) an alert for lead screening for children stopped working when the rule was inadvertently edited; (3) a software upgrade of the electronic health record software caused numerous spurious alerts to fire; and (4) a malfunction in an external drug classification system caused an alert to inappropriately suggest antiplatelet drugs, such as aspirin, for patients already taking one.
Subsequent to their local analysis, the authors conducted a survey to determine the frequency of CDS malfunctions. Ninety-three percent of CMIOs reported CDS malfunctions of some type. The really worrisome, but not unexpected, finding is that most malfunctions were either reported by users (83%) or were personally noted by the CMIO (48%). As the authors note, failed firing malfunctions are not easy to discover or address.
In terms of discovering malfunctions, we found that users were most likely to report issues that manifested as incorrect alerts, particularly when alert volumes spiked dramatically. However, users were less likely to report cases of missing CDS alerts, bringing to mind, again, Sherlock Holmes and the dog in the night. Therefore, error detection systems that rely entirely on user reports to identify CDSS malfunctions are unlikely to be robust. Yet, perhaps not surprisingly, “user report” was the most common mode of malfunction identification reported by the respondents to our survey of CMIOs. Although user reports are a critical source of malfunction identification, CDSS implementers must also construct monitoring and testing strategies and tools that proactively identify and prevent CDSS malfunctions.
The authors take a case-based approach to solutions. For failed firing situations (Cases 1 and 2), they offer a range of possible strategies.
Suggested solutions for Case 1 malfunctions:
- Reliable communication strategies should be employed to ensure that changes in clinical terminologies are communicated to all CDSS teams.
- Tools to support terminology management should have the capability to detect and mitigate the downstream impact of terminology changes. As terms and codes are changed, it should be possible to determine the effects of those changes on order sets, CDS rules, documentation tools, etc.
- Proactive monitoring tools and strategies should be employed to enable quick detection of malfunctions in the production systems.
- Enhanced software quality assurance testing methods, including unit and integration testing, supported by test scripts, tools, and automated tests, should be employed to ensure that CDSSs function correctly. These tests are particularly important at the time of software upgrades and CDSS content changes.
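The second suggestion, detecting the downstream impact of terminology changes, lends itself to a simple automated check. The following is a minimal sketch assuming a hypothetical setup in which each CDS rule declares the terminology codes it depends on; the rule names and code values are made up for illustration and do not come from the paper or from any real terminology.

```python
def find_orphaned_references(rule_codes, active_codes):
    """Return {rule_name: [codes the rule uses that are no longer active]}."""
    active = set(active_codes)
    orphaned = {}
    for rule_name, codes in rule_codes.items():
        missing = sorted(set(codes) - active)
        if missing:
            orphaned[rule_name] = missing
    return orphaned

# Hypothetical rule-to-code dependencies (illustrative values only).
rule_codes = {
    "amiodarone_thyroid_monitoring": ["DRUG-0001", "LAB-0100"],
    "lead_screening": ["LAB-0200"],
}

# Terminology after a change in which DRUG-0001 was replaced by DRUG-0999.
active_codes = {"DRUG-0999", "LAB-0100", "LAB-0200"}

broken = find_orphaned_references(rule_codes, active_codes)
print(broken)
```

Run at the time of each terminology release, a check of this kind would flag a rule that silently lost its trigger, which is the failure mode in Case 1. It only works if the rule dependencies are recorded somewhere machine-readable, which is itself an assumption about how the CDSS content is managed.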
Suggested solutions for Case 2 malfunctions:
- CDSSs should be tested by a different analyst than the one that built the content.
- CDS rules should be tested in the live environment after any CDS-related change and after major EHR software upgrades. This testing should be done for both new rules and existing rules (regression testing).
Some suggestions are standard best practices (e.g., regression testing, software QA testing). Others are good, but depend on organizational dynamics ("Reliable communication strategies should be employed to ensure that changes in clinical terminologies are communicated to all CDSS teams"). For the most part, these suggestions are targeted at the specific cause of the local case (an identifier change in Case 1, an inadvertently edited rule in Case 2). However, what if there are other potential causes for failed firings? In terms of software design, development, and deployment practices, how can one detect and reliably prevent failed firings? Serendipity is NOT an effective software quality tool. What is needed is a way of monitoring CDS systems for failed firings in general, which is a hard problem. Once an issue is detected, dealing with it should be straightforward. So what does this information tell us about improving CDS design and development?
Based on the investigative approach taken by the authors, one possible tool for detecting failed firings could be a predictive model of expected firings. Patient populations are fairly standard within specific clinical areas. For example, internal medicine practices have similar populations wherever they are located. Tertiary care centers resemble one another in population, regardless of location, more than they resemble nearby community hospitals. With this in mind, one approach to looking for failed firings would be to keep a profile of the expected patient population for a given setting type and to determine the number of events expected per unit of time for a given rule or situation. The CDSS would have access to these data, which could be used to test CDSS functions. Such an approach is only feasible if there is a standard way of sharing predictive models across CDS systems and organizations; individual organizations should not have to create their own models from scratch. Currently, this is not doable, for the same reasons that semantic interoperability does not exist. Reliable CDS requires more than rules and an alerting system; there must be some meta-knowledge of what should and should not happen as well.
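As a rough illustration of monitoring against expected firings, here is a minimal Python sketch. It assumes that an expected firing count per rule per monitoring period is already available from some population profile, and it treats firings as Poisson-distributed. Both are my simplifying assumptions, not something the paper proposes; real alert counts are often more variable than a Poisson model allows. The rule names, rates, and threshold are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if k < 0:
        return 0.0
    total = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k + 1)
    )
    return min(1.0, total)

def flag_underfiring(expected, observed, alpha=0.001):
    """Return rules whose observed count is improbably low given the expected rate.

    expected: dict of rule -> expected firings in the period (must be > 0)
    observed: dict of rule -> actual firings; a missing rule counts as zero
    """
    flagged = []
    for rule, lam in expected.items():
        count = observed.get(rule, 0)
        if poisson_cdf(count, lam) < alpha:
            flagged.append(rule)
    return sorted(flagged)

# Hypothetical weekly figures for two rules.
expected = {"lead_screening": 40.0, "amiodarone_tsh": 12.0}
observed = {"lead_screening": 38}  # amiodarone_tsh never fired this week

flagged = flag_underfiring(expected, observed)
print(flagged)
```

The important design choice is that a rule absent from the firing log is treated as having fired zero times rather than being skipped; otherwise the monitor would share the blind spot that makes failed firings so hard to notice in the first place.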
Another matter worth considering is that of CDSS architecture and design practices. Decision support is a process-based activity. Thus, any CDS system requires a means of tracking processes and applying rules. Perhaps it is time to consider using workflow technology to manage decision support. Building a CDS engine within an EHR necessitates building a complex system within another complex system, resulting in a final system that is more difficult to understand and manage than either alone. All of the software QA testing suggested by the authors would be easier if the EHR and the CDSS were completely separate systems that communicated via a standard protocol. That way both systems could be tested and optimized independently. EHR design issues would be separated from CDSS design issues. The use of workflow technology for CDS is not a novel idea, but simply one that has not been widely adopted (2). In fact, workflow patterns have been shown to function well for automated guidelines (3).
If we are going to allow computers to help clinicians with decision-making, every possible precaution must be taken to ensure those systems are reliable. Wright et al. have shown they are not. Clinicians are drowning in a sea of inappropriate alerts while important ones may never be given. New approaches to CDSS design and testing are needed to enhance reliability. Perhaps now is a good time to consider workflow technology and predictive modeling as essential features of provably reliable clinical decision support systems.
1. Wright A, Hickman TT, McEvoy D, Aaron S, Ai A, Andersen JM, Hussain S, Ramoni R, Fiskio J, Sittig DF, Bates DW. Analysis of clinical decision support system malfunctions: a case series and survey. J Am Med Inform Assoc. 2016 Mar 28. [E]
2. Huser V, Rasmussen LV, Oberg R, Starren JB. Implementation of workflow engine technology to deliver basic clinical decision support functionality. BMC Med Res Methodol. 2011 Apr 10;11:43.
3. Kaiser K, Marcos M. Leveraging workflow control patterns in the domain of clinical practice guidelines. BMC Med Inform Decis Mak. 2016 Feb 10;16(1):20.