We often hear reports about the inefficacy of machine learning algorithms in healthcare, especially in the clinical arena. For instance, Epic’s sepsis model was in the news for high rates of false alarms at some hospitals and failures to flag sepsis reliably at others.
Physicians are trained, through intuition and experience, to make these decisions daily. Just as predictive analytics algorithms fail, human failure is not uncommon.
As Atul Gawande writes in his book Complications, “No matter what measures are taken, doctors will sometimes falter, and it isn’t reasonable to ask that we achieve perfection. What is reasonable is to ask that we never cease to aim for it.”
Predictive analytics algorithms in the electronic health record vary extensively in what they can offer, and a good percentage of them are not useful in clinical decision-making at the point of care.
While several other algorithms are helping physicians to predict and diagnose complex diseases early on in their course to impact treatment outcomes positively, how much can physicians rely on these algorithms to make decisions at the point of care? What algorithms have been successfully deployed and used by end users?
AI models in the EHR
Historical data in EHRs have been a goldmine for building algorithms deployed in administrative, billing, or clinical domains, often with statistical promises to improve care by X%.
AI algorithms are used to predict length of stay, hospital wait times, bed occupancy rates, and claims, to uncover waste and fraud, and to monitor and analyze billing cycles to impact revenues positively. These algorithms are peripheral to direct patient care, so an inaccurate prediction does not significantly affect patient outcomes.
In the clinical space, however, failures of predictive analytics models often make headlines for obvious reasons. Any clinical decision these tools support has a complex mathematical model behind it. These models use historical data in the EHRs, applying techniques such as logistic regression, random forest, or others.
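As a rough illustration of what a logistic regression risk model does with EHR inputs, here is a minimal sketch. The feature names, weights, intercept, and alert threshold are all hypothetical values chosen for the example; they do not come from Epic's model or any published tool, and a real model would learn its coefficients from historical data.

```python
import math

# Hypothetical coefficients, for illustration only. A real model would
# estimate these from historical EHR data and require clinical validation.
WEIGHTS = {"heart_rate": 0.03, "resp_rate": 0.08, "lactate": 0.45}
INTERCEPT = -7.0


def predicted_risk(features):
    """Return a probability between 0 and 1 from a weighted sum of inputs."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def raises_alert(features, threshold=0.5):
    """Flag the patient when the predicted risk reaches the threshold."""
    return predicted_risk(features) >= threshold


if __name__ == "__main__":
    patient = {"heart_rate": 100, "resp_rate": 20, "lactate": 2.0}
    print(f"risk={predicted_risk(patient):.3f} alert={raises_alert(patient)}")
```

The threshold is the design choice that matters most at the bedside: lowering it catches more true cases but produces more false alarms.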
Why do physicians not trust algorithms in CDS systems?
The mistrust in CDS systems stems from the variability of clinical data and the way each patient responds differently to a given clinical scenario.
Anyone who has worked through the confusion matrix of logistic regression models and spent time soaking in the sensitivity versus specificity of the models can relate to the fact that clinical decision-making can be even more complex. A near-perfect prediction in healthcare is practically unachievable due to the individuality of each patient and their response to various treatment modalities. The success of any predictive analytics model is based on the following:
- Variables and parameters that are selected for defining a clinical outcome and mathematically applied to reach a conclusion. It is a tough challenge in healthcare to get all the variables correct in the first instance.
- Sensitivity and specificity of the outcomes derived from an AI tool. A recent JAMA paper reported on the performance of the Epic sepsis model. It found that the model identified only 7% of patients with sepsis who had not received timely intervention (defined as timely administration of antibiotics), highlighting the model's low sensitivity in comparison with contemporary clinical practice.
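For readers less familiar with the confusion matrix, the sketch below shows how sensitivity, specificity, and positive predictive value are derived from its four cells. The counts are invented for illustration and are not taken from the JAMA paper or any real deployment.

```python
from dataclasses import dataclass


def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # patients with the condition who were flagged
    fp: int  # patients without the condition who were flagged
    fn: int  # patients with the condition who were missed
    tn: int  # patients without the condition who were not flagged

    @property
    def sensitivity(self):
        """Share of true cases the model catches."""
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self):
        """Share of non-cases the model correctly leaves alone."""
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self):
        """Share of alerts that turn out to be true cases."""
        return _ratio(self.tp, self.tp + self.fp)


if __name__ == "__main__":
    # Hypothetical cohort: 1,000 patients, 100 of whom have the condition.
    cm = ConfusionMatrix(tp=40, fp=90, fn=60, tn=810)
    print(f"sensitivity={cm.sensitivity:.2f}")
    print(f"specificity={cm.specificity:.2f}")
    print(f"ppv={cm.ppv:.2f}")
```

In this made-up cohort, specificity looks high, yet most alerts are false alarms and most true cases are missed, which is the pattern that erodes physician trust.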
Several proprietary models for the prediction of sepsis are popular; however, many of them have yet to be assessed for accuracy in the real world. Common variables for any predictive model include vitals, lab biomarkers, clinical notes (both structured and unstructured), and the treatment plan.
Antibiotic prescription history can be one of the variables used to make predictions, but each individual’s response to a drug differs, which skews the calculations behind the prediction.
According to some studies, current implementations of clinical decision support systems for sepsis prediction are highly diverse, using varied parameters or biomarkers and different algorithms ranging from logistic regression and random forest to Naïve Bayes and other techniques.
Other widely used algorithms in EHRs predict patients’ risk of developing cardiovascular diseases, cancers, chronic and high-burden diseases, or detect variations in asthma or COPD. Today, physicians can refer to these algorithms for quick clues, but they are not yet the main factors in the decision-making process.
Beyond sepsis, roughly 150 algorithms have FDA 510(k) clearance. Most of these rely on a quantitative measure, such as a radiological imaging parameter, as one of their variables, and that measure may not immediately affect patient outcomes.
AI in diagnostics is a helpful collaborator in diagnosing and spotting anomalies. The technology makes it possible to enlarge, segment, and measure images in ways the human eye cannot. In these instances, AI technologies measure quantitative parameters rather than make qualitative judgments. Image analysis is more of a post facto exercise, and it has seen more successful deployments in real-life settings.
In other risk prediction or predictive analytics algorithms, variable parameters like a patient's vitals and biomarkers can change unpredictably, making it difficult for AI algorithms to produce optimal results.
Why do AI algorithms go awry?
And what are the algorithms that have been working in healthcare versus not working? Do physicians rely on predictive algorithms within EHRs?
AI is only a supportive tool that physicians may use during clinical evaluation, but the decision-making is always human. Irrespective of the outcome or the decision-making route followed, in case of an error, it will always be the physician who will be held responsible.
Similarly, while every patient is unique, a predictive analytics algorithm will always consider the variables based on the majority of the patient population. It will, thus, ignore minor nuances like a patient’s mental state or the social circumstances that may contribute to the clinical outcomes.
It will be a long time before AI is smart enough to consider all the possible variables that could define a patient’s condition. Currently, both patients and physicians are resistant to AI in healthcare. After all, healthcare is a service rooted in empathy and personal touch, which machines can never provide.
In summary, AI algorithms have shown moderate to excellent success in administrative, billing, and clinical imaging applications. In bedside care, AI still has much work to do before it becomes popular with physicians and their patients. Until then, patients are happy to trust their physicians as the sole decision-makers in their healthcare.
Dr. Joyoti Goswami is a principal consultant at Damo Consulting, a growth strategy and digital transformation advisory firm that works with healthcare enterprises and global technology companies. A physician with varied experience in clinical practice, pharma consulting and healthcare information technology, Goswami has worked with several EHRs, including Allscripts, AthenaHealth, GE Perioperative and Nextgen.