Why Reasoning AI Models May Be the Missing Link in Clinical Decision Support

Sina Bari MD

Most of us working at the intersection of clinical medicine and AI have seen firsthand the incredible progress we’ve made with pattern recognition models. From radiology to pathology, deep learning has proven remarkably good at identifying anomalies, segmenting images, and predicting risk. But as we push AI deeper into the clinical workflow—beyond identifying features and toward making sense of them—we’re hitting a wall.

That wall is reasoning.

Clinicians don’t just see a nodule and call it cancer. We weigh probabilities. We consider the patient’s history, their labs, their symptoms, the meds they’re on, the things they’re not saying. We reason. And if we want clinical AI to play a meaningful role in diagnosis or treatment planning—not just in back-office automation—we need models that can do more than detect. We need models that can think.

What Do We Mean by “Reasoning” in AI?

The term gets thrown around a lot, so let’s define it in context. Reasoning AI models attempt to simulate logical thinking—drawing inferences, weighing alternatives, understanding cause and effect. In healthcare, that might look like:

  • Causal inference models that help answer, “Which treatment will work better for this specific patient?”
  • Neuro-symbolic systems that combine deep learning with structured clinical logic to produce interpretable decisions.
  • LLMs with reasoning abilities that can walk through differential diagnoses or summarize the clinical rationale behind a care plan.

The goal isn’t to recreate human cognition perfectly. It’s to give AI enough structure and common sense that it can support clinical decisions in a way that’s both accurate and explainable.

Why Traditional Models Fall Short

Most of today’s medical AI tools are glorified classifiers. They take an input—an image, a vitals stream, a few paragraphs of clinical notes—and map it to a label. That’s fine for discrete tasks like flagging pneumonia on a chest X-ray or identifying diabetic retinopathy.

But in real-world practice, patients don’t come with labels. They come with ambiguous symptoms, incomplete histories, and multiple overlapping problems. That’s where the current generation of black-box models struggles. They can tell you what they’ve seen before, but they can’t reason about why something is happening or what might happen next if you intervene.

This gap is part of why clinicians often don’t trust AI recommendations, even when they’re accurate. It’s also why the tools often don’t improve care as much as we’d hoped—because they don’t integrate into the way clinicians actually think.

The Case for Neuro-Symbolic and Causal Models

I’m particularly excited about neuro-symbolic AI—hybrid models that combine the statistical horsepower of neural networks with the rule-based structure of symbolic logic. Imagine a model that can interpret a CT scan and then reason, “This lesion has irregular borders and is >30mm, which meets guideline criteria for malignancy.” That’s the kind of explainable output physicians can work with.
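To make that hybrid pattern concrete, here is a minimal sketch in Python, assuming a hypothetical feature extractor and illustrative thresholds rather than any real clinical guideline: a neural component estimates lesion features, and an explicit rule layer turns those features into a rationale a clinician can audit.

    # Minimal neuro-symbolic sketch: neural feature extraction + symbolic rules.
    # Feature names, thresholds, and the placeholder extractor are illustrative
    # assumptions, not a real guideline or a real model.

    from dataclasses import dataclass

    @dataclass
    class LesionFeatures:
        diameter_mm: float           # estimated by the neural model
        border_irregularity: float   # 0.0 (smooth) to 1.0 (highly irregular)

    def neural_feature_extractor(ct_volume) -> LesionFeatures:
        # Placeholder for a trained characterization model; in practice this
        # would run inference on the CT volume.
        return LesionFeatures(diameter_mm=32.0, border_irregularity=0.8)

    def symbolic_rules(features: LesionFeatures) -> list[str]:
        """Apply explicit, inspectable criteria and return the reasons that fired."""
        reasons = []
        if features.diameter_mm > 30:
            reasons.append(f"diameter {features.diameter_mm:.0f} mm exceeds the 30 mm threshold")
        if features.border_irregularity > 0.6:
            reasons.append("borders are irregular")
        return reasons

    features = neural_feature_extractor(ct_volume=None)  # stand-in input
    reasons = symbolic_rules(features)
    if reasons:
        print("Flag for malignancy work-up because: " + "; ".join(reasons))

The point isn’t the specific thresholds; it’s that the decision logic lives in a layer a physician can read, question, and override.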

Causal models also hold huge promise, especially in personalized medicine. Instead of just saying, “Patients like this had better outcomes on Drug A,” a causal model can simulate: “If this patient took Drug A instead of Drug B, we estimate a 12% better 5-year survival, based on these confounding factors.” That’s the kind of reasoning we do mentally when we tailor care plans. Encoding that logic into AI can make its recommendations more trustworthy and actionable.
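Here is a rough sketch of that kind of counterfactual estimate, using a simple outcome-regression (G-computation-style) approach on synthetic data; the cohort, confounders, and effect sizes are invented purely for illustration, not drawn from any study.

    # Counterfactual-style reasoning via outcome modeling on synthetic data.
    # A real analysis would need careful confounder selection, validation,
    # and sensitivity checks.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000

    # Synthetic cohort: age and kidney function confound both treatment choice and survival.
    age = rng.normal(65, 10, n)
    egfr = rng.normal(70, 15, n)
    p_treat = 1 / (1 + np.exp(-(0.03 * (age - 65) - 0.02 * (egfr - 70))))
    treated_a = (rng.random(n) < p_treat).astype(int)
    logit_surv = 0.5 + 0.4 * treated_a - 0.04 * (age - 65) + 0.02 * (egfr - 70)
    survived_5y = (rng.random(n) < 1 / (1 + np.exp(-logit_surv))).astype(int)

    # Fit an outcome model that includes treatment and the confounders.
    X = np.column_stack([treated_a, age, egfr])
    outcome_model = LogisticRegression().fit(X, survived_5y)

    # Counterfactual question for one specific patient: Drug A vs. Drug B.
    patient = {"age": 72, "egfr": 55}
    under_a = outcome_model.predict_proba([[1, patient["age"], patient["egfr"]]])[0, 1]
    under_b = outcome_model.predict_proba([[0, patient["age"], patient["egfr"]]])[0, 1]
    print(f"Estimated 5-year survival: {under_a:.1%} on Drug A vs {under_b:.1%} on Drug B")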

At iMerit, we’re already seeing how these approaches improve clinical data labeling and downstream model performance. For example, when annotators use structured knowledge to validate diagnostic findings—or when we integrate causal inference into outcomes analysis for treatment studies—we consistently see better alignment with clinician judgment.

Large Language Models: Reasoning or Just Mimicry?

I’d be remiss if I didn’t mention the elephant in the room: large language models. I’ve been both impressed and uneasy with the performance of LLMs in clinical reasoning tasks. Tools like GPT-4 can absolutely walk through a clinical vignette, generate differentials, and even explain their rationale in plain English. But they’re also prone to hallucinations and overconfident nonsense.

In a recent trial, GPT-4 outperformed physicians on diagnostic accuracy—but doctors with access to GPT-4 didn’t get any better. That’s not just an indictment of the model. It’s a signal that we haven’t figured out how to make AI complement human reasoning yet. In many cases, the doctor either ignored the model or was confused by its logic.

That tells me we need to spend less time chasing benchmark scores and more time designing interactive, explainable reasoning systems that clinicians can actually use. Systems that don’t just spit out an answer but show their work—cite guidelines, highlight patient-specific factors, reveal their uncertainty.
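As one illustration of what “showing its work” could mean in practice, here is a minimal sketch of a structured output contract for a recommendation rather than a free-text answer; the field names and example content are hypothetical.

    # A structured "show your work" output for decision support.
    # Field names and example content are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class Recommendation:
        suggestion: str
        guideline_citations: list[str] = field(default_factory=list)  # which guidelines were applied
        patient_factors: list[str] = field(default_factory=list)      # patient-specific evidence used
        confidence: float = 0.0                                        # ideally a calibrated probability
        uncertainties: list[str] = field(default_factory=list)         # what the system does NOT know

    rec = Recommendation(
        suggestion="Consider CT chest follow-up in 3 months",
        guideline_citations=["Pulmonary nodule follow-up guideline (example citation)"],
        patient_factors=["8 mm solid nodule", "30 pack-year smoking history"],
        confidence=0.72,
        uncertainties=["No prior imaging available for growth comparison"],
    )

    print(f"Suggestion: {rec.suggestion}")
    print(f"Why: {'; '.join(rec.patient_factors)}")
    print(f"Per: {'; '.join(rec.guideline_citations)}")
    print(f"Confidence: {rec.confidence:.0%}")
    print(f"Open questions: {'; '.join(rec.uncertainties)}")

A contract like this forces the system to surface its evidence and its gaps, which is exactly what a clinician needs in order to agree, push back, or dig deeper.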

The Regulatory Angle

As these models become more sophisticated, they’ll also draw more scrutiny from regulators. And rightly so. A model that reasons about treatment is a lot more consequential than one that just counts pixels. If it influences clinical decision-making, it likely falls under FDA oversight. That means validation, lifecycle monitoring, and explainability aren’t just nice to have—they’re requirements.

In fact, reasoning models might make the regulatory process easier in some ways. Symbolic or hybrid models can often provide traceable logic paths that satisfy the FDA’s desire for transparency. It’s the black-box “trust me” models that tend to hit snags. That’s another reason I think reasoning AI will eventually become the standard in clinical decision support.

Final Thoughts

If we want AI to go from “smart assistant” to true clinical partner, we have to teach it to reason. That means embracing complexity—not just in our models, but in our evaluation of them. It means designing systems that reflect how clinicians think, not just how machines optimize.

Reasoning AI won’t replace clinical judgment. But it can help reduce uncertainty, highlight blind spots, and extend our cognitive bandwidth. And in a system as overloaded and fragmented as modern healthcare, that’s not just innovation—it’s necessity.


Sina Bari MD is the VP of Healthcare and Life Sciences AI at iMerit, where he oversees the development of clinician-in-the-loop systems for data labeling, model validation, and AI product deployment. A Stanford-trained plastic surgeon, he writes about the convergence of clinical medicine and artificial intelligence.