Last spring, I sat in a vendor demo for a clinical decision support tool. The sales engineer clicked through a patient case in about forty seconds. Clean interface, smooth transitions, the kind of thing that looks great on a projector. Then I asked him to enter a patient on three blood thinners with a penicillin allergy and an eGFR of 22. He froze. Clicked through four nested menus. Triggered two duplicate alerts he could not dismiss. The whole interaction took nearly six minutes. Nobody in that room would have bought the product after watching its worst day.
The Numbers Are Worse Than You Think
In 2020, Melnick and colleagues published a study in Mayo Clinic Proceedings that should have been a wake-up call for every health IT vendor in the country. They measured physicians' ratings of EHR usability with the System Usability Scale (SUS), a standardized ten-item questionnaire that has been used to score thousands of software products. The result: a mean SUS score of 45.9, which places EHRs in the bottom 9 percent of scores across industries. That is an F. Not a gentleman's C, not a "needs improvement." An F. For context, the vendors' own self-reported usability scores averaged 75. The gap between what vendors believe about their products and what clinicians experience is a nearly 30-point chasm.
I think about that gap constantly when I evaluate clinical software. It helps explain why Sinsky et al., in their 2016 Annals of Internal Medicine time-and-motion study, found that physicians spent nearly two hours on EHR and desk work for every hour of direct clinical face time with patients, plus another one to two hours of work each night after clinic, mostly in the EHR. Clinicians have a term for that extra work: pajama time. It is not a joke. It is a systems failure measured in burned-out physicians and missed family dinners.
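For readers who have not used the System Usability Scale, a mean like 45.9 comes from a simple, published scoring rule. The sketch below applies the standard SUS scoring to one questionnaire: each of the ten items is rated 1 to 5, odd-numbered (positively worded) items contribute the rating minus 1, even-numbered (negatively worded) items contribute 5 minus the rating, and the 0 to 40 total is multiplied by 2.5. The function name and the two respondents are hypothetical and for illustration only; they are not data from Melnick et al.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one completed System Usability Scale questionnaire.

    `responses` holds the ten item ratings in questionnaire order,
    each on a 1-5 scale (1 = strongly disagree, 5 = strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded, so agreement counts in favor.
            total += rating - 1
        else:
            # Even items are negatively worded, so disagreement counts in favor.
            total += 5 - rating

    # The raw total runs from 0 to 40; scaling by 2.5 maps it onto 0-100.
    return total * 2.5


# Hypothetical respondents, for illustration only.
respondents = [
    [3, 4, 2, 3, 3, 4, 2, 4, 3, 3],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
]
scores = [sus_score(r) for r in respondents]
print(scores, mean(scores))  # prints [37.5, 75.0] 56.25
```

A study-level figure such as 45.9 is the mean of these per-respondent scores across everyone surveyed. Note that a SUS score is not a percentage: a score of around 68 is commonly cited as average, which is why a mean in the mid-40s ranks so low against other software.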
When Bad UI Kills
There is a version of this conversation that stays comfortable. We talk about frustration, inefficiency, physician satisfaction scores. But the real stakes are harder to sit with. Between January 2009 and June 2012, the Joint Commission's Sentinel Event database received reports of 98 alarm-related events at U.S. hospitals. Eighty of those resulted in death. Eighty people died, in part, because the systems meant to protect them generated so much noise that clinicians stopped listening.
This is the context that gets lost in vendor pitches. Drug-drug interaction alert systems routinely generate around 180 alerts per day, and clinicians override 93 to 96 percent of them, according to Van der Sijs's analysis as summarized by AHRQ's Patient Safety Network. When you design a safety system that fires so often it trains users to ignore it, you have not built a safety system. You have built a liability with a checkbox.
I used to evaluate clinical software primarily by its demo. Clean UI, good feature set, reasonable pricing. Now I evaluate it by its worst day. What happens when a nurse is managing six patients and the system throws its fifteenth alert in an hour? What happens when a surgeon needs a critical lab value and the interface buries it three clicks deep? A 2025 scoping review published in a Wiley journal found that nurse practitioners averaged roughly 13 clicks above the minimum required per task in their EHR workflows. Thirteen unnecessary clicks, repeated hundreds of times a day, across thousands of clinicians. That is not a minor annoyance. That is a design failure with compounding costs.
What Dog-Fooding Actually Means
The term "dog-fooding" comes from the software industry phrase "eating your own dog food": the idea that a company should use its own product every day. In medical technology, it means something more specific and more demanding. It means your engineers should try to use the product at 3 AM with five active patients, degraded Wi-Fi, and gloves on. It means your product team should sit in a real clinical environment and watch a hospitalist try to reconcile medications during a shift change. Not a usability lab. Not a think-aloud protocol with a single volunteer. The actual clinical environment, with all its interruptions and urgency and imperfect lighting.
The FDA recognized this gap in its February 2016 guidance document, "Applying Human Factors and Usability Engineering to Medical Devices," which explicitly calls for testing under conditions that simulate real use environments, including stress, fatigue, and divided attention. Most vendors I have evaluated treat that guidance as a regulatory checkbox rather than a design philosophy. The ones who take it seriously build fundamentally different products.
Ancker and colleagues demonstrated this in a 2019 study presented at AMIA. When clinical decision support tools were designed using genuine user-centered methods, with clinicians embedded in the design process and iterative testing under realistic conditions, adoption was five times the rates reported in earlier studies. Five times. That is not a marginal improvement from a better color palette. That is the difference between a tool clinicians actually use and one they work around.
My Red Flags When Evaluating Clinical Software
After years of reviewing these products, I have developed a short list of what makes me walk away from a vendor. I share it because I think the evaluation framework matters as much as any individual product review.
First, if the vendor cannot demo the product with a complex patient scenario on the first ask, that tells me they have not stress-tested their own interface. Second, if alert logic is not configurable by the clinical site, the system will generate noise that clinicians will learn to ignore, and we already know where that leads. Third, if the vendor quotes usability scores from internal testing but cannot produce independent validation, I assume a 30-point gap between their number and reality. Fourth, if the product requires more than two clicks to reach any critical clinical data point, it was designed by someone who has never been interrupted mid-task by a code blue.
I got this wrong once. In 2022, I was enthusiastic about a documentation tool that performed beautifully in a structured pilot. Clean interface, fast load times, positive feedback from the pilot cohort. What I missed was that the pilot ran on a dedicated network with half the normal patient census. When the tool deployed to a full unit, latency tripled and the autocomplete feature started suggesting entries from the wrong patient context. The product was not bad. The evaluation conditions were too generous. I learned that a product's worst day matters more than its best demo.
The AI Question
I want to push back on something I hear constantly: that AI will fix healthcare's usability problems. A 2024 study in npj Digital Medicine found that fewer than 2 percent of clinical AI models advance beyond prototyping into actual clinical use. Two percent. The bottleneck is not algorithmic sophistication. It is the same problem we have had for decades: building technology that works in the controlled environment of a research lab and then discovering it fails in the messy reality of clinical practice.
There are bright spots. A multi-center study covered by STAT News in early 2026 tracked ambient AI scribes across 1,800 clinicians at five academic medical centers and found an average time savings of 16 minutes per day. That is meaningful, and it is one of the few large-scale AI deployments I have seen with rigorous measurement across real clinical environments. But notice what made it work: the technology was tested at scale, in actual practice, across multiple sites with different workflows. That is dog-fooding in its truest form. The vendors who skip that step, who pitch AI solutions validated only in single-site pilots with hand-selected users, are repeating the same mistakes that gave us EHR systems in the bottom 9 percent of software usability scores.
The vendor from that demo last spring eventually came back with an updated version. They had hired two emergency physicians as part-time consultants and run the product through a 90-day deployment in a community ED. The alert system was redesigned to surface only high-severity interactions, and the override workflow dropped from four clicks to one. It was a genuinely better product. Not because they had better engineers the second time, but because they finally used the thing under conditions that mattered.
That is what dog-fooding is. Not a slogan. Not a startup culture affectation. A commitment to experiencing your product's worst day before your users do, because in clinical software, your users' worst day might be someone's last.
Dr. Sina Bari is a physician-executive who writes about healthcare technology, clinical AI, and the systems that shape modern medical practice.
Frequently Asked Questions
Why do EHR systems score so poorly on usability compared to other software?
EHR systems must handle extraordinary complexity, including medication reconciliation, clinical documentation, order entry, and regulatory compliance, all within a single interface. But the core issue is not complexity itself. It is that most EHR interfaces were designed around billing and compliance workflows rather than clinical decision-making. The Melnick et al. 2020 study found a mean SUS score of 45.9 (bottom 9 percent of scores across industries), while vendors self-reported scores averaging 75, suggesting a fundamental disconnect between how these products are evaluated internally and how they perform in practice.
What does "dog-fooding" mean in the context of medical technology?
Dog-fooding means using your own product under realistic conditions. In medical technology, that means testing with complex patients, high cognitive load, frequent interruptions, and degraded infrastructure, not clean demo environments. The FDA's 2016 human factors guidance explicitly calls for testing under simulated real-use conditions including stress and fatigue. Clinical decision support tools built with genuine user-centered design methods, with clinicians embedded in the design process, have shown adoption five times the rates reported in earlier studies (Ancker et al., AMIA 2019).
Can AI solve healthcare's usability problems?
Not automatically. Fewer than 2% of clinical AI models progress beyond prototyping into actual use (npj Digital Medicine, 2024). AI tools that succeed, like ambient scribes showing 16 minutes/day savings across 1,800 clinicians in a 2026 multi-center study, tend to share one trait: they were validated at scale in real clinical environments rather than controlled pilots. AI is a tool, not a fix. It inherits whatever usability problems exist in the systems it integrates with.
How can hospitals evaluate whether a clinical software product will work in practice?
Request demos with complex, realistic patient scenarios rather than curated cases. Ask for independent usability validation, not vendor-reported scores. Check whether alert systems are configurable by the clinical site and whether critical data is accessible within two clicks. Most importantly, ask where and how the product has been deployed at scale. Single-site pilots with hand-selected users rarely predict real-world performance. Products stress-tested across multiple sites with diverse workflows are far more likely to succeed in yours.