Last month I had four Claude windows open at the same time. One was drafting a quality metrics summary, another was comparing two vendor contracts against our clinical requirements, a third was pulling findings from a stack of pathology AI validation studies, and the fourth was helping me revise an IRB protocol response. Around hour three, I noticed a dull pressure behind my eyes and realized I had not carefully read any of the outputs in over forty minutes. I was approving things. That is not supervision. That is rubber-stamping at machine speed.
The main risk of highly parallelized AI work is cognitive overload, not automation failure. Clinicians can push more tasks through the same brain, but that increase in throughput raises the chance of missed context, premature trust, and decision fatigue, especially when the work is time-sensitive or safety-critical.
In hospital operations, radiology, documentation, quality review, and patient messaging, the issue is not whether AI can help. The issue is whether the clinician has enough attentional capacity left to verify what the machine produced and make the final judgment safely.
That experience is the story behind the so-called Claude headache phenomenon: the strain that shows up when AI lets a clinician run at a pace the human brain was never designed to sustain. And the problem is not unique to me. In physician Slack channels and medical Twitter threads, the pattern keeps repeating. "I got more done in three hours than I used to in a week," one hospitalist wrote. "And then I spent the next two hours wondering if any of it was actually right."
For context on my broader clinical and leadership lens, I describe my background as a Stanford-trained physician with physician-executive experience at Dr. Sina Bari, MD, and I keep a separate professional presence at sinabarimd.com.
The real bottleneck is not the machine
Two years ago, I would have told you that cognitive overload from AI was mostly a training problem. Give clinicians better onboarding, better prompt design, better interfaces, and the brain adapts. I was wrong. The brain does not adapt to parallel high-stakes supervision the way it adapts to a new EHR layout. The load is qualitatively different because every AI output demands a judgment call, not just navigation.
The data on physician cognitive load was already alarming before AI entered clinical workflows. Sinsky et al. found in their 2016 Annals of Internal Medicine time-motion study that for every hour of direct clinical face time, physicians spent nearly two additional hours on EHR and desk work. Shanafelt et al. reported in Mayo Clinic Proceedings (2022) that 62.8% of physicians had at least one symptom of burnout in 2021, up from 38.2% in 2020. That was the baseline before anyone asked clinicians to also supervise machine-generated drafts, summaries, and triage decisions in parallel.
The harder problem is function allocation: which parts of a task should be done by the machine and which should remain under direct human control. The Fitts list logic still applies, as de Winter and Dodou discuss in their analysis of why that framework has persisted. Humans are better at judgment, exception handling, and value-based tradeoffs. Machines are better at speed, repetition, and scale. AI blurs that line because it produces outputs that look like finished human work but still require human verification.
In my experience deploying clinical AI tools, the first failure is rarely a dramatic wrong answer. It is the quiet erosion of attention. A colleague in radiology described it well: "I used to read every film fresh. Now I find myself scanning the AI annotation first and only really looking at the image if something is flagged. I am not sure when that switch happened." That kind of anchoring is invisible to the physician experiencing it. It only shows up in the error rate, weeks later, if anyone is measuring.
Why parallelization feels powerful, then starts to hurt
The seduction of AI-assisted parallelization is that it feels like a superpower for the first ninety minutes. I have watched it happen in my own workflow and in physicians I advise. You open a second stream. Then a third. Each output is individually reasonable, so the cognitive cost feels low. But Wickens’ multiple resource theory, which Stephanidis et al. cite in their 2019 IJHCI paper on HCI grand challenges, predicts exactly what happens next: when tasks share the same cognitive channel (visual attention, verbal reasoning, decision-making), they compete rather than coexist. Clinical AI tasks almost all draw from the same pool.
Trust compounds the problem. Robinette et al. demonstrated in Frontiers in Robotics and AI (2021) that an agent’s predictability directly affects both trust calibration and cognitive load. When a model is helpful 90% of the time but occasionally shifts tone, confidence, or level of detail, clinicians spend extra effort recalibrating. That recalibration is invisible overhead. Multiply it across four or five concurrent AI streams and the total burden exceeds what most people can sustain without degraded judgment.
I have felt this myself. On a Thursday afternoon with three AI-assisted tasks running, I approved a contract summary that contained a billing term I would normally have flagged. I caught it the next morning during a cold re-read. Nothing bad happened. But the near-miss was instructive: the error was not in the AI output. It was in my attention. I had spent my verification budget on the first two tasks and was running on fumes for the third.
Alert fatigue in clinical decision support follows a nearly identical pattern. Nanji et al. found in a 2014 JAMIA study at Brigham and Women’s Hospital that clinicians overrode 96.2% of drug allergy alerts. That number is not recklessness. It is what happens when a system generates so many low-value signals that the brain stops treating any of them as important. Concurrent AI outputs carry the same risk. If every stream demands verification but most outputs are fine, the clinician learns to skim. The one exception that actually matters gets the same two-second glance as the ten that did not.
What this looks like in hospitals
I watched this play out during a health system's pilot of an AI documentation assistant last year. The hospitalists loved it at first. Discharge summaries that took twenty minutes now took five. But within six weeks, the medical director noticed something: after-hours chart corrections had increased by roughly 30%. Physicians were generating notes faster but reviewing them less carefully. The pilot had made the documentation task faster while making the verification task harder, because now the clinician had to catch errors in someone else's prose rather than in their own.
In radiology, the pattern is similar but the stakes are sharper. When an AI flags several studies simultaneously, the reader becomes a reviewer of machine output rather than a diagnostician building an impression from scratch. A radiologist I work with put it bluntly: "The AI is right often enough that I've started trusting it more than I should. I caught myself last week skipping the scout view on a flagged CT because the algorithm had already localized the finding." That kind of shortcutting is rational from a time-management standpoint and dangerous from a patient safety standpoint.
Singh et al. estimated in BMJ Quality & Safety (2014) that diagnostic errors affect approximately 5% of US outpatient encounters, roughly 12 million adults per year. Those errors are predominantly cognitive, not technical. Adding concurrent AI streams to an already strained diagnostic process does not automatically reduce that 5%. It can push the rate higher if the AI becomes another source of anchoring rather than an independent check.
For hospital executives, the governance question is whether the institution has designed the workflow to respect human attention limits. I would not deploy a concurrent AI documentation system in any clinical setting that lacks a structured override review process. Speed without verification is just risk redistribution. The NIST AI Risk Management Framework is useful here because it forces operational questions about validity, robustness, and monitoring that determine whether a tool is safe when deployed on a Monday morning with three staffing vacancies, a crowded ED, and a clinician already on hour eleven.
Clinical AI should reduce load, not disguise it
Here is what I think most health system executives get wrong about AI deployment: they measure adoption and throughput but not cognitive cost. A tool can increase documentation speed by 60% and still make the clinician's job harder if it requires constant verification, context-switching, and mental bookkeeping. The relevant metric is not “time saved” but “attention consumed,” and almost nobody is tracking that.
Zhao et al.'s 2024 survey of large language models in Frontiers of Computer Science captures how quickly capability has expanded, but capability alone is not safety. A 2025 review in Technologies on generative AI and cognitive challenges in academic research highlights the same tension: as AI removes mechanical effort, it shifts the burden toward higher-order evaluation and judgment, which is the hardest kind of cognitive work to sustain under time pressure.
The best AI deployments I have seen in hospitals share one trait: they reduce the number of active mental tabs a clinician must hold open. One system I evaluated consolidated seven inbox categories into a single prioritized feed with pre-drafted responses. The physicians using it reported less end-of-day fatigue and fewer after-hours logins, not because it did more work, but because it eliminated the task-switching overhead of jumping between queues. That is the right design philosophy. If a tool adds another dashboard, another queue, another verification step, and another alert stream, it is not solving cognitive overload. It is relabeling it.
In practical terms, I care less about whether an AI model can generate a polished answer and more about whether it narrows the cognitive surface area of the task. Does it shorten the path to the right chart? Does it remove duplicate work? Does it surface the one abnormal result that should actually change management? Or does it create a second layer of work that only looks efficient because the machine typed faster than the human could think?
How hospital leaders should respond
Hospital leaders should treat AI fatigue as a patient safety issue, not a personnel weakness. The fix is not to slow innovation to a crawl. The fix is to design for attention. That includes limiting simultaneous AI streams, standardizing how outputs are presented, measuring override rates, monitoring error patterns, and building a governance process that asks whether the tool helps clinicians think more clearly.
A practical framework is simple. First, define the clinical task. Second, define the human decision point. Third, measure whether the AI reduces time, error, or variance without increasing mental load. If a tool cannot clear all three steps, it may still be useful, but it should not be treated as a fully mature clinical asset.
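The three steps above can be sketched as a small checklist in Python. This is a minimal illustration, not a validated instrument: the `ToolEvaluation` class, its field names, and the convention that negative percentages mean improvement are all assumptions made for the example, and how mental load is actually measured (a survey score, for instance) is left open.

```python
from dataclasses import dataclass


@dataclass
class ToolEvaluation:
    """Hypothetical record of one AI tool pilot (all fields are assumptions)."""
    clinical_task: str             # step 1: the task the tool supports
    human_decision_point: str      # step 2: where the clinician decides
    time_change_pct: float         # negative means time went down
    error_change_pct: float        # negative means errors went down
    variance_change_pct: float     # negative means variance went down
    mental_load_change_pct: float  # positive means clinicians report more load


def passes_framework(ev: ToolEvaluation) -> bool:
    """Return True only if all three steps of the framework are satisfied."""
    task_defined = bool(ev.clinical_task.strip())
    decision_defined = bool(ev.human_decision_point.strip())
    # Step 3: at least one of time, error, or variance improves...
    reduces_something = min(
        ev.time_change_pct, ev.error_change_pct, ev.variance_change_pct
    ) < 0
    # ...and mental load does not go up.
    no_added_load = ev.mental_load_change_pct <= 0
    return task_defined and decision_defined and reduces_something and no_added_load


# Illustrative numbers only: faster documentation, but more reported load.
faster_but_heavier = ToolEvaluation(
    clinical_task="discharge summary drafting",
    human_decision_point="physician signs the final note",
    time_change_pct=-60.0,
    error_change_pct=0.0,
    variance_change_pct=0.0,
    mental_load_change_pct=15.0,
)
print(passes_framework(faster_but_heavier))
```

The point of the design is that a throughput gain alone never passes: the check fails as soon as reported load rises, however large the time savings.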
This is especially important as health systems adopt digital health platforms, remote monitoring tools, and operational AI for routing, staffing, and documentation. Those systems can improve care, but they also increase the number of moving parts a clinician has to supervise. The more the system relies on concurrent streams, the more important it becomes to engineer for attention rather than for raw output.
In other words, the future is not just about artificial intelligence. It is about preserving human intelligence under artificial pressure.
Bottom line
I still use Claude with multiple windows open. I still push myself to run concurrent workstreams because the productivity gains are real. But I no longer pretend the headache is just a headache. It is a signal that my verification capacity is depleted, that I am approving things instead of reading them, and that any safety-critical output produced in that state deserves a cold re-read the next morning.
That session with the four open windows taught me something I now tell every hospital executive I advise: the winning system is not the one that uses the most AI. It is the one that uses AI to make clinicians calmer, more accurate, and less overloaded, while keeping the final judgment firmly human. If your physicians are faster but more anxious, you have not reduced their burden. You have just made it harder to see.
FAQ
Can AI overload actually affect clinical decision-making in a hospital?
Yes. When a clinician is juggling multiple AI-assisted workflows at the same time, the most likely problem is not a single obvious error, but degraded attention and slower recognition of exceptions. That can lead to missed context, anchoring on the wrong output, or delayed escalation when a patient’s condition changes.
What happens if a hospital deploys an AI triage tool without clinician oversight?
The tool may appear efficient while quietly shifting risk into the background. Without clinician oversight, false reassurance, alert fatigue, and misrouting become more likely, especially in emergency or inpatient settings where the cost of a missed abnormality is high. The safe model is decision support, not autonomous routing.
How should hospital leaders measure whether AI is helping or just adding cognitive load?
Measure time saved, override rates, error patterns, and user burden together. If the tool speeds up documentation but increases after-hours chart review or forces clinicians to verify too many outputs manually, it is probably adding load rather than reducing it. Operational metrics should include human factors, not just throughput.
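As a rough sketch of what measuring these together could look like, the Python below computes an override rate and a relative change, then flags the pattern described above: documentation gets faster while after-hours chart work grows. The function names and the sample counts are hypothetical, chosen only to illustrate the arithmetic, and a real program would pull these figures from audited operational data.

```python
def override_rate(overridden: int, shown: int) -> float:
    """Fraction of AI outputs or alerts that clinicians overrode."""
    if shown <= 0:
        raise ValueError("shown must be positive")
    return overridden / shown


def relative_change(before: float, after: float) -> float:
    """Relative change from a baseline, e.g. 0.30 for a 30% increase."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before


def likely_adding_load(
    doc_minutes_before: float,
    doc_minutes_after: float,
    after_hours_before: float,
    after_hours_after: float,
) -> bool:
    """True when documentation is faster but after-hours chart work went up."""
    faster = relative_change(doc_minutes_before, doc_minutes_after) < 0
    more_after_hours = relative_change(after_hours_before, after_hours_after) > 0
    return faster and more_after_hours


# Hypothetical counts: notes drop from 20 to 5 minutes, while after-hours
# corrections rise from 100 to 130 per month.
print(likely_adding_load(20, 5, 100, 130))
print(round(relative_change(100, 130), 2))
```

Reporting the speed-up and the after-hours change side by side is the design choice that matters here: either number alone can make a tool look better or worse than it is.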
What is Dr. Sina Bari’s approach to evaluating clinical AI tools at sinabarimd.com?
I look first at whether the tool improves safety, clarity, and workflow realism in a live clinical environment. A good model should reduce noise, support judgment, and fit the way physicians actually work, not the way a vendor imagines they work. If it creates another layer of work for the clinician, I treat that as a design failure.
Do AI workflow tools help with radiology and pathology, or do they create new risks?
Both can be true. These tools can improve prioritization, triage, and consistency, but they also introduce anchoring risk, overreliance, and fatigue from repeated verification. The safest use is to narrow the search space for the clinician, not to replace the need for expert review.