The Working Physician's Personal AI Stack

Last Tuesday morning, I was finishing notes between three pre-ops, two post-ops, and a revision consult that needed more listening than typing. My inbox had already produced one patient message that could wait, one that could not, and one vendor email promising to “reimagine clinician productivity,” which is usually where I reach for the delete key. The only AI question that mattered was simple: what survives a real clinic week without adding another layer of cognitive clutter?

The minimum viable personal AI stack for a practicing physician is smaller than most product demos suggest, and it works best when it is boring. For me, the stack is useful only when it saves time in research, drafting, or inbox triage without asking me to outsource judgment. The hardest skill is knowing when not to use AI at all.

I used to think the ideal setup was a modular system: one tool for literature, one for drafting, one for inbox triage, with integrations between them. Then I watched that version of the stack start to behave like a second job. Now I think the right stack is the one that disappears into the background, the same way a good clinic template does, except with better odds of catching a weak citation or turning a rough draft into something readable.

What the adoption numbers actually mean

The public conversation makes it sound as if physicians are either all-in on AI or hiding from it. The more useful reality is messier. In the AMA’s 2024 physician survey, AI use among doctors rose from 38% in 2023 to 66% in 2024, and the uses were concentrated in documentation, summaries, and administrative support rather than some futuristic bedside autopilot. In the AMA’s augmented intelligence resources, the emphasis is on workflow fit, governance, privacy, and physician oversight, which is exactly where the conversation belongs.

Hospital adoption follows the same pattern. JAMA Network Open reported that 31.5% of 2,174 nonfederal U.S. hospitals were already using generative AI integrated with the EHR in 2024, and 24.7% planned to adopt within a year. That tells me something important as a physician-executive: AI is no longer a novelty problem; it is a deployment problem. The hard part is not whether a tool exists. It is whether it survives contact with Monday morning.

When I evaluate a new tool, I start with a question that most demos skip: does this reduce work for the clinician, or does it merely move work somewhere less visible? A system that creates a cleaner inbox summary but requires manual correction of every second sentence is not a stack. It is a tax.

The minimum viable stack

For me, the stack now has three jobs only.

1. Literature and synthesis

I use AI for first-pass reading when I want speed, not certainty. It helps me map a topic, identify what is new, and surface the papers worth opening myself. I do not use it to tell me what a paper “means” when the conclusion carries clinical consequences. In my experience, the moment a model starts summarizing evidence in a way that feels polished, I get more suspicious, not less. A polished error is still an error.

This is where governance matters. NIST’s AI Risk Management Framework, with its four functions (GOVERN, MAP, MEASURE, and MANAGE), is the right mental model for clinical AI because it forces a health system to name the risk owner, the use case, the data flow, and the monitoring plan. The 2024 NIST Generative AI Profile adds risks specific to generative systems, including confabulation and information security threats such as prompt injection, and it spells out governance actions meant to close accountability gaps. All of these are directly relevant when clinicians use LLMs for literature review and draft generation. For a board, that is not abstract policy language. It is the difference between a safe assistant and a silent liability.

2. Drafting and editing

I use AI to turn messy first thoughts into usable prose, especially for internal documents, research summaries, and patient-facing draft language that still needs a physician’s hands on it. The key is that I edit every line. Every line. If I cannot explain why a sentence is there, it does not stay. That sounds tedious, and it is, but the tedium is cheaper than defending a hallucinated claim in a chart or a manuscript.

There is a vulnerability here that I have felt personally. I have been wrong about how much editing a “good” draft still needs. One afternoon, while preparing a summary for a multidisciplinary discussion, I accepted a fluent paragraph too quickly and nearly carried forward a subtle error in interpretation. It did not become a patient safety event, but it reminded me that fluency can feel like competence. It often is not.

3. Inbox triage

This is the hardest one to trust, and the easiest one to overpromise. A Friday admin block is where inbox AI either earns its keep or gets uninstalled. The best use I have found is narrow: sort messages, flag urgency, draft low-risk responses, and identify the ones that should go straight to a human review queue. I do not let AI answer medication questions, symptom escalation, or anything that depends on my knowing the patient context that the chart will never fully capture.

That boundary is the whole game. The value of the tool is not that it knows more than I do. The value is that it helps me spend my attention where judgment actually matters.
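The boundary described above can be sketched as a small routing rule. This is a minimal illustration, not a description of any real inbox product: the keyword lists, the `Route` names, and the `route` function are all hypothetical, and naive substring matching like this would be far too crude for production use. The point it shows is the ordering: urgency is checked first, clinical content always goes to a human, and anything unrecognized defaults to human review rather than to an AI draft.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    URGENT_FLAG = "urgent_flag"    # surfaced at the top of the queue, answered by a human
    HUMAN_REVIEW = "human_review"  # clinician reads and answers personally
    AI_DRAFT = "ai_draft"          # model may draft; clinician still edits and sends


# Hypothetical keyword lists, for illustration only (assumptions, not a validated vocabulary).
URGENT_TERMS = ("chest pain", "shortness of breath", "bleeding", "fever")
CLINICAL_TERMS = ("medication", "dose", "refill", "pain", "swelling", "symptom")
ADMIN_TERMS = ("appointment", "reschedule", "invoice", "parking")


@dataclass
class Message:
    sender: str
    body: str


def route(message: Message) -> Route:
    """Decide where a message goes; the safe default is a human."""
    text = message.body.lower()
    # 1. Possible urgency outranks everything else.
    if any(term in text for term in URGENT_TERMS):
        return Route.URGENT_FLAG
    # 2. Medication or symptom content is never drafted by the model.
    if any(term in text for term in CLINICAL_TERMS):
        return Route.HUMAN_REVIEW
    # 3. Only clearly administrative messages are eligible for an AI draft.
    if any(term in text for term in ADMIN_TERMS):
        return Route.AI_DRAFT
    # 4. Anything unrecognized falls back to human review.
    return Route.HUMAN_REVIEW
```

For example, `route(Message("pt", "Can I reschedule my appointment?"))` returns `Route.AI_DRAFT`, while a message mentioning a medication dose returns `Route.HUMAN_REVIEW`. The design choice worth noticing is the fallback: when the rule cannot classify a message, it hands the message to a person.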

What I would not do

I would not deploy a personal AI stack that automatically drafts clinical advice and sends it without a clinician reading the message end to end. I would not let a model write orders, interpret imaging, or decide whether a post-op complaint sounds routine or dangerous. I would not build a tangled ecosystem of five tools that each claim to save ten minutes and collectively cost an hour of recovery time every evening.

That is where physician taste becomes the rate-limiting skill. Not prompt engineering. Taste. Knowing which tasks are low stakes enough to automate, which need a second pass, and which should remain stubbornly human because the consequences of being wrong are not reversible.

Where regulation enters the stack

The board-level version of this conversation has to include regulation, because a lot of AI in healthcare is drifting toward medical-device territory. If a system is used for diagnosis, triage, or risk prediction, I want to know whether it is being taken through the 510(k), De Novo, or PMA pathway, and why. The FDA’s distinction matters. A 510(k) claim of substantial equivalence to a predicate device is a different animal from a De Novo request for a novel low-to-moderate-risk device, which is different again from a PMA for a high-risk system that needs the strongest evidence.

That is also why I care about the difference between a clever general-purpose chatbot and a governed clinical product. The former may be fine for drafting and search. The latter needs validation, monitoring, escalation paths, and a human who is accountable when the output breaks.

If you want the physician-executive lens in shorter form, I keep my professional stance visible on my physician profile, which lists my Stanford training and credentials. The point is not prestige. The point is to be explicit about the perspective behind the judgment.

The week in practice

Across a clinic week, I see the same pattern repeat. Research tasks benefit from breadth. Writing tasks benefit from compression. Inbox tasks benefit from triage. Clinical decisions benefit from restraint. Once I stopped asking AI to be an all-purpose assistant, it became more useful, not less.

I also became more skeptical of the shiny stuff. In a physician’s office, every extra tool has a hidden operating cost: logins, context switching, privacy review, retraining, cleanup, and the low-grade irritation of having to wonder whether a sentence came from a model or from a colleague. Add enough of those costs together and the stack starts acting like clutter with a subscription fee.

That is why the boring stack wins. A limited set of tools, tightly bounded uses, clear human review, and a strong instinct for what should never be delegated. It is not glamorous. It is durable.

Friday is the truth serum

Three weeks ago, during a Friday admin block, one inbox tool earned its keep. It surfaced a genuinely urgent patient message quickly, drafted a clean response for a routine administrative question, and left the rest alone. That was enough. It did not try to be clever. It did not pretend to know the patient better than I did. It simply made the queue smaller.

That is the real test of a working physician’s AI stack. Not whether it dazzles on a slide. Whether it still feels worth keeping after a full week of clinic, when the notes are long, the messages are messy, and the only metric that matters is whether your brain feels lighter or heavier at the end of the day.

FAQ

Which AI tools are working physicians actually using in 2026?

They are mostly using AI for documentation, summaries, inbox support, literature scanning, and draft writing. The AMA’s physician survey found that 66% of physicians reported using AI in 2024, and those uses clustered around administrative and knowledge work rather than autonomous clinical decision-making. In practice, the tools that survive are the ones that fit into existing workflows and reduce after-hours noise.

What is the minimum viable personal AI stack for a practicing clinician?

A narrow stack usually beats an elaborate one: one tool for research synthesis, one for drafting and editing, and one for inbox triage. Anything beyond that should justify its existence with saved time, lower error risk, or better clinical focus. If a tool adds more logins, more review steps, or more uncertainty, it probably does not belong in the stack.
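A rough way to apply that rule is to net each tool's claimed savings against the overhead the whole stack creates. The sketch below is only arithmetic under stated assumptions: the function name `net_minutes_saved` is hypothetical, and the figures reuse the illustrative case from earlier in this piece (five tools claiming ten minutes each, against an hour of evening cleanup), not measured data.

```python
def net_minutes_saved(claimed_savings: list[float], overhead_minutes: float) -> float:
    """Total claimed daily savings minus daily overhead (logins, review, cleanup)."""
    return sum(claimed_savings) - overhead_minutes


# Illustrative figures only: five tools at ten minutes each, one hour of recovery time.
five_tool_stack = [10, 10, 10, 10, 10]
print(net_minutes_saved(five_tool_stack, overhead_minutes=60))  # -10
```

A negative result means the stack costs more attention than it returns, which is the signal to remove a tool rather than add one.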

How do clinicians decide when not to use AI for a task?

I use a simple threshold: if an error could change care, damage trust, or create a legal or privacy problem, I do not delegate the judgment to AI. That includes medication advice, symptom escalation, diagnosis, imaging interpretation, and any message that depends on nuanced patient context. AI can help me prepare, but it should not be the final decision-maker.

What does a physician's AI workflow look like across research, writing, and practice management?

Research work benefits from fast scanning and topic mapping, writing work benefits from first drafts and cleanup, and practice management benefits from triage and sorting. The common rule is human review at every meaningful step. The best workflow is not the one that does everything; it is the one that reduces friction without diluting accountability.

What is Dr. Sina Bari’s approach to AI in clinical work?

My approach is conservative, practical, and workflow-driven. AI should save attention, not demand more of it, and it should stay away from tasks where clinician judgment is the safety boundary. That is the standard I apply when I evaluate a tool for my own clinic week.