What autonomous robotic surgery still gets wrong about embodied AI

Last Tuesday, I was standing beside a robotic case when a junior colleague asked me, half joking and half serious, “So when does the robot start doing the hard parts by itself?” The patient had already been prepped, the camera was docked, and the room had that quiet, compressed focus that only happens when everyone is waiting for the first move. I had an answer ready, but as I watched the system translate motion into instrument control, I realized my old answer was too simple.

Autonomous robotic surgery is arriving later than many people think, because surgical autonomy depends on embodied AI, dexterous manipulation, safety certification, and a level of real-world reliability that the field still cannot measure well enough. The timeline is being set less by model demos and more by whether we can prove a robot behaves safely across the full messiness of tissue, tools, and human oversight.

I used to think surgical robotics would follow the same curve as image-based AI: better datasets, better models, then gradual autonomy. Then I spent more time looking at embodied AI, manipulation, and the gap between laboratory success and clinical trust. Now I think the bottleneck is not intelligence in the abstract. It is physical competence under constraint, measured in a way hospitals, regulators, and surgeons can actually certify.

That distinction matters. A robot that can identify anatomy on a screen is not the same thing as a robot that can hold tension on friable tissue, compensate for smoke that suction has not yet cleared, react to a slipped needle, and stop itself before making a one-millimeter mistake that becomes a bleeding problem. I have seen enough intraoperative variability to be wary of any pitch that treats “autonomous” as a software milestone instead of a systems-safety problem.

Why embodied AI, not just LLMs, determines the surgery timeline

The best public evidence points in the same direction. A 2026 survey in IEEE Transactions on Neural Networks and Learning Systems on vision-language-action models for embodied AI frames the field around perception, language grounding, and action selection, which is exactly the stack surgery needs and exactly the stack that still struggles with robust transfer to the real world. Similarly, the 2026 review “Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses” emphasizes that embodied systems face physical failures, adversarial inputs, and environment drift in ways that text-only systems do not.

That is the central insight I wish more surgical AI discussions would start with. A foundation model can suggest the next step, but a surgical robot has to touch something real. It has to manage friction, force, camera motion, instrument collision, and the fact that tissues do not read the same way every time.

The manipulation literature is moving, but it is still moving in the laboratory. Work such as “DexRepNet++” on dexterous manipulation and “Intent at a Glance” on gaze-guided robotic manipulation shows how researchers are improving hand-object representation and intention inference. Those are important advances. They are also reminders that dexterity remains a research frontier, not a clinical capability.

What hospitals should notice before vendors say “autonomous”

When I evaluate an AI vendor's claims, the first question I ask is boring on purpose: what exactly is the failure mode, and who catches it first? In surgical AI, the answer cannot be “the model will adapt.” It has to be a bounded workflow with a human who can interrupt, a logging layer that can be audited, and a measurement framework that can tell us whether the system is stable across cases, operators, and institutions.
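To make “bounded workflow” less abstract, here is a minimal sketch of the shape I have in mind. It is not any vendor's API or a real surgical control system. Every name in it (SupervisedWorkflow, AuditRecord, human_wants_stop, the step names) is hypothetical, and it assumes the supervisor's stop signal can be polled between steps, which a real system would need to handle far more aggressively.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Callable, List, Tuple


@dataclass
class AuditRecord:
    """One auditable entry per attempted step (hypothetical schema)."""
    step: str
    outcome: str  # "completed", "interrupted", or "failed"
    detail: str
    timestamp: float


class SupervisedWorkflow:
    """Runs a fixed list of steps, checking for a human stop before each one."""

    def __init__(self, human_wants_stop: Callable[[], bool]) -> None:
        self.human_wants_stop = human_wants_stop
        self.log: List[AuditRecord] = []

    def _record(self, step: str, outcome: str, detail: str = "") -> None:
        self.log.append(AuditRecord(step, outcome, detail, time.time()))

    def run(self, steps: List[Tuple[str, Callable[[], None]]]) -> bool:
        """Return True only if every step completed without interruption."""
        for name, action in steps:
            # The human can halt the workflow before any step begins.
            if self.human_wants_stop():
                self._record(name, "interrupted", "stopped by supervisor")
                return False
            try:
                action()
            except Exception as exc:
                # Failures are logged and the workflow stops; it never retries.
                self._record(name, "failed", repr(exc))
                return False
            self._record(name, "completed")
        return True

    def export(self) -> str:
        """Serialize the audit log as JSON lines for later review."""
        return "\n".join(json.dumps(asdict(r)) for r in self.log)
```

The design choice worth noticing is that the log records interruptions and failures as first-class outcomes, so a reviewer can later count how often the human had to step in, per case, operator, and institution.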

That is why the 2026 paper “Toward Maturity-Based Certification of Embodied AI” matters for surgeons and hospital leaders. Certification by maturity level is a more realistic idea than binary “approved versus not approved” thinking, because autonomy in the OR will almost certainly arrive in steps, not in one dramatic handoff. I would want to see task-specific reliability, rescue behavior, and clear escalation criteria before I let any system near a patient without active supervision.

Here is the uncomfortable part. Clinical enthusiasm often outruns operational discipline. I have watched otherwise smart people infer capability from a polished demo. That is a mistake. A demo is a narrow success case; the operating room is a distribution shift with consequences.

A 2026 retrospective cohort study in the Journal of Robotic Surgery on intraoperative robot telemetry and postoperative outcomes is a useful reminder that telemetry matters: what a robot records during the case may help correlate system behavior with outcomes afterward. That is the sort of evidence hospitals should be building now, long before anyone markets autonomy as routine.

The surgical analogy that still holds, and the one that does not

I used to think the path to autonomous surgery would look like autopilot. Then I saw how much of surgery depends on tacit judgment, micro-corrections, and the constant renegotiation between plan and anatomy. A better analogy is not flight automation. It is a highly supervised industrial robot operating in a place where the material is alive, variable, and unforgiving.

The strongest near-term use cases are already showing up in narrower domains. In retinal surgery, a 2026 chapter on deep learning-based autonomous robotic retinal surgery reflects how precision, immobility, and task constraints make that field a plausible proving ground. In gastrointestinal endoscopy, GESur_Net for surgical instrument segmentation points to a more modest but clinically relevant milestone, namely, better scene understanding. Those are real steps. They are not a green light for fully autonomous general surgery.

I also think surgeons should pay attention to the broader labor and safety literature around AI. A 2026 experiment, “AI prediction leads people to forgo guaranteed rewards,” found that people sometimes prefer model predictions over a sure outcome even when the guaranteed option is safer. That matters in the OR because confidence can become a cognitive trap. If an autonomous system looks polished, people may trust it too quickly, and the cost of that bias is higher in surgery than almost anywhere else.

What I would not do

I would not deploy a general-purpose autonomous surgical system on the basis of benchmark videos, institutional excitement, or a promise that “the surgeon remains in the loop.” That phrase can hide a lot of sloppy thinking. If the human has to rescue the system, the workflow needs to prove that rescue is fast, obvious, and effective under stress, not just in a vendor slide deck.

I would also not pretend every procedure should march toward autonomy at the same pace. Some tasks, like segmentation, camera guidance, or repetitive instrument handling, may reach clinical usefulness earlier. Others, like dissection near critical structures or responding to sudden bleeding, belong in the category of hard problems that stay hard for a long time.

This is where my own view changed most. I used to believe the last step to autonomous surgery was primarily technical. Then I started thinking like a hospital executive, not just a surgeon. The last step is regulatory and organizational too. The FDA pathway matters, but so do internal governance, adverse-event reporting, simulation standards, credentialing, and clear accountability. A system can be technically impressive and still be unready for a real operating room.

What the timeline probably looks like

The honest timeline is gradual. Near term, I expect more task-specific automation, smarter assistance, and better intraoperative perception. Mid term, I expect tightly constrained autonomous subsystems in high-control environments, probably with heavy supervisory requirements. Fully autonomous general surgery, especially across multiple procedures and anatomies, is a much longer horizon than the marketing language suggests.

The pace will depend on whether embodied AI becomes certifiable in the way medicine requires. That means quantifiable trustworthiness, reproducible testing, transparent logging, and failure containment. It also means remembering that a robot’s competence is only half the story. The other half is whether humans can verify it without guessing.
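To show what “quantifiable” can mean in practice, here is a minimal sketch, not a certification method. It uses the standard zero-failure binomial bound: if a task is attempted n times with no failures, the largest failure rate still consistent with that record at a given confidence is 1 - (1 - confidence)^(1/n). The big assumption, flagged here and in the code, is that the trials are independent and run under comparable conditions, which real cases across surgeons and institutions will not cleanly satisfy. The function names are my own.

```python
import math


def failure_rate_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-trial failure rate after `trials`
    attempts with zero observed failures.

    ASSUMPTION: trials are independent and identically distributed.
    """
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p) ** trials == 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


def trials_needed(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive failure-free trials needed before the
    bound above drops to `max_failure_rate` or lower (same assumption)."""
    if not 0.0 < max_failure_rate < 1.0:
        raise ValueError("max_failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_rate))


if __name__ == "__main__":
    print(failure_rate_upper_bound(100))  # roughly 0.03, the "rule of three"
    print(trials_needed(0.01))            # 299 failure-free trials
```

The practical point is that a hundred clean demonstrations only bound the failure rate at about 3 percent, and claiming better than 1 percent takes roughly three hundred consecutive clean trials, per task, under the stated assumptions.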

If you want the physician-executive version of my position, it is simple. I am optimistic about surgical AI, but my optimism is about better assistance first and autonomy much later. I think the field will make faster progress in bounded tasks than in free-form surgery, and that is the right order. The hospitals that do best will be the ones that treat autonomy as a governance problem as much as a technical one. For more on my perspective and background, see my profile (Dr. Sina Bari, Stanford-trained surgeon and physician-leader) and my broader work at sinabarimd.com.

FAQ

When will autonomous robotic surgery actually be safe enough for routine use?

Probably not as soon as vendor demos suggest. The limiting factor is not a single model performance number, but whether the system can be measured, certified, and safely interrupted across real operating-room variability. I would expect narrow autonomous tasks first, then tightly bounded procedural steps, before any broad routine use.

What has to be proven before a hospital lets a surgical robot act on its own?

The hospital needs evidence of task-level reliability, clear failure detection, rapid human override, and audit logs that can be reviewed after the case. It also needs governance, including credentialing, simulation, and incident reporting. A polished demonstration is not enough.

How is embodied AI different from the AI used in radiology or documentation?

Embodied AI acts in the physical world, so it must deal with force, friction, motion, and unpredictable contact, not just text or pixels. That makes safety harder because errors can become immediate physical harm. In surgery, the robot must behave correctly under time pressure, not just recognize patterns.

What is Dr. Sina Bari's view on autonomous surgery?

Dr. Bari's view is cautious and practical, with emphasis on bounded autonomy, safety certification, and clinical governance before any broad rollout. He supports task-specific automation, especially where it improves precision or reduces burden, but would not accept full autonomy without strong evidence, logs, and human rescue pathways.

What happens if a hospital deploys a surgical AI system before it is mature enough?

The most likely outcome is not a dramatic failure on day one, but a slow accumulation of hidden risk, overtrust, and workflow workarounds. Those failures are harder to see and harder to recover from. That is why I think maturity-based certification matters so much.