The Current Limitations of Ambient AI Medical Scribes

By Sina Bari, MD

AVP Healthcare and Life Sciences AI, iMerit Technology

As a physician and medical AI developer, I’ve had the opportunity to witness the promise and pitfalls of ambient AI scribe technology from multiple perspectives. While these tools offer tremendous potential to reduce administrative burden and improve clinical workflow, my experience testing and developing them across various healthcare settings suggests that several specific limitations deserve attention as development continues.

The Current State of Ambient AI Scribes

Ambient AI scribes represent one of healthcare’s most promising applications of natural language processing. These systems passively listen to physician-patient conversations, automatically generating clinical documentation without requiring direct input from clinicians. The value proposition is compelling: physicians spend less time on documentation, potentially reducing burnout while increasing patient engagement.

However, as someone who has led data teams for multiple healthcare AI clients at iMerit, I’ve observed firsthand that the reality sometimes falls short of our ideal. Let me share some specific areas where current ambient scribe technology struggles.

Specialty-Specific Limitations

Medical specialties with unique vocabularies and physical examination components pose particular challenges. As a plastic surgeon, I’ve found ambient scribe technology especially problematic in my field. Consider this actual transcription from a recent consultation:

Sina Bari, MD: “I’m noticing some ptosis of the upper pole with bottoming out inferiorly. The IMF appears to have descended about 2cm from its original position. We could address this with an internal mastopexy technique using acellular dermal matrix as a hammock.”

Patient: “Will that help with the rippling I’m seeing on the inner part?”

Sina Bari, MD: “Yes, the ADM would provide additional soft tissue coverage there. I’d also recommend fat grafting to the décolletage area for a more natural transition.”

AI Scribe Output: The patient has drooping of the breast. Recommended internal procedure with dermal matrix. The patient is concerned about rippling. Recommended fat transfer to the chest area.

This simplified output misses critical anatomical specifics, surgical planning details, and the precise nature of the patient’s concerns. The specialized vocabulary of plastic surgery—terms like “ptosis,” “IMF” (inframammary fold), “internal mastopexy,” and “décolletage”—is either misinterpreted or oversimplified to the point of clinical uselessness.
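
To make this kind of loss measurable, a simple check can compare the specialty terms spoken in the encounter with those that survive in the generated note. The following is a minimal sketch, not a description of any vendor's pipeline: the `GLOSSARY` contents, the `dropped_terms` function, and the choice of accepted synonyms are all illustrative assumptions, and a real deployment would need a clinician-curated vocabulary.

```python
import re

# Hypothetical glossary: specialty term -> accepted variants in a note.
# Illustrative assumption only, not a validated clinical vocabulary.
GLOSSARY = {
    "ptosis": ["ptosis"],
    "imf": ["imf", "inframammary fold"],
    "internal mastopexy": ["internal mastopexy"],
    "acellular dermal matrix": ["acellular dermal matrix", "adm"],
    "fat grafting": ["fat grafting", "fat transfer"],
    "décolletage": ["décolletage"],
}


def _contains(text, phrase):
    """Case-insensitive, whole-word match of phrase inside text."""
    pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None


def dropped_terms(transcript, note, glossary=GLOSSARY):
    """Return glossary terms spoken in the transcript but absent from the note."""
    missing = []
    for term, variants in glossary.items():
        spoken = any(_contains(transcript, v) for v in variants)
        documented = any(_contains(note, v) for v in variants)
        if spoken and not documented:
            missing.append(term)
    return missing


# The physician's statements and the scribe output from the example above.
transcript = (
    "I'm noticing some ptosis of the upper pole with bottoming out inferiorly. "
    "The IMF appears to have descended about 2cm from its original position. "
    "We could address this with an internal mastopexy technique using "
    "acellular dermal matrix as a hammock. "
    "Yes, the ADM would provide additional soft tissue coverage there. "
    "I'd also recommend fat grafting to the décolletage area."
)
note = (
    "The patient has drooping of the breast. Recommended internal procedure "
    "with dermal matrix. The patient is concerned about rippling. "
    "Recommended fat transfer to the chest area."
)

missing = dropped_terms(transcript, note)
print(missing)
```

In this sketch "fat transfer" is accepted as a synonym for "fat grafting", so that term is not flagged, while "dermal matrix" alone does not count as "acellular dermal matrix". Where to draw those lines is a clinical judgment rather than a coding one.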

Similar issues occur in other specialties with dense, technical vocabularies:

  • Ophthalmology (retinal layers, complex visual field testing)
  • Orthopedics (specific joint mechanics, surgical approaches)
  • Dermatology (subtle morphological descriptions of lesions)
  • ICU Nursing (broad list of specialty assessments)

Multilingual Conversation Challenges

Another significant limitation involves multilingual clinical encounters. In diverse areas like California’s North Bay, conversations often flow between languages, presenting substantial challenges for current AI systems.

Here’s an example from a recent post-operative follow-up:

  • Sina Bari, MD: “How has your recovery been since the procedure last week?”
  • Patient: “Mostly good, but tengo un poco de dolor aquí.” [pointing to incision site]
  • Sina Bari, MD: “¿Es un dolor constante o solo cuando lo tocas?”
  • Patient: “Solo cuando me muevo de cierta manera. Otherwise it’s fine.”
  • Sina Bari, MD: “That’s actually normal during this healing phase. Te voy a recetar algo más suave para el dolor.”

AI Scribe Output: Patient reports mostly good recovery with some discomfort. Dr. Bari states this is normal during healing and will prescribe medication.

The AI completely missed the specific location of pain, the important distinction that pain occurs only with certain movements, and the nature of the medication change (switching to something milder). Code-switching between English and Spanish—common in real clinical conversations—creates significant gaps in the documentation that could impact care quality.
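
One modest safeguard is to flag turns that mix languages so a human can review them before the note is signed. The sketch below is a toy heuristic under stated assumptions: the marker-word lists and the `languages_in` and `code_switched_turns` functions are hypothetical, and a production system would rely on a trained language-identification model rather than word lists.

```python
import re

# Small, deliberately distinctive marker-word lists (illustrative assumptions).
# Words shared by both languages, such as "me" or "solo", are left out.
SPANISH_MARKERS = {"tengo", "dolor", "aquí", "cuando", "voy",
                   "recetar", "muevo", "tocas", "más"}
ENGLISH_MARKERS = {"the", "but", "it's", "that's", "this", "how",
                   "your", "been", "otherwise", "during", "mostly", "since"}


def languages_in(utterance):
    """Return the set of language codes whose marker words appear."""
    tokens = set(re.findall(r"[a-záéíóúñü']+", utterance.lower()))
    found = set()
    if tokens & ENGLISH_MARKERS:
        found.add("en")
    if tokens & SPANISH_MARKERS:
        found.add("es")
    return found


def code_switched_turns(turns):
    """Return indexes of turns that contain markers from both languages."""
    return [i for i, turn in enumerate(turns) if len(languages_in(turn)) > 1]


# The five turns from the post-operative follow-up above.
TURNS = [
    "How has your recovery been since the procedure last week?",
    "Mostly good, but tengo un poco de dolor aquí.",
    "¿Es un dolor constante o solo cuando lo tocas?",
    "Solo cuando me muevo de cierta manera. Otherwise it's fine.",
    "That's actually normal during this healing phase. "
    "Te voy a recetar algo más suave para el dolor.",
]

flagged = code_switched_turns(TURNS)
print(flagged)
```

Flagging does not repair the documentation by itself. It only marks the turns where details are most likely to have been lost, so that a reviewer knows where to look.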

Failing to Integrate Historical Context

Perhaps the most serious limitation is ambient AI’s struggle with longitudinal patient narratives. Medical care is inherently contextual, building upon previous visits, test results, and treatment responses. Current systems operate primarily within the siloed context of a single encounter.

Consider this follow-up discussion with a post-mastectomy reconstruction patient:

  • Sina Bari, MD: “The expander is at 350cc now. How do you feel about the projection compared to your natural breast?”
  • Patient: “It still feels a bit smaller than we discussed last time.”
  • Sina Bari, MD: “You’re right. Looking at your chart from last month, we were aiming for more to match your left side. Let’s add another 50cc today.”
  • Patient: “And what about that skin-tightness issue I mentioned in my message last week?”
  • Sina Bari, MD: “Yes, I saw that. The tightness along the lateral flap is expected at this stage, but we can slow down the expansion schedule if it’s uncomfortable. The redness you mentioned has improved, though, which suggests we’ve resolved that early infection concern.”

AI Scribe Output: Tissue expander currently at 350cc. The patient reports feeling smaller than desired. Plan to add 50cc today. The patient mentions skin tightness. Dr. Bari states lateral tightness is expected at this stage. The redness has improved, resolving infection concerns.

The AI captured the surface-level exchange but missed critical contextual elements: the previously established volume goal (450cc, a figure recorded in the chart rather than spoken during the visit), the patient’s digital message about tightness, and the earlier findings against which the infection improvement was being judged. Without the integration of these historical elements, the documentation lacks the continuity essential for quality care.
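
To illustrate what integrating history might look like, the sketch below merges a few chart facts into the encounter note and works out the remaining expansion volume. Everything here is a hypothetical illustration: the `ChartContext` structure, the function names, and the wording of the chart entries are assumptions made for the example, not the design of any existing scribe product or EHR interface.

```python
from dataclasses import dataclass, field


@dataclass
class ChartContext:
    """Hypothetical container for history pulled from the chart before a visit."""
    target_volume_cc: int
    patient_messages: list = field(default_factory=list)
    prior_findings: list = field(default_factory=list)


def expansion_line(current_cc, added_cc, ctx):
    """Summarize expander volume against the goal recorded in the chart."""
    new_total = current_cc + added_cc
    remaining = ctx.target_volume_cc - new_total
    return (
        f"Tissue expander at {current_cc}cc, {added_cc}cc added today "
        f"(total {new_total}cc); {remaining}cc remaining to the "
        f"{ctx.target_volume_cc}cc goal."
    )


def contextualized_note(encounter_note, current_cc, added_cc, ctx):
    """Append chart-derived context lines to the single-encounter note."""
    lines = [encounter_note, expansion_line(current_cc, added_cc, ctx)]
    lines += [f"Patient message on file: {m}" for m in ctx.patient_messages]
    lines += [f"Prior finding: {f}" for f in ctx.prior_findings]
    return "\n".join(lines)


# Chart entries paraphrased from the example above (illustrative wording).
ctx = ChartContext(
    target_volume_cc=450,
    patient_messages=["skin tightness reported last week"],
    prior_findings=["redness raising concern for early infection"],
)

note = contextualized_note(
    "Patient reports feeling smaller than desired. Lateral tightness expected.",
    current_cc=350,
    added_cc=50,
    ctx=ctx,
)
print(note)
```

With the figures from the example, 350cc plus 50cc gives 400cc, leaving 50cc to the 450cc goal. A note limited to the spoken conversation cannot state that, because the goal was never said aloud.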

Recommendations for Improvement

As someone deeply committed to both clinical excellence and technological advancement, I believe addressing these limitations requires a multi-faceted approach:

  1. Specialty-Specific Training Data: We need annotation frameworks leveraging board-certified specialists from diverse fields. At iMerit, I’ve advocated for “clinician-led annotation” to capture diagnostic nuances that general medical transcriptionists and scribes miss.
  2. Improved Contextual Understanding: Future systems must function more like human scribes who review relevant history before encounters and integrate that knowledge into their documentation.
  3. Advanced Multilingual Models: Healthcare-specific language models that handle code-switching and maintain clinical accuracy across languages are essential for equitable care.
  4. Human-in-the-Loop Verification: As with most medical AI applications, I continue to believe hybrid approaches combining AI efficiency with human oversight yield the best results.

Ambient AI scribes hold enormous promise for reducing physician burnout and improving patient care. However, as both a practitioner and AI leader, I believe acknowledging their current limitations is crucial for responsible implementation.

The examples I’ve shared illustrate that we’re still in the early stages of this technology. While I remain optimistic about the future of ambient documentation, healthcare leaders and developers should understand that current technology is Version 1.0, and significant work remains to get us to full functionality.

As with surgical innovation, progress in medical AI requires both technical excellence and unflinching honesty about where improvements are needed. Only through this balanced approach can we ensure these technologies truly enhance rather than compromise the quality of patient care.