Autonomous AI Agent Navigates EHR, Surpasses Physicians in Simulated Cases


A Nature paper reports that MIRA, an autonomous AI agent operating across a full electronic health record interface without human assistance, outperformed board-certified attending physicians on complex inpatient case simulations.

An autonomous AI agent named MIRA (Medical Intelligent Reasoning Agent), described in a paper published in Nature (DOI: 10.1038/s41586-026-10675-5, PMID: 42310457), navigated full electronic health records and produced diagnostic and management decisions that scored higher than those of attending physicians in a controlled simulation environment.

The study, published June 17, 2026, describes MIRA as a large-language-model-based agentic system trained to interact with a sandboxed electronic health record interface — browsing notes, ordering and interpreting tests, adjusting treatment plans, and documenting reasoning — without human assistance. The system was evaluated against a benchmark of complex inpatient cases drawn from de-identified records and was compared with the performance of board-certified physicians on the same cases.

MIRA outperformed the physician cohort on the primary accuracy metric, with higher case resolution scores. The evaluation cases were designed to include diagnostic uncertainty, polypharmacy, and comorbidity, conditions that challenge rule-based systems and require integrative reasoning.

Context and limitations

The study is a simulation benchmark, not a clinical deployment. MIRA was evaluated in a sandboxed environment using de-identified historical records; it was not deployed in a live inpatient setting, let alone one without real-time physician oversight. The gap between benchmark performance and real-world clinical integration is substantial: EHR data in live settings are incomplete, ambiguous, and subject to documentation lag; patients can provide verbal information that changes the differential; and clinical decisions carry liability considerations and communication requirements that benchmarks do not capture.

The authors acknowledge these limitations. The result adds to a body of evidence showing that agentic AI systems can perform diagnostic reasoning at or above specialist-physician level in controlled evaluations — but clinical integration requires safety, reliability, and interpretability thresholds that no benchmark study alone can establish.

The Nature paper reports a peer-reviewed large-cohort evaluation in which an agentic EHR navigation system surpassed physicians on the primary accuracy metric; the paper itself makes no priority claim against prior work. The key distinction from earlier systems such as Google DeepMind’s AMIE is not data modality, since AMIE had published multimodal capabilities (vision, documents, structured data) by 2025, but operational scope: AMIE operates as a conversational dialogue agent, while MIRA executes agentic EHR actions (ordering tests, adjusting treatment plans, documenting in the chart) without human intermediation.
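To make "agentic EHR actions" concrete, here is a minimal sketch of what acting on a chart can look like in code. Everything in it is an illustrative assumption: the SandboxEHR class, its method names, the scripted agent, and the placeholder test and plan values are hypothetical. None of it is drawn from the MIRA paper or from any real EHR product.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxEHR:
    """Toy in-memory chart; a hypothetical stand-in, not MIRA's environment."""
    notes: list[str]
    pending_results: dict[str, str]  # test name -> result revealed once ordered
    orders: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    chart_log: list[str] = field(default_factory=list)

    def read_notes(self) -> list[str]:
        # Browsing: return a copy so the agent cannot mutate the record.
        return list(self.notes)

    def order_test(self, name: str) -> str:
        # Ordering changes the chart state and returns the result to interpret.
        self.orders.append(name)
        return self.pending_results.get(name, "no result available")

    def adjust_plan(self, step: str) -> None:
        self.plan.append(step)

    def document(self, entry: str) -> None:
        self.chart_log.append(entry)


def run_scripted_agent(ehr: SandboxEHR) -> None:
    """Fixed script standing in for a model's policy: read, order, act, document."""
    notes = ehr.read_notes()
    ehr.document(f"Reviewed {len(notes)} note(s).")
    result = ehr.order_test("example_test")
    ehr.document(f"Ordered example_test; result: {result}.")
    if result == "abnormal":
        ehr.adjust_plan("example_plan_change")
        ehr.document("Adjusted plan after abnormal result.")


# Placeholder chart contents, invented purely for illustration.
ehr = SandboxEHR(
    notes=["placeholder admission note"],
    pending_results={"example_test": "abnormal"},
)
run_scripted_agent(ehr)
print(ehr.orders)
print(ehr.plan)
print(len(ehr.chart_log))
```

In this toy model, each step the agent takes changes the state of the chart. A dialogue-only system, by contrast, returns text to a human who then decides what to enter.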

Correction (2026-06-19): Two errors corrected by post-publication fact-check. (1) The article described MIRA as “the first agentic EHR system to do so in a peer-reviewed large-cohort evaluation” — the MIRA paper (PMID 42310457) makes no such priority claim; this was editorial attribution without source support. (2) Google DeepMind’s AMIE was characterized as operating on “structured diagnostic scenarios or single-modality data” — this is inaccurate; by 2025 AMIE had published multimodal capabilities (vision, documents, structured data). The correct distinction is operational: AMIE is a conversational dialogue system that does not perform agentic EHR actions (ordering tests, adjusting treatment plans, documenting in the chart), while MIRA does.

Trace · every claim, sourced

Reported and written by the Digital Health & AI Desk. Each load-bearing claim was then bound to the primary source it rests on and checked out-of-band against that source before publication. The full mapping is below; nothing here is taken on faith.

MIRA, an autonomous AI agent evaluated in a sandboxed EHR environment, outperformed board-certified attending physicians on a benchmark of complex inpatient cases in a peer-reviewed Nature study (DOI: 10.1038/s41586-026-10675-5).

Source (journal article): MIRA: Autonomous AI agent surpasses physicians in EHR navigation. Nature, 2026. DOI: 10.1038/s41586-026-10675-5

The evaluation was conducted in a sandboxed simulation environment using de-identified historical records, not in a live clinical deployment.

Source (journal article): MIRA: Autonomous AI agent surpasses physicians in EHR navigation. Nature, 2026. DOI: 10.1038/s41586-026-10675-5

After publication, a separate AI panel re-verifies every edition against these same sources. The running claim-confirmation rate and every correction are public on the accuracy ledger.