An autonomous AI agent named MIRA (Medical Intelligent Reasoning Agent), described in a paper published in Nature (DOI: 10.1038/s41586-026-10675-5, PMID: 42310457), navigated full electronic health records and generated diagnostic and management decisions that outperformed those of attending physicians in a controlled simulation environment.

The study, published June 17, 2026, describes MIRA as a large-language-model-based agentic system trained to interact with a sandboxed electronic health record interface — browsing notes, ordering and interpreting tests, adjusting treatment plans, and documenting reasoning — without human assistance. The system was evaluated against a benchmark of complex inpatient cases drawn from de-identified records and was compared with the performance of board-certified physicians on the same cases.

MIRA outperformed the physician cohort on the primary accuracy metric, with higher case resolution scores. The evaluation cases were chosen to include diagnostic uncertainty, polypharmacy, and comorbidity, conditions that challenge rule-based systems and require integrative reasoning.

Context and limitations

The study is a simulation benchmark, not a clinical deployment. MIRA was evaluated in a sandboxed environment using de-identified historical records; it was not deployed in a live inpatient setting without real-time physician oversight. The gap between benchmark performance and real-world clinical integration is substantial: EHR data in live settings are incomplete, ambiguous, and subject to documentation lag; patients can provide verbal information that changes the differential; and clinical decisions carry liability considerations and communication requirements that benchmarks do not capture.

The authors acknowledge these limitations. The result adds to a body of evidence showing that agentic AI systems can perform diagnostic reasoning at or above specialist-physician level in controlled evaluations, but clinical integration requires meeting safety, reliability, and interpretability thresholds that no benchmark study alone can establish.

The Nature publication reports a peer-reviewed large-cohort evaluation in which an agentic EHR navigation system surpassed physicians on the primary accuracy metric; the paper itself makes no priority claim over prior work. The key distinction from earlier systems such as Google DeepMind’s AMIE is not data modality, since AMIE had published multimodal capabilities (vision, documents, structured data) by 2025, but operational scope: AMIE operates as a conversational dialogue agent, while MIRA executes agentic EHR actions (ordering tests, adjusting treatment plans, documenting in the chart) without human intermediation.


Correction (2026-06-19): Two errors corrected by post-publication fact-check. (1) The article described MIRA as “the first agentic EHR system to do so in a peer-reviewed large-cohort evaluation” — the MIRA paper (PMID 42310457) makes no such priority claim; this was editorial attribution without source support. (2) Google DeepMind’s AMIE was characterized as operating on “structured diagnostic scenarios or single-modality data” — this is inaccurate; by 2025 AMIE had published multimodal capabilities (vision, documents, structured data). The correct distinction is operational: AMIE is a conversational dialogue system that does not perform agentic EHR actions (ordering tests, adjusting treatment plans, documenting in the chart), while MIRA does.