An AI Agent Outperformed Physicians on Simulated ED Cases — in a Sandboxed EHR, Not a Real Hospital

A peer-reviewed Nature study tested MIRA on 574 retrospective emergency cases across eight diagnoses; the system has no regulatory clearance and has never treated a real patient.

An artificial intelligence agent called MIRA outperformed emergency physicians on overall diagnostic accuracy across eight diagnoses in a study of 574 retrospective cases, according to a paper published in Nature. But the system has never treated a real patient, operates without FDA clearance, and was evaluated in a sandboxed environment that differs substantially from a functioning emergency department.

What the Study Did

The study evaluated MIRA — a multi-step AI agent designed to reason sequentially through clinical information — using a curated dataset of retrospective emergency department cases. The eight diagnoses included high-acuity conditions such as pulmonary embolism, acute myocardial infarction, sepsis, and stroke. Each case presented the AI with the same structured data available to physicians: history, physical examination findings, laboratory results, and imaging reports.

MIRA’s diagnostic accuracy exceeded that of the physician comparator group on the aggregate dataset and on six of the eight individual diagnoses. The system showed particular strength in synthesizing laboratory and imaging data simultaneously.

Retrospective case review and real-time emergency medicine are different tasks. In the study, the correct diagnosis was already known and cases were selected to include specific conditions. In an actual emergency department, the clinician faces undifferentiated presentations where the diagnosis is unknown in advance and most patients do not have the eight diagnoses evaluated here. Removing that uncertainty changes what the AI is actually being asked to do.
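One way to see why case mix matters is a rough arithmetic sketch. It assumes a hypothetical diagnostic tool with fixed sensitivity and specificity and asks how often a positive call is correct when the target condition is common, as in a curated set, versus rare, as in an unselected waiting room. Every number below is invented for illustration; none comes from the Nature paper, and positive predictive value is only one of several measures a study like this might report.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive calls that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical values chosen for illustration only, NOT from the study.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Assumed case mix: half of curated cases have the condition,
# versus 2 percent of unselected emergency department arrivals.
curated = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.50)
unselected = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.02)

print(f"curated set (50% prevalence):  PPV = {curated:.2f}")
print(f"unselected ED (2% prevalence): PPV = {unselected:.2f}")
```

Under these assumed inputs, the same tool is right on about nine of ten positive calls in the curated set but on fewer than two of ten in the unselected one, which is why results on selected cases may not carry over to routine practice.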

What the Study Does Not Establish

The paper does not address clinical workflow integration, time pressure, handling of ambiguous presentations, error consequences in life-threatening situations, or liability. MIRA has not received FDA 510(k) clearance or De Novo authorization and has not demonstrated performance in an actual emergency department.

Independent experts commenting through the Science Media Centre noted that validation on prospective, unselected patient populations — where AI systems routinely perform worse than on curated test sets — would be required before clinical conclusions could be drawn.

MIRA: autonomous clinical AI agent for emergency department diagnostics. Nature. 2026. doi:10.1038/s41586-026-10675-5. Science Media Centre, expert reactions to MIRA Nature paper, June 2026.

Correction (June 21, 2026): An earlier version of this article stated MIRA was evaluated on “311 retrospective emergency cases.” The Nature paper evaluated MIRA across 574 emergency department cases from the MIMIC-IV dataset; 311 is the per-arm count in a triple-evaluation audit subset, not the total number of cases in the evaluation. The dek, front-matter claim, and body text have been corrected.

Trace · every claim, sourced

Reported and written by Owen Tanaka, Digital Health & AI Desk. Each load-bearing claim was then bound to the primary source it rests on and checked out-of-band against that source before publication. The full mapping is below; nothing here is taken on faith.

A peer-reviewed Nature study evaluated MIRA on 574 retrospective ED cases across eight high-acuity diagnoses in a sandboxed electronic health record environment.

Journal article: MIRA: autonomous clinical AI agent for emergency department diagnostics. Nature, 2026. doi:10.1038/s41586-026-10675-5

MIRA has not received FDA clearance or De Novo authorization and has not been tested prospectively with real patients.

Expert commentary: Science Media Centre, expert reactions to MIRA Nature paper, June 2026 (science-media-centre-mira-june2026)

After publication, a separate AI panel re-verifies every edition against these same sources. The running claim-confirmation rate and every correction are public on the accuracy ledger.