An artificial intelligence agent called MIRA outperformed emergency physicians in overall diagnostic accuracy on eight clinical presentations in a study of 574 retrospective cases, according to a paper published in Nature. However, the system has never treated a real patient and operates without FDA clearance. It was also evaluated in a controlled, sandboxed environment that differs substantially from the conditions of a functioning emergency department.
What the Study Did
The study evaluated MIRA, a multi-step AI agent designed to reason sequentially through clinical information, using a curated set of retrospective emergency department cases drawn from the MIMIC-IV dataset. The eight diagnoses included high-acuity conditions such as pulmonary embolism, acute myocardial infarction, sepsis, and stroke. Each case gave the AI the same structured data available to physicians: history, physical examination findings, laboratory results, and imaging reports.
MIRA’s diagnostic accuracy exceeded that of the physician comparator group on the aggregate dataset and on six of the eight individual diagnoses. The system showed particular strength in synthesizing laboratory and imaging data simultaneously.
Retrospective case review and real-time emergency medicine are different tasks. In the study, the correct diagnosis for each case was already established for scoring, and cases were selected because they involved specific conditions. In an actual emergency department, the clinician faces undifferentiated presentations in which the diagnosis is unknown in advance, and most patients have none of the eight diagnoses evaluated here. Preselecting cases removes much of that uncertainty and changes the task the AI is being asked to perform.
What the Study Does Not Establish
The paper does not address clinical workflow integration, time pressure, the handling of ambiguous presentations, the consequences of errors in life-threatening situations, or liability. MIRA has not received FDA 510(k) clearance or De Novo authorization and has not demonstrated its performance in an actual emergency department.
Independent experts commenting through the Science Media Centre noted that validation on prospective, unselected patient populations — where AI systems routinely perform worse than on curated test sets — would be required before clinical conclusions could be drawn.
MIRA autonomous clinical AI. Nature. 2026; doi:10.1038/s41586-026-10675-5. Science Media Centre expert commentary, June 2026.
Correction (June 21, 2026): An earlier version of this article stated MIRA was evaluated on “311 retrospective emergency cases.” The Nature paper evaluated MIRA across 574 emergency department cases from the MIMIC-IV dataset; 311 is the per-arm count in a triple-evaluation audit subset, not the total number of cases in the evaluation. The dek, front-matter claim, and body text have been corrected.