Harvard Study Says AI Beat Doctors on ER Diagnoses — With a Big Catch

May 4, 2026


A new Harvard-led study has found that an advanced AI model outperformed two doctors on diagnostic accuracy in a set of real emergency room cases, adding fresh fuel to the debate over how far AI could go in clinical decision-making. But before anyone starts picturing robots replacing hospital staff, the researchers themselves say that is not what the study proves.

TechCrunch reports that the study, published in Science, came from a team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. In one experiment, researchers looked at 76 real emergency room patients and compared diagnoses from two internal medicine attending physicians with diagnoses generated by OpenAI’s o1 and GPT-4o models. Those diagnoses were then judged by two other attending physicians who did not know which answers came from humans and which came from AI.

Where the AI Performed Better

According to the study summary cited by TechCrunch, o1 performed “nominally better than or on par with” the two attending physicians and GPT-4o at each diagnostic touchpoint, with the biggest difference showing up at the initial ER triage stage, when there is less information available and more urgency. TechCrunch says the model produced the exact or very close diagnosis in 67% of triage cases, compared with 55% for one physician and 50% for the other.

That is the statistic that will grab attention, and understandably so. A result like that suggests AI reasoning models are getting much better at handling messy medical information, not just textbook-style questions. TechCrunch also reports that the Harvard team said the AI was given the same information that existed in the electronic medical record at the time, without extra preprocessing.

The Catch Behind the Headline

The big catch is that even the researchers are not claiming AI is ready to make live, life-or-death decisions in emergency rooms. TechCrunch says the paper instead argues there is an “urgent need” for real-world prospective trials to evaluate these systems in actual patient care. It also says the study only tested performance on text-based information, not on non-text inputs like physical presentation, tone of voice, facial cues, or visual assessment.

That limitation matters a lot. Emergency care is not just about naming the final diagnosis. It is also about spotting what might kill the patient first, deciding what cannot be missed, and working under uncertainty with incomplete information. The Guardian, cited by TechCrunch, also quoted study co-author Adam Rodman as saying that there is currently no formal framework for accountability around AI diagnoses, and that patients still want humans guiding them through serious treatment decisions.

Why Some Doctors Are Pushing Back

TechCrunch also added an important correction and caveat: the two physicians in the emergency room comparison were internal medicine attending physicians, not ER physicians. That distinction matters because some critics argue it makes the headline result sound broader than it really is.

The article cites emergency physician Kristen Panthagani, who said the study has led to “overhyped headlines” and argued that if AI is going to be compared against doctors’ clinical ability, it should be tested against physicians who actually practice that specialty. She also said that as an ER doctor, her main goal when first seeing a patient is not to guess the final diagnosis, but to determine whether the person has a condition that could kill them.

That doesn’t make the study unimportant. It just means the result should be read carefully. What the study appears to show is that advanced AI models are getting very good at diagnostic reasoning from text records. It does not show that AI can yet replace emergency doctors doing the full real-world job.

Why This Matters for Australia

Healthcare systems everywhere are under pressure, and that includes Australia. If AI models really are getting better at handling diagnosis support, triage reasoning, and case review, they could eventually become useful tools for overstretched hospitals and clinicians here too.

But this study is also a reminder that strong headline results don’t automatically translate into safe, accountable real-world care. Medicine is not just pattern recognition on a screen. It involves judgment, communication, ethics, urgency, and responsibility when things go wrong.

The bigger takeaway is simple: AI may be getting better at diagnosing from medical records, but that’s not the same thing as proving it’s ready to take over the emergency room.

Source: TechCrunch | The Guardian | Science coverage
