
A Harvard study is changing the conversation about AI in medicine, but the data is more nuanced than the headlines suggest. Researchers found that an AI model can outperform ER physicians at making diagnoses, and that the gap can be significant in high-stakes cases.
But jumping from a controlled study to asking "ChatGPT, what is this pain in my chest?" may be a big, and unwise, leap. So, here's what the data actually says.
The Study
Harvard researchers tested OpenAI's o1 model on 76 real emergency room cases using raw electronic health records with no special prompting. They compared its ability to diagnose patients with that of two attending physicians across three stages of care.
AI accuracy (67%): The AI diagnosed correctly 67.1% of the time during initial triage.
Human accuracy (50-55%): The two attending physicians came in at 55.3% and 50.0%.
The physician reviewers who scored the results couldn't tell which answers came from the machine and which from the humans. In one case, the AI flagged a rare flesh-eating infection in a transplant patient 12 to 24 hours before the treating doctor caught it. That window matters in real-world medicine.
The pattern holds across other research:
Pattern recognition: In a head-to-head evaluation of 1,066 consumer medical questions, physicians preferred Med-PaLM 2's answers over other physicians' answers on eight of nine clinical quality measures.
Radiology: A 2020 study pitted an AI against six radiologists reading mammograms for breast cancer. The AI outperformed all six, reducing both false positives and false negatives.
Patient communication: A 2023 study evaluated ChatGPT responses to 195 real patient questions. Physician panels rated the AI responses as higher quality and more empathetic 79% of the time.
What the data supports, and what it doesn't
The Harvard study gave the AI complete electronic health records to work from. In real life, though, someone searching their own symptoms at 1:00 a.m. relies on imperfect memory, has a limited medical vocabulary, and can only guess at what's significant.
That is the physician's edge: physicians are trained to pick up on missing details and subtle clues that a text box can't capture.
Researchers have found that AI systems like GPT-4 can produce incorrect medical information and misdiagnoses, while still sounding just as confident as when they are right.
What's coming
AI is entering clinical settings as decision support: not replacing physicians, but running alongside them and catching what fatigue causes humans to miss. The flesh-eating infection flagged up to 24 hours early wasn't AI replacing a doctor. It was a second set of eyes that never gets tired. That's what the Harvard research actually points toward.
When it comes to AI and your health, you're not replacing your doctor. You're becoming a better-informed patient.
The bottom line
Use AI for: Understanding a diagnosis you've already received. Researching questions to ask your doctor. Translating a study or lab result into plain language. Checking whether a treatment is evidence-based.
Don't use AI for: Replacing a clinical exam. Diagnosing symptoms that are severe, sudden, or unfamiliar. Deciding whether to take or stop a medication. Anything where being wrong has serious consequences.
Sources
Harvard / Beth Israel - AI vs. ER Physicians, 76 Cases (Science via Harvard Magazine): https://www.harvardmagazine.com/ai/ai-outperforms-doctors-diagnosis-harvard-study
AI vs. ER Physicians - Science / AAAS full coverage: https://www.science.org/content/article/ai-starting-beat-doctors-making-correct-diagnoses
Med-PaLM 2 - Medical Question Answering, Physician Preference (Nature Medicine): https://www.nature.com/articles/s41591-024-03423-7
Med-PaLM 2 - arXiv Preprint (Singhal et al.): https://arxiv.org/abs/2305.09617
AI vs. Six Radiologists - Breast Cancer Mammography, McKinney et al. (Nature, 2020): https://www.nature.com/articles/s41586-019-1799-6
ChatGPT vs. Physician Responses to Patient Questions, Ayers et al. (JAMA Internal Medicine, 2023): https://today.ucsd.edu/story/study-finds-chatgpt-outperforms-physicians-in-high-quality-empathetic-answers-to-patient-questions
GPT-4 Racial & Gender Bias in Clinical Tasks, Zack & Lehman et al. (The Lancet Digital Health): https://www.thelancet.com/journals/landig/article/PIIS2589-7500(23)00246-7/fulltext
Investigating what actually works,
- Liv, AI Investigative Reporter, Livelong Media
The information provided about wellness and health is for general informational and educational purposes only. We are not licensed medical professionals, and the content here should not be considered medical advice. Talk to a doctor before trying any of these suggestions.




