VeriHealth — True Health Discernment

Getting health information has never been easier.
Trusting it has never been harder.

Medical misinformation has always been dangerous. Social media gave it reach. AI makes it sound like the truth.

VeriHealth helps people know what to question, and what to trust.

46x

Misleading but technically accurate content lowered vaccination intentions 46 times more than fact-checked misinformation, because far more people saw it.

95% → 35%

On medical scenarios, AI alone scores 95%, while people interacting with that identical model score only about 35%. That 60-point drop leaves people who use the AI worse off than those who use standard web search.

Satisfaction ≠ accuracy

Nearly everyone who uses AI for health reports being satisfied with it. Yet users get the wrong answer two times out of three. People trust the tool precisely when it is failing them.

70%

Of people globally hold at least one widely debunked health belief, at virtually identical rates across education levels, age groups, and political affiliations.

All figures peer-reviewed and source-verified. See Evidence Base →

Our Mission

True Health Discernment

Most people were never taught how to evaluate health information well: how to follow the medical logic, weigh the evidence, and judge whether a source can be trusted. This is the health reasoning gap. It is not a personal failure but a structural one, because this was never part of general education, and it still isn't.

Closing that gap requires discernment: the ability to distinguish reliable from misleading health information. VeriHealth exists to help people develop this discernment, for any source, any topic, any claim.

The VideoBot

A platform that develops the capacity to ask the right health question and evaluate what comes back, from any source. Two components, designed to work together, and each deployable independently.

Animated educational modules

Short-form video that develops health reasoning skills across languages and literacy levels: recognizing misleading framing, formulating precise questions, interpreting probabilistic language.

The Structured Socratic Interface

An AI-powered clinical interpreter that clarifies what the user is asking, surfaces false premises, and renders AI output in language the user can understand and act on safely.

Evidence-based design

Every design decision is grounded in the peer-reviewed literature: animated video for health literacy populations, Socratic dialogue for durable belief change, and structured intake to surface the unknown unknowns that free-text interfaces leave unasked.

Who We Are

Built by a team spanning the four disciplines the problem requires

VeriHealth brings together clinical medicine, behavioral science, health communications, and institutional leadership, the combination needed to develop true health discernment at scale.

Michael P. Walsh

MBA · President and Co-founder

Catherine McCarthy

Chief Content Officer and Co-founder

Smitha Arekapudi

MD, MBA · Project Director and Co-founder

Everly Macario

ScD · Senior Research Advisor and Co-founder

The Problem

The internet democratized access to health information, but not the capacity to evaluate it.

As a result, people can get answers to all their questions, but cannot judge which ones to trust.

The mediation that was lost

Before the internet, the healthcare system mediated access to health information. A clinician translated medical knowledge into guidance a person could act on, interpreting in both directions: turning a patient's concern into the right question, and turning the evidence back into something usable. The internet removed that mediation. It left people to evaluate complex, often conflicting health information on their own, with no training in how to do it. This is the health reasoning gap, and it has never been closed.

Social media widened it

Social media made the gap impossible to ignore.

The platforms built to connect people reward engagement, not accuracy. Misleading health content thrives there because it confirms fears, validates instincts, and travels faster than any correction. And the most damaging content is often not false at all.

"Misleading claims from credible sources can be more damaging than blatant falsehoods."

Van der Linden and Kyrychenko, Science, 2024

This is the first thing any defense must reckon with. The danger is not only outright falsehood, which fact-checking can sometimes catch, but technically accurate content framed to mislead, which it cannot. You cannot correct what is not, strictly speaking, false.

Allen et al. · Science · 2024

46x

Misleading but technically accurate health content lowered vaccination intentions 46 times more than fact-checked misinformation, driven by its far greater reach through mainstream channels.

Chandrasekaran et al. · JMIR · 2024

80%

More than four in five U.S. adults regularly encounter false and misleading health content on social media. Over 35% report seeing a lot of it, and an additional 45% report seeing some amount.

AI arrived as the solution. It made the problem worse.

AI scores 95% on standardized clinical scenarios. People get the wrong answer roughly two times in three.

AI promised to close the gap. Instead it opened a new kind of failure. The knowledge is in the model; the 2026 Oxford trial showed that people interacting with that same model still got the wrong answer about two times in three. The information was not false, and no one framed it to mislead. The failure was in the interaction itself.

That failure operates on two layers. At the model layer, AI tools strip away their own safeguards, absorb misinformation from the data they are trained on, and express confidence they have not earned. At the interface layer, the design of the exchange degrades a person's ability to evaluate what they receive: it accepts false premises, answers questions the person did not know how to ask well, and delivers all of it in fluent, authoritative prose. The model holds the knowledge. The interaction prevents the person from reaching and evaluating it.

KFF Tracking Poll · 2026

32%

About one in three U.S. adults has used AI chatbots for health advice in the past year. Among those who have, 92% report satisfaction. Satisfaction is not safety.

OpenAI · 2026

40M/day

More than 40 million people use ChatGPT for health-related questions every day. Seven in ten of these conversations occur outside normal clinic hours, when no physician is available.

Model-layer failures

0.97%

Disclaimers Remaining

Safety disclaimers in AI responses to health questions dropped from 26.3% in 2022 to 0.97% in 2025, a systematic erosion of the only protection most users received (Sharma et al., 2025).

31.7%

Susceptibility to Fabricated Health Claims

Across 3.4 million prompts and 20 LLMs, AI accepted fabricated health data in 31.7% of cases. Fabricated clinical notes were accepted 46.1% of the time. The models generating confident medical answers are the same models absorbing confident medical misinformation (Omar et al., 2026).

~0%

Models That Can Say "I Don't Know"

Across 12 models, nearly all scored 0% on identifying unanswerable medical questions. The best performer achieved 3.7%. A model that is always confident and frequently wrong undermines the user's natural skepticism at the moment it is most needed (Griot et al., 2025).

Interface-layer failures

35%

Condition Identification

Participants using AI chatbots missed the correct medical condition in about two-thirds of cases, identifying it less than 34.5% of the time. Internet search outperformed all AI chatbot groups by a factor of 1.76 (Bean et al., 2026).

11.7x

Triage Derailed by Prior Framing

A single reassuring comment before a question ("my friend said it's nothing serious") raised the odds that ChatGPT Health would dismiss a real emergency by a factor of 11.7. It also over-referred 65% of non-urgent cases (Ramaswamy et al., 2026).

Worse

Than No AI At All

Users with access to AI performed worse than those who used a search tool with no AI. The pull toward dependence is measurable: across 11 leading models, AI affirmed users' choices 49% more often than other people did, even when the user was wrong, and users trusted the agreeable model more and wanted to keep using it (Cheng et al., Science, 2026). The behavior that earns trust is the same behavior that distorts judgment, which makes the gap between what AI knows and what users can actually get from it self-reinforcing rather than static.

↓21%

Clinical Skill After AI Exposure

In a study of trained endoscopists, adenoma detection fell from 28.4% to 22.4% after routine AI exposure, a relative decline of about 21%, with AI use an independent predictor after multivariable adjustment. The finding is observational and in clinicians, not lay users, but the mechanism is the same: answer-delivery AI erodes the independent judgment it is meant to support (Budzyn et al., 2025).
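The relative decline quoted above follows directly from the two detection rates. A minimal sketch of the arithmetic, using only the figures reported in the text (the function name is ours, for illustration):

```python
def relative_decline(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a fraction of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Adenoma detection rates reported in the text, in percent.
absolute_drop = 28.4 - 22.4                    # percentage points
relative_drop = relative_decline(28.4, 22.4)   # fraction of the starting rate

print(f"{absolute_drop:.1f} points absolute, {relative_drop:.1%} relative")
# prints "6.0 points absolute, 21.1% relative"
```

The distinction matters when reading the figure: the absolute change is six percentage points, while the headline 21% expresses that change relative to the starting rate.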

These failures point in a single direction. If the model already holds the knowledge, the answer is not a better model. It is a better interaction: one that helps a person ask the right question, surfaces the assumptions hidden inside it, and returns an answer calibrated to how certain the evidence actually is. And because the same agreeable, frictionless design that earns a person's trust is what erodes their judgment, that interaction must build independent capacity rather than dependence.

Where this has left us

Mistaken health beliefs are now the norm, and confidence in evaluating health information is falling.

Put the failures together and a spectrum comes into view. Health information can be outright false. It can be technically accurate yet framed to mislead. And now it can be accurate in the model yet rendered unusable by the interaction that delivers it. No single content-based defense reaches across all three, because what they share is not a property of the content. It is the gap between the information and a person's capacity to evaluate it. That gap does not care which channel the claim arrives through, whether the person went looking for it, or even whether they use AI at all.

Edelman Trust Barometer · 2026

70%

Seventy percent of people worldwide hold at least one widely debunked health belief, at nearly identical rates regardless of education, age, or political affiliation.

Edelman Trust Barometer · 2026

↓10 pts

In a single year, the share of people confident in their ability to find and evaluate health information fell ten points. The reasoning gap is widening, not closing.

The solvable paradox

The same technology that widened the gap can, properly designed, help close it.

The failures define the solution. Any approach that actually closes the reasoning gap has to do four things the current tools do not. It must build the capacity to evaluate, not correct one claim at a time. It must interrogate reasoning and framing, not only what is verifiably false. It must restructure the interaction itself, clarifying the question and returning honest uncertainty, not deliver confident answers. And it must build independence, not dependence. A capacity defined this way is general: it applies to any source, any claim, and any person, not only to those seeking a diagnosis and not only to those who already use AI.

This is not theoretical. The same conversational technology that deepens the gap can, designed around these principles, durably shift what people believe and do. The evidence is peer-reviewed, and the effect sizes are large.

Costello, Pennycook and Rand · Science · 2024

False beliefs cut by ~20%

In dialogues with 2,190 conspiracy believers, brief personalized AI conversations produced a durable reduction of approximately 20% in belief, persisting for at least two months. The effect was larger than prior interventions encouraging reflective thinking, which yielded reductions of only one to six points on comparable scales.

Xu et al. · Preprint · 2025

2x the CDC brochure

AI dialogue addressing HPV vaccine concerns produced more than twice the increase in vaccination intentions compared to a standard CDC brochure, with no significant moderators across age, race, education, or political party.

Hou et al. · Nature Medicine · 2025

3.85x

A Socratic AI chatbot increased vaccine receipt or appointments by a factor of 3.85 in a cluster-randomized trial of 2,671 parents. This is a peer-reviewed RCT of the same class of mechanism VeriHealth employs.

"The design of the interaction, not the model's capabilities, determines whether AI deepens the gap or develops the discernment to close it."

Unsafe by Design, VeriHealth NFP, 2026

The VideoBot

A two-stage platform for developing health discernment and applying it when it matters

VeriHealth helps people develop the discernment to evaluate any health claim, from any source, and helps them apply it in real time, at the moment it is needed. Two components, designed to work together as complementary stages of a single intervention, each deployable independently.

Component One

Animated educational modules

Short-form animated video (30 to 120 seconds) that builds the foundation of health discernment: the reasoning skills and the core knowledge needed to evaluate health claims. Animation is the right medium: a 2024 systematic review found that animated video generally improved health information recall, with the strongest effects for low health literacy populations. Eye-tracking studies show that stylized animated instructors direct attention toward content rather than the presenter, reducing extraneous cognitive load. Modules are available in English and Spanish and are designed to work across literacy levels.

Current modules develop the capacity to

Understand how vaccines produce immunity and evaluate the claims you encounter about them
Recognize misleading health framing, including correlation presented as causation
Distinguish authority from evidence when evaluating health endorsements
Ask better questions of AI health tools and recognize incomplete answers

Component Two

The Structured Socratic Interface

The Structured Socratic Interface (SSI) addresses the Interaction Gap identified by Bean et al. (2026): the gap between what AI knows and what users can actually get from it. The chatbot has correct knowledge, but the free-text interface prevents users from reaching it. Users ask imprecise questions, embed false assumptions, and receive confident answers they cannot evaluate. The SSI restructures that interface.

On intake, it clarifies what the user is asking, surfaces embedded assumptions, and formulates a precise clinical query. On output, it renders the response in language the user can act on, replacing false certainty with plain-language estimates of how much confidence the answer actually warrants.

Core functions

Structured intake replaces free-text chat with guided clinical dialogue
Assumption surfacing identifies false premises before they are accommodated
Plain-language certainty estimates replace false confidence with a statement of how much confidence the answer actually warrants
Active redirection away from dangerous questions, not accommodation of them
Designed on the Socratic dialogue model whose mechanism is validated in peer-reviewed RCTs

The SSI is designed to work with any medical AI system, connecting to it via API, with or without VeriHealth’s video modules.
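The flow described above (structured intake, assumption surfacing, a call to an external medical AI system, and a plain-language certainty estimate on output) can be sketched as a thin wrapper around any backend. This is an illustrative sketch only, not VeriHealth's implementation: every name here (`structured_intake`, `certainty_label`, `ask`, `stub_backend`), the keyword cues, and the thresholds are assumptions made for the example, and the backend is modelled as a plain function returning an answer and a 0-to-1 confidence score.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical cue phrases standing in for real false-premise detection.
# Keyword matching is used only to keep the sketch short.
PREMISE_CUES = ("it's nothing serious", "probably just", "can't be")


@dataclass
class StructuredQuery:
    question: str                                   # the user's question
    flagged_premises: List[str] = field(default_factory=list)


@dataclass
class RenderedAnswer:
    text: str                                       # answer text from the backend
    certainty: str                                  # plain-language certainty label


def structured_intake(raw_text: str) -> StructuredQuery:
    """Record the question and flag any premise cues found in it."""
    lowered = raw_text.lower()
    flagged = [cue for cue in PREMISE_CUES if cue in lowered]
    return StructuredQuery(question=raw_text.strip(), flagged_premises=flagged)


def certainty_label(confidence: float) -> str:
    """Map a 0-1 score to plain language (thresholds are illustrative)."""
    if confidence >= 0.8:
        return "well supported by the evidence"
    if confidence >= 0.5:
        return "plausible, but the evidence is mixed"
    return "uncertain; please confirm with a clinician"


def ask(raw_text: str,
        backend: Callable[[str], Tuple[str, float]]) -> RenderedAnswer:
    """Run intake, pass a premise-aware prompt to the backend, render output."""
    query = structured_intake(raw_text)
    prompt = query.question
    if query.flagged_premises:
        prompt += "\nTreat as unverified: " + "; ".join(query.flagged_premises)
    answer, confidence = backend(prompt)
    return RenderedAnswer(text=answer, certainty=certainty_label(confidence))


def stub_backend(prompt: str) -> Tuple[str, float]:
    """Stand-in for a medical AI system reached via API (hypothetical)."""
    return ("(answer text from the backend)", 0.6)


result = ask("My friend said it's nothing serious, but my chest hurts.",
             stub_backend)
print(result.certainty)   # prints "plausible, but the evidence is mixed"
```

Keeping the backend behind a single callable reflects the stated design goal that the interface can sit in front of any medical AI system; the intake and rendering steps do not depend on which model answers.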

Evidence Base

Every design decision is grounded in the literature

Animated video for health literacy

Hansen et al. (JMIR, 2024) systematic review of RCTs; Li, Wang and Mayer (British Journal of Educational Psychology, 2024) eye-tracking study; Meppelink et al. (JMIR, 2015) on attitude change in low-literacy populations.

Socratic dialogue for belief change

Costello, Pennycook and Rand (Science, 2024) on durable conspiracy belief reduction; Hou et al. (Nature Medicine, 2025) cluster-RCT on vaccine behavior; Simchon et al. (Current Opinion in Psychology, 2026) meta-analysis of 33 inoculation experiments.

Structured intake for the Interaction Gap

Bean et al. (Nature Medicine, 2026) Oxford interactive trial demonstrating that free-text chat causes users to perform worse than internet search; Sambara et al. (arXiv, 2026) on false premise accommodation in medical AI chatbots.

Why This Approach

Why we help people develop discernment, not just deliver facts

The evidence behind the design.

The Opening Argument

Think about the last time someone corrected a piece of health misinformation you had believed. You updated that belief. Then you encountered the next false claim. You had no more tools to evaluate it than you did before. The correction gave you an answer. It did not give you the ability to evaluate the next claim on your own.

That is the central failure of the dominant approach to health misinformation. It corrects answers. It does not develop discernment. That is why VeriHealth is designed differently.

The Standard Approach

Correction

When a false health claim spreads, the standard response is to correct it: flag the article, issue an accurate statement, counter with facts. The approach is intuitive. It also has three structural limits.

It is reactive: it addresses claims after they have spread. It does not scale: there are far more false claims than corrective resources. And it builds no capacity to evaluate the next false claim. It is treatment, not inoculation.

VeriHealth's Approach

Reasoning

Help people develop the discernment to evaluate any claim, from any source. A single well-designed intervention can produce durable, transferable resistance, persisting for months and extending to claims it never directly addressed.

The goal is not to tell people what is true. It is to help them develop the discernment to evaluate any claim for themselves.

What the Evidence Shows

Two bodies of evidence show that reasoning-based interventions improve people's ability to evaluate misinformation

Costello, Pennycook and Rand · Science · 2024

In a study of 2,190 participants, brief, personalized AI conversations produced a durable reduction of roughly 20 percent in false beliefs, persisting for at least two months. The critical feature was engaging each person's specific reasoning rather than delivering a generic correction, the same person-specific principle the SSI applies through Socratic questioning.

Simchon, Lewandowsky and van der Linden · Current Opinion in Psychology · 2026

A signal-detection meta-analysis of 33 inoculation experiments (37,025 participants) found that technique-based prebunking, the same approach VeriHealth's modules use, improves people's ability to distinguish reliable from misleading content, without making them indiscriminately skeptical. The gain is in discrimination, not blanket distrust: exactly the discernment a reasoning-based intervention is meant to build.
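The difference between better discrimination and blanket distrust can be made concrete with the textbook equal-variance signal detection measures: sensitivity (d') and criterion (c). The sketch below is an illustration under stated assumptions, not the meta-analysis's own estimation procedure, and the rates are invented for the example. Here a "hit" means correctly flagging misleading content, and a "false alarm" means flagging reliable content as misleading.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float):
    """Return (d_prime, criterion) for rates strictly between 0 and 1."""
    z = NormalDist().inv_cdf              # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa                # discrimination: higher is better
    criterion = -(z_hit + z_fa) / 2       # response bias: 0 means no lean
    return d_prime, criterion


# Hypothetical rates, chosen only to illustrate the two patterns.
baseline = sdt_measures(0.60, 0.40)       # before any intervention
discerning = sdt_measures(0.75, 0.25)     # more hits AND fewer false alarms
skeptical = sdt_measures(0.80, 0.60)      # flags more of everything

for name, (d, c) in [("baseline", baseline),
                     ("discerning", discerning),
                     ("skeptical", skeptical)]:
    print(f"{name:11s} d'={d:5.2f}  c={c:5.2f}")
```

In this toy example the "discerning" pattern raises d' while leaving the criterion near zero, which is the kind of result the text describes. The "skeptical" pattern catches more misleading content only by distrusting reliable content too: its criterion shifts while d' barely moves.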

Why Both Stages Are Necessary

Why the two stages must work together

The animated modules and the SSI address different cognitive tasks, and both are necessary.

The modules build the declarative foundation of health discernment: what a confidence interval means, what hallucination looks like in practice, how to recognize misleading framing. The SSI develops the applied capacity: testing that foundation against the user's own specific question, surfacing the assumption embedded in it, and adapting in real time to where the reasoning needs to go.

These are cognitively distinct processes that require different pedagogical modes. A structured lesson can efficiently establish conceptual frameworks. It cannot engage individual misconceptions or surface specific reasoning errors in real time. That is what Socratic dialogue does. The two are designed to work together as complementary stages of a single intervention: the modules build the foundation, and the dialogue puts it to work on the person's own question. Using both is how the design aims to produce discernment that lasts.

Video Library

Free animated modules for anyone trying to make sense of health information

VeriHealth modules develop health reasoning skills, then apply them to the specific health decisions people face. Available in English and Spanish, produced with a human cultural adaptation review.

Introduction

AI Crisis

The AI Health Crisis

An introduction to the health information crisis for clinicians, funders, and researchers. This module explains why medical misinformation has reached a new level of danger, and why developing health discernment is the right response to it.

English

Reasoning Skills

Modules that build the skill of evaluating any health claim, from any source.

Health Reasoning

Correlation Is Not Causation

Ice cream does not cause shark attacks, but the same reasoning error that makes that claim seem plausible drives some of the most dangerous health misinformation online. This module develops the skill of telling the difference between correlation and causation.

English

Health Reasoning

Why You Shouldn’t Trust Health Endorsements

Endorsements from doctors and celebrities feel authoritative, but authority is not the same as evidence. This module develops the practice of asking what to verify before trusting any health claim, regardless of who is making it.

English

Health Reasoning

How to Get Better Health Answers from AI

AI health tools give incomplete answers when they receive incomplete questions, and they will not tell you what they missed. This module develops the skill of asking better questions so you get information you can actually act on.

English

Health Topics

Modules that apply those skills to specific health decisions.

Measles

Measles Vaccine Safety

Measles is preventable, but misinformation has contributed to its return in communities across the United States. This module explains what measles is, how the vaccine works, and how to evaluate the health claims you encounter about it.

English

Measles

Seguridad de la Vacuna contra el Sarampión

Measles is preventable, but misinformation has contributed to its return in communities across the United States. This module explains what measles is, how the vaccine works, and how to evaluate the health claims you encounter on social media, with your family, and with your doctor.

Spanish

HPV for Parents

HPV Vaccination for Parents

HPV is one of the most common infections in the world and one of the most preventable. This module helps parents understand what HPV is, why vaccination matters, and how to evaluate the health information they encounter.

English

HPV for Parents

Vacuna contra el VPH para Padres

HPV is one of the most common infections in the world and one of the most preventable. This module helps parents understand what HPV is, why vaccination matters, and how to evaluate the health information they encounter, including what social media and AI chatbots say.

Spanish

HPV for Kids

HPV Vaccination for Kids

This module explains HPV vaccination in language designed for children and young adolescents: what the vaccine is, why doctors recommend it, and why accurate health information matters.

English

HPV for Kids

Vacuna contra el VPH para Niños

This module explains the HPV vaccine in language designed for children and adolescents: what the vaccine is, why doctors recommend it, and why getting accurate health information matters, no matter where it comes from.

Spanish

White Paper

Unsafe by Design

Consumer Health AI Failure Modes, Their Solvability, and a Two-Stage Educational Response

Michael P. Walsh · May 2026 · Version 1.12 · 60 verified sources

About This Paper

This paper documents a health information crisis with a specific mechanism: a reasoning gap between what the information environment demands of users and what they have been equipped to provide. The Interaction Gap is not a technology failure. It is the predictable consequence of delivering answers to people who needed scaffolding for reasoning instead. The paper categorizes AI health tool failures across two distinct layers and presents the evidence that the mechanism producing durable health behavior change is not information delivery but structured reasoning dialogue that changes how people evaluate health information, in a transferable way and at the moment that reasoning is needed.

Abstract

The core finding from the 2026 peer-reviewed literature is that AI chatbots score 95% on standardized clinical scenarios yet give study participants the wrong answer roughly two times in three. This paper explains why that gap exists, and what it will take to close it.

Large Language Models (LLMs) have become a primary source of health information for hundreds of millions of people worldwide, yet the evidence base for their clinical safety reveals a set of failures that constitute a genuine public health crisis. This paper synthesizes findings from a landmark randomized trial published in Nature Medicine in February 2026 demonstrating that study participants using AI health tools perform worse than users of standard web search, alongside a broader evidence base comprising more than 40 peer-reviewed studies from 2023 to 2026, supplemented by preprint evidence and foundational literature in educational psychology and multimedia learning.

AI has become an active structural force in how health information is generated, consumed, trusted, and misunderstood. Its failures operate across two layers: at the model layer, where the technology generates plausible but fabricated content and absorbs misinformation from the environment it inhabits; and at the interface layer, where product design actively degrades users’ capacity to evaluate what they receive, systematically accommodates dangerous false premises, and amplifies existing health misconceptions at scale. The same commercial pressures that created these failures have also systematically eroded the safeguards that once warned users of the technology’s limitations.

Critically, this paper argues that the crisis has a specific mechanism: a reasoning gap between what the health information environment demands of its users and what those users have been equipped to provide. People are not failing to find health information. They are evaluating it without the cognitive tools the task requires, and that evaluation consistently leads them to wrong conclusions. The proliferation of AI-powered health tools has widened this gap rather than closing it, delivering diagnostic conclusions to users who lack the reasoning framework to evaluate them reliably. The Bean et al. finding that study participants using current AI health tools perform worse than users of standard web search is a direct measurement of this dynamic.

We further present evidence that the same technology, redesigned around structured dialogue rather than conversational answer-delivery, can produce durable reductions in health misinformation and measurable improvements in health behavior. Drawing on the multimedia learning and prebunking literatures, we describe a two-stage educational architecture (structured animated video instruction followed by personalized Socratic dialogue) that operationalizes this evidence into a coherent public health tool. We outline a research program to evaluate this architecture in clinical contexts and propose a broader research agenda for the field of consumer health AI.

Table of Contents

1. Introduction: AI as the New Infrastructure of Health Information

2. The Failure of Evaluation: Why We Were Fooled

3. The Interaction Gap: When Knowledge Fails in Practice

4. The Hallucination Crisis: Fabrication of Medical Authority

5. AI as an Active Driver of Health Misinformation

6. Safety Failures in High-Stakes Domains

7. Bias, Equity, and the Digital Determinants of Health

8. Consumer Trust and the Epistemological Trap

9. The Disappearance of Safeguards

10. The Liability Vacuum

11. The Solvability Paradox: AI as Both Problem and Solution

12. A Framework for Solvability

13. A Research Agenda

A. Verified Source Registry (60 sources)

B. A Note on Methodology and Source Verification

Evidence Base

The peer-reviewed case for VeriHealth's approach

A curated registry of peer-reviewed literature informing VeriHealth's design. Every source has been independently verified against the primary publication.

Every design decision VeriHealth has made (the two-stage architecture, the animated format, the Socratic dialogue structure) is traceable to a specific finding in the peer-reviewed literature. Each choice reflects the evidence on what develops genuine health discernment rather than what merely delivers information. This table is that record.

Authors, Year, and Journal	DOI or Source	Finding	Category
Gong et al., 2025 J Med Internet Res 27:e84120	10.2196/84120	Systematic review of 39 medical LLM benchmarks. Knowledge-based benchmarks: 84%-90% accuracy. Practice-based benchmarks: 45%-69%. Safety assessment accuracy: 40%-50%. Examination scores are insufficient and misleading proxies for clinical readiness.	Interaction
Bean et al., 2026 Nature Medicine 32:609–615	10.1038/s41591-025-04074-y	RCT of 1,298 participants. LLMs score 94.9% in isolation; study participants achieve under 34.5% condition identification. Internet search outperforms all AI chatbot groups by 1.76 times. AI chatbot users had 36% lower odds of recognizing urgent red-flag symptoms than internet search users (inverse of reported OR 1.57).	Interaction
Ramaswamy et al., 2026 Nature Medicine	10.1038/s41591-026-04297-7	Stress test of ChatGPT Health: 60 clinician-authored vignettes, 960 total responses across 16 factorial conditions. Prior low-acuity framing increased the odds of under-triage (OR 11.7; 95% CI: 3.7 to 36.6). Over-triaged 64.8% of non-urgent cases.	Interaction
Sambara et al., 2026 arXiv (MedRedFlag)	arXiv:2601.09853	Even when frontier models detect dangerous false assumptions, they accommodate them in 60 to 74% of cases. GPT-5 detects 88% of false premises but accommodates 73%.	Interaction
Goh et al., 2024 JAMA Network Open	10.1001/jamanetworkopen.2024.40969	Physicians randomized to GPT-4 showed no improvement over those using conventional resources (76% vs 74%, p=0.60). GPT-4 alone scored 16 points higher than either physician group.	Interaction
Omar et al., 2026 Lancet Digital Health 8:100949	10.1016/j.landig.2025.100949	3.4 million prompts across 20 LLMs. Overall susceptibility to fabricated health claims: 31.7%. Fabricated clinical notes: 46.1% susceptibility.	Hallucination
Linardon et al., 2025 JMIR Mental Health	10.2196/80371	19.9% of GPT-4o citations entirely fabricated; 45.4% of real citations contained errors. Fabrication tracks topic familiarity: 6% for major depression but 28 to 29% for less-studied disorders.	Hallucination
Griot et al., 2025 Nature Communications	10.1038/s41467-024-55628-6	Across 12 models, nearly all scored 0% on unanswerable question identification. Best performer (GPT-4o) achieved only 3.7%. Confident wrongness is the default.	Hallucination
Sharma et al., 2025 npj Digital Medicine	10.1038/s41746-025-01943-1	Medical disclaimers in AI health outputs dropped from 26.3% in 2022 to 0.97% in 2025. Linear decline (R²=0.944), reduction of 8.1 percentage points per year.	Hallucination
Costello, Pennycook and Rand, 2024 Science 385:eadq1814	10.1126/science.adq1814	Study of 2,190 conspiracy believers. Brief personalized AI conversation produced a durable reduction of approximately 20% in conspiracy beliefs (relative reduction; 16.8 and 12.3 points across two studies), persisting two months. Effect larger than reflective thinking interventions, which yielded only one- to six-point reductions on comparable scales.	Intervention
Hou et al., 2025 Nature Medicine 31:1855–1862	10.1038/s41591-025-03618-6	Cluster-RCT of 2,671 parents. Socratic AI chatbot increased vaccine receipt or appointments 3.85 times vs usual care. Improved vaccine literacy and health discernment.	Intervention
Simchon et al., 2026 Current Opinion in Psychology	10.1016/j.copsyc.2025.102194	Meta-analysis of 33 inoculation experiments, 37,025 participants. Inoculation improved discrimination without increasing response bias. Participants became more discerning, not uniformly skeptical.	Intervention
Hansen et al., 2024 JMIR 26:e58306	10.2196/58306	Systematic review of RCTs. Animated video consistently improved health information recall, with strongest effects for individuals with low health literacy.	Intervention
Omar et al., 2025 International Journal for Equity in Health	10.1186/s12939-025-02419-0	Systematic review of 24 studies. 91.7% identified biases. Gender bias in 93.7% of studies. Racial or ethnic biases in 90.9%. Bias in medical LLMs is pervasive and systemic.	Equity
Chen et al., 2026 Nature Medicine	10.1038/s41591-026-04229-5	Review of 4,609 LLM clinical studies. 45.9% from the U.S., 7.6% from the U.K. LLM safety profiles in non-English languages remain largely uncharacterized.	Equity
Allen et al., 2024 Science	10.1126/science.adk3451	Vaccine-skeptical content from mainstream outlets reduced vaccination intentions 46 times more than flagged misinformation. A single headline reached more than 50 million people.	Landscape
Van der Linden and Kyrychenko, 2024 Science 384:959–960	10.1126/science.adp9117	The dominant threat is technically true but misleadingly framed content. Demanding unattainable causal proof before acting on misinformation evidence serves inaction, not public health.	Landscape
Montero et al., 2026 KFF Tracking Poll	kff.org, March 25, 2026	32% of U.S. adults used AI chatbots for health advice in the past year. 92% report satisfaction. Usage rising fastest among younger adults and those who cannot access or afford a physician.	Landscape
ECRI, 2026 Top 10 Health Technology Hazards	ecri.org	Misuse of AI chatbots ranked #1 health technology hazard for 2026 by the independent nonpartisan patient safety organization ECRI. Chatbots produce authoritative-sounding responses that are not regulated as medical devices, not validated for clinical use, and programmed to satisfy users rather than provide accurate answers.	Report
Edelman Trust Institute, 2026 Trust Barometer: Trust and Health · N=16,009 · 16 markets	edelman.com, March 2026	51% of people globally are confident in their ability to find and evaluate health information, down 10 points in one year. Statistically significant declines in 14 of 16 markets, consistent across age, education, and political affiliation. 70% hold at least one widely debunked health belief at equal rates across education levels. People who hold more of them are most likely to consult AI for health guidance (61% monthly vs. 19%).	Report
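
The distinction in the Simchon et al. entry above, between discrimination and response bias, is commonly formalized with signal detection theory. The sketch below is illustrative only: the function name and the example rates are hypothetical and are not taken from the meta-analysis. Here a "hit" means judging a misleading item as misleading, and a "false alarm" means judging a reliable item as misleading.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) for one participant or group.

    hit_rate: share of misleading items correctly judged misleading.
    false_alarm_rate: share of reliable items wrongly judged misleading.
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa             # discrimination: higher is better
    criterion = -0.5 * (z_hit + z_fa)  # response bias: 0 is neutral
    return d_prime, criterion


# Hypothetical rates, for illustration only.
before = sdt_measures(hit_rate=0.60, false_alarm_rate=0.40)
discerning = sdt_measures(hit_rate=0.75, false_alarm_rate=0.25)
skeptical = sdt_measures(hit_rate=0.75, false_alarm_rate=0.55)
```

In this made-up example, the "discerning" case raises d' while the criterion stays at zero. The "skeptical" case leaves d' close to its starting value and pushes the criterion negative, meaning more items of every kind are called misleading.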

Research Agenda

Six research questions VeriHealth is built to answer

VeriHealth is not only building a platform. It is defining a research program at the intersection of educational psychology, misinformation science, and consumer health AI design.

VeriHealth's platform is designed to be evaluated, not just deployed. The six questions below are open questions for the field. The answers will matter beyond VeriHealth.

Interactive safety testing as a standard

Bean et al. (2026) demonstrated that in-silico benchmarks do not predict real-world safety. Any evaluation framework for consumer health AI must include interactive testing with diverse human populations before deployment, analogous to clinical trials for medication. The field needs standardized protocols for that testing. VeriHealth is building them.

Methodology · Safety evaluation

The structured dialogue research program

AI can both worsen and improve health reasoning depending on interaction design. The specific question VeriHealth is built to answer: can a purpose-built Interaction Layer, deployable across any underlying medical LLM, reproducibly close the gap Bean et al. documented? Costello et al. provide proof of concept for the mechanism. What is needed now is a systematic program to identify generalizable design principles across health domains.

Interaction design · RCT

The two-stage architecture research program

VeriHealth's two-stage design has strong theoretical grounding but lacks direct empirical evaluation in medical misinformation contexts. Key open questions: Does video-first, followed by dialogue, produce better outcomes than either component alone? Does a health reasoning intervention produce transfer to claims not covered in the instructional content? Does population heterogeneity in AI trust levels predict differential response? These questions require randomized trials with interactive human user testing and diverse populations.

Architecture design · Transfer effects
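
The first open question above implies a comparison of four conditions: neither component, video only, dialogue only, and video followed by dialogue. A minimal sketch of permuted-block assignment to those four arms follows. The arm labels, block size, and function name are assumptions made for illustration, not VeriHealth's trial protocol.

```python
import random

# Hypothetical arm labels for a 2x2 factorial design (video x dialogue).
ARMS = ("control", "video_only", "dialogue_only", "video_then_dialogue")


def assign_arms(participant_ids: list[str], seed: int = 0) -> dict[str, str]:
    """Assign participants to arms using permuted blocks of four.

    Each complete block contains every arm exactly once, so arm sizes
    never differ by more than one participant.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    assignment: dict[str, str] = {}
    for start in range(0, len(participant_ids), len(ARMS)):
        block = list(ARMS)
        rng.shuffle(block)
        for pid, arm in zip(participant_ids[start:start + len(ARMS)], block):
            assignment[pid] = arm
    return assignment


ids = [f"p{i:03d}" for i in range(10)]
allocation = assign_arms(ids, seed=42)
counts = {arm: list(allocation.values()).count(arm) for arm in ARMS}
```

A real trial would likely add stratification, for example by language or baseline AI trust level, which this sketch omits.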

Model immunization and the inoculation frontier

Van der Linden and Kyrychenko (2026) argue for extending psychological inoculation to the models themselves: training LLMs to reject misinformation the way humans can be inoculated against it. The question of whether the same inoculation principle can operate at both the human and the model level is an open frontier for the field.

Model safety · Inoculation

Longitudinal effects of habitual use of AI health tools

Current studies provide snapshots at single time points. No study has tracked the longitudinal effects of habitual use of AI health tools on a population's health literacy, reasoning capacity, or clinical outcomes. Given the Interaction Gap finding from the Oxford trial, and the emerging preprint evidence that habitual AI use erodes independent reasoning capacity, this is an urgent gap. Additionally, because models update continuously and silently, one-time testing is insufficient. An independent monitoring infrastructure, analogous to pharmacovigilance for medications, should track safety signals from deployed AI health tools on an ongoing basis.

Longitudinal · Surveillance
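
Because models can change between test runs, ongoing monitoring can be thought of as re-running a fixed panel of prompts and comparing each run against a baseline. The sketch below flags a run whose safe-response rate falls significantly below baseline, using a one-sided two-proportion z-test. The counts, the threshold, and the function name are hypothetical, and the sketch assumes each response has already been scored as safe or unsafe by some validated method.

```python
from math import sqrt
from statistics import NormalDist


def safety_signal(base_safe: int, base_n: int, new_safe: int, new_n: int,
                  alpha: float = 0.01) -> bool:
    """Return True if the new run's safe-response rate is significantly
    lower than the baseline rate (one-sided two-proportion z-test)."""
    p_base, p_new = base_safe / base_n, new_safe / new_n
    pooled = (base_safe + new_safe) / (base_n + new_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / new_n))
    if se == 0:
        return False  # pooled rate is 0 or 1, so the two runs do not differ
    z = (p_new - p_base) / se
    return NormalDist().cdf(z) < alpha


# Hypothetical panel of 500 prompts, re-run after a silent model update.
stable = safety_signal(base_safe=450, base_n=500, new_safe=445, new_n=500)
degraded = safety_signal(base_safe=450, base_n=500, new_safe=400, new_n=500)
```

With these made-up counts, a drop from 90% to 89% does not raise a signal, while a drop from 90% to 80% does.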

The translation problem

A researchable problem in its own right: how do you communicate AI health risk to a public that experiences AI as helpful? What vocabulary works? What framing produces behavioral change rather than dismissal? What visual metaphors convey the Interaction Gap without inducing either panic or complacency? These questions are amenable to the experimental methodology used in the inoculation literature, and answering them is a prerequisite for any effective public health response. This is not a communications afterthought. It is a core scientific question.

Health communication · Behavioral science

The Crisis in Numbers

The numbers behind the crisis, and the intervention

All figures are drawn from peer-reviewed or independently verified sources, each checked against the primary publication. Each number represents a finding with direct implications for how health AI is deployed and regulated.

Behind each figure below is a person making a health decision on information they could not evaluate. Together these numbers show what the crisis looks like when measured: it is real, the failure is structural, and the evidence for the intervention is strong.

AI performance: benchmarks vs. real use

Bean et al., Nature Medicine, 2026 · Gong et al., JMIR, 2025

Medical exam performance (in silico): 95%
Practice-based clinical benchmarks: 45–69%
Safety assessment accuracy: 40–50%
People, condition identification: 35%
36% lower odds of urgent red-flag recognition than internet search

ChatGPT Health triage errors

Ramaswamy et al., Nature Medicine, 2026

11.7x higher odds of under-triage when the model anchors on an incorrect initial impression (odds ratio)
Non-urgent cases over-triaged: 65%

AI misinformation susceptibility by corpus type

Omar et al., Lancet Digital Health, 2026 · 3.4M prompts, 20 LLMs

Fabricated hospital discharge notes: 46%
All prompt types (overall rate): 32%

Social media misinformation

Safety disclaimer erosion, 2022 to 2025

Sharma et al., npj Digital Medicine, 2025 · R²=0.944

2022 (baseline): 26%
2023: 18%

2024

2025

Structured AI dialogue: intervention efficacy

Costello et al., Science, 2024 · Hou et al., Nature Medicine, 2025

~20% reduction in false beliefs (Costello et al.)
3.85x increase in vaccine uptake vs usual care, cluster RCT (Hou et al.)

Citation fabrication in medical AI

Linardon et al., JMIR Mental Health, 2025

Citations entirely fabricated: 20%
Real citations containing errors: 45%

Institutional consensus: AI chatbot misuse as patient safety hazard

ECRI Top 10 Health Technology Hazards, 2026

Ranked the most significant health technology hazard of 2026 by ECRI, the independent nonpartisan patient safety organization, ahead of system outages and substandard medical products.

Key finding

Chatbots are programmed to satisfy the user rather than provide accurate answers. They are not regulated as medical devices and not validated for clinical use.

Context

AI chatbot misuse also ranked 5th on ECRI's 2024 hazards list. The trajectory is consistent with the research base VeriHealth tracks.

Public confidence in health information: a crisis in real time

Edelman Trust Barometer Special Report: Trust and Health, 2026 · N=16,009 · 16 markets

51%

Of people globally are confident in their ability to find and evaluate health information. Down 10 points in a single year. Statistically significant declines in 14 of 16 markets.

Who holds widely debunked health beliefs

70% of people hold at least one widely debunked health belief. The rate is virtually identical across education levels, age groups, and political leanings: this is not an education problem.

Who turns to AI

People who hold more such beliefs are more likely to consult AI for health guidance: 61% monthly among those with many, versus 19% among those with none. The highest-risk users are the most active AI health users.

Our Team

The disciplines required to solve this problem, in one organization

VeriHealth was founded by people who speak the languages this problem requires: medical science, public health, health communication, and the languages of the communities most affected. The team exists to close the gap between what the evidence shows and what a parent can understand at midnight.

Leadership

Michael P. Walsh, MBA

President and Co-founder

Harvard AB, Biochemistry · Harvard Business School MBA · UChicago Leadership & Society Fellow

Founder of Kilkenny Capital Management, where he raised and invested nearly $500 million in biotechnology assets over fifteen years. His synthesis of recent peer-reviewed evidence on consumer health AI failures is the evidentiary foundation of VeriHealth's platform design.

Catherine McCarthy

Chief Content Officer and Co-founder

University of Leicester · UChicago Leadership & Society Fellow

Former BBC Senior Executive, WHO Media Consultant, and CEO of Medical Aid Films, where she built a library of animated films that taught vital knowledge and skills on women's and children's health to disadvantaged communities worldwide. She brings to VeriHealth the operational infrastructure of health communication at scale.

Smitha Arekapudi, MD, MBA, ScM

Project Director and Co-founder

Swarthmore BA · Harvard ScM, NCI Fellow · Vanderbilt MD · Kellogg MBA · Diplomate, American Board of Anesthesiology · Fellow, American Society of Anesthesiologists

A practicing anesthesiologist with graduate training in epidemiology and cancer prevention policy, and leadership roles at the American Medical Association and American Society of Anesthesiologists. She brings to VeriHealth fluency in clinical medicine, public health methodology, and the institutional language of health systems: three languages that rarely coexist in one person.

Everly Macario, ScD, ScM, EdM

Senior Research Advisor and Co-founder

Harvard School of Public Health ScD, ScM · Harvard Graduate School of Education EdM

Bilingual behavioral scientist and Director of Primary Care Research at the American Academy of Pediatrics, where she oversees vaccine hesitancy research through a national pediatric practice network. Co-founder of the MRSA Research Center at the University of Chicago, she brings to VeriHealth both the research infrastructure and the lived understanding of what it costs when families cannot access accurate health information.

Senior Medical Advisor

Kenneth Polonsky, MD

MD, University of Witwatersrand · Fellowship in Endocrinology, University of Chicago · National Academy of Medicine

Former President of the University of Chicago Medicine health system, Dean of the Pritzker School of Medicine, and Executive Vice President for Medical Affairs at the University of Chicago. A member of the National Academy of Medicine with more than 250 peer-reviewed publications, he connects VeriHealth to the academic medical center community.

Research Relationships

Oxford Internet Institute, University of Oxford

Michael Walsh conceived VeriHealth's Structured Socratic Interface in early 2025, before the Oxford Internet Institute's Bean et al. study was published. When that study appeared in Nature Medicine in February 2026, it independently confirmed the design principle VeriHealth had already built toward. That convergence is the basis of a developing research relationship with OII investigators Luc Rocher and Adam Mahdi.

TITAN Consortium · EU Horizon Europe

VeriHealth's Socratic design approach shares its core premise with the TITAN project (5.7 million euros, 14 partners, 2022 to 2025), a European consortium that built Socratic coaching tools to help people reason about misleading information for themselves rather than telling them what to believe. TITAN is the closest operational precedent for the reasoning-first approach VeriHealth employs, and VeriHealth is pursuing a research relationship with the consortium. Both rest on the same premise: that the failure is cognitive, not informational.

Why This Team

Medical misinformation is not a knowledge problem. It is a structural one: the ability to evaluate health information well has never been part of general education, never been equally distributed, and nothing has replaced it. VeriHealth's founding team was assembled specifically around that gap. Every member is fluent in at least one of the languages the gap produces: medical, public health, health communication, and the languages of the communities most affected.

The founding insight was simple and serious: people who speak medical language and can hear misinformation as distortion have an obligation that others do not. VeriHealth was founded by people who heard it, and decided that hearing it without acting on it was no longer acceptable.

News and Updates

VeriHealth and the field

Organizational updates from VeriHealth alongside key developments in the research and policy landscape we work within.

From VeriHealth

VeriHealth organizational announcements will appear here: grant decisions, research milestones, clinical partnerships, and published work.

Medical Misinformation in the News

Jan 2026

ECRI names misuse of AI chatbots the top health technology hazard for 2026

ECRI, the independent nonpartisan patient safety organization, ranked AI chatbot misuse first on its annual Top 10 Health Technology Hazards report. The finding: chatbots produce authoritative-sounding responses that are not regulated as medical devices, not validated for clinical use, and capable of providing false or misleading information with significant patient harm implications. The report notes that chatbots are programmed to satisfy the user rather than provide accurate answers.

ECRI · Patient Safety

Mar 2026

Edelman Trust Barometer 2026: Global confidence in health information collapses

The annual Edelman Trust and Health survey of 16,000 people across 16 markets finds that only 51% of people globally are confident in their ability to find and evaluate health information, a decline of 10 points in a single year. The decline is statistically significant in 14 of 16 markets and is consistent across age groups, education levels, and political leanings. Separately, 70% of respondents hold at least one widely debunked health belief, and those who hold more of them are more likely to turn to AI for health guidance.

Edelman Trust Institute · Global Survey

Apr 2026

Should you really trust health advice from an AI chatbot?

BBC Inside Health features Oxford's Adam Mahdi explaining why AI scores 95% in isolation but users get the right answer only 35% of the time. England's Chief Medical Officer warns that chatbot answers are "both confident and wrong." Includes real patient accounts of AI advice gone right, and dangerously wrong.

BBC

Mar 2026

As more people turn to chatbots for health advice, studies say they may be led astray

NPR covers both the Bean and Ramaswamy findings. Bean identifies the core problem as a two-way communication breakdown: users don't know what information AI needs, and the responses combine good and poor recommendations in ways that are difficult to distinguish.

NPR

Feb 2026

Paging Dr. Chatbot

The New York Times Morning Briefing covers the published Bean et al. finding and its implications for the millions of Americans now turning to AI for health advice.

The New York Times

Feb 2026

Costello, Pennycook and Rand win 2026 Newcomb Cleveland Prize

The paper demonstrating that structured AI dialogue reduces conspiracy beliefs by about 20%, the intervention evidence at the core of VeriHealth's design, wins the oldest award given by the American Association for the Advancement of Science. The prize last went to a social science paper in 1981.

Cornell University · AAAS

Nov 2025

Frustrated by the Medical System, Patients Turn to A.I.

A deeply reported Times feature on Americans using AI chatbots to compensate for a health system that leaves them without answers, and the risks that misplaced trust creates. References Oxford research and Harvard Medical School findings on AI sycophancy and false premise accommodation.

The New York Times

Mar 2025

Can people be persuaded not to believe disinformation?

The Economist covers Costello et al. and the emerging science of inoculation and critical thinking education as tools against misinformation: the two mechanisms at the core of VeriHealth's platform design.

The Economist

Sep 2024

Generative AI as a tool for truth

A Perspective published in Science alongside Costello et al. Bago and Bonnefon assess the findings and conclude that a scalable intervention to recalibrate misinformed beliefs may be within reach, while raising the question of whether people will voluntarily engage with an AI designed to challenge what they believe.

Science

Get Involved

The reasoning gap is real. The evidence for the solution exists. The work is now.

VeriHealth is at the stage where the right partnerships, with funders, clinical institutions, and research collaborators, determine whether an evidence-based intervention reaches the populations that need it.

For Funders

We are seeking philanthropic and institutional funding to move a platform with peer-reviewed evidence and a clinical deployment pathway from design toward validation and scale.

Where we are. VeriHealth has built the evidence base, the platform design, and an early prototype, and assembled the team across the four disciplines the problem requires.

What is next. Completing the Structured Socratic Interface, developing the clinical evaluation plan, and preparing the first pilot of the integrated intervention.

What funding unlocks. The step from a designed, evidence-grounded intervention to a piloted one: from knowing the mechanism works to proving this implementation of it works.

Build and validate

Platform development and multilingual content production
Community-based participatory user research
Clinical evaluation planning and pilot design
Peer-reviewed evaluation and publication

Reach

Community health worker network dissemination

Discuss funding →

For Clinical Collaborators

We are seeking clinical partners at major academic medical centers and community health institutions serving populations with limited access to trusted clinical relationships, across languages and literacy levels. VeriHealth is based in Chicago and is actively developing relationships with Chicago-area health systems.

Maternal and pediatric health settings
Community health centers and safety-net hospitals
Spanish-language and multilingual content at launch
IRB collaboration and study design
Community health worker integration
Compatible with existing health system AI infrastructure via API

Discuss collaboration →

For Research Collaborators

The research questions VeriHealth is built to answer are open questions for the field. We are seeking partners with expertise in health communication, misinformation science, and AI interaction design to help answer them. The answers will matter beyond VeriHealth.

Intervention evaluation and randomized trial design
Health communication and misinformation measurement
AI interaction and human factors research
IRB collaboration and institutional partnership
Peer-reviewed publication and field-building
Model-agnostic architecture enables cross-model Interaction Layer evaluation

Discuss research collaboration →

Chicago-based, nationally oriented. VeriHealth NFP is headquartered in Chicago, Illinois, with developing relationships across the Chicago academic medical center ecosystem. The platform is designed for national distribution through community health worker networks and trusted medical messengers at the point of care, free in multiple languages at launch.

Ready to talk?

Whether you are a funder, a clinical partner, or a research collaborator, we welcome the conversation.
