VeriHealth — True Health Discernment
Medical misinformation has always been dangerous. Social media gave it reach. AI makes it sound like the truth.
VeriHealth helps people know what to question, and what to trust.
Our Mission
Most people were never taught how to evaluate health information well: how to follow the medical logic, weigh the evidence, and judge whether a source can be trusted. This is the health reasoning gap. It is not a personal failure but a structural one, because this was never part of general education, and it still isn't.
Closing that gap requires discernment: the ability to distinguish reliable from misleading health information. VeriHealth exists to help people develop this discernment, for any source, any topic, any claim.
A platform that develops the capacity to ask the right health question and evaluate what comes back, from any source. Two components, designed to work together, and each deployable independently.
Short-form video that develops health reasoning skills across languages and literacy levels: recognizing misleading framing, formulating precise questions, interpreting probabilistic language.
An AI-powered clinical interpreter that clarifies what the user is asking, surfaces false premises, and renders AI output in language the user can understand and act on safely.
Every design decision is grounded in the peer-reviewed literature: animated video for low health literacy populations, Socratic dialogue for durable belief change, and structured intake to surface the unknown unknowns that free-text interfaces leave unasked.
Who We Are
VeriHealth brings together clinical medicine, behavioral science, health communications, and institutional leadership, the combination needed to develop true health discernment at scale.
The Problem
People can get answers to all their questions, but cannot judge which ones to trust.
The mediation that was lost
Before the internet, the healthcare system mediated access to health information. A clinician translated medical knowledge into guidance a person could act on, interpreting in both directions: turning a patient's concern into the right question, and turning the evidence back into something usable. The internet removed that mediation. It left people to evaluate complex, often conflicting health information on their own, with no training in how to do it. This is the health reasoning gap, and it has never been closed.
Social media widened it
The platforms built to connect people reward engagement, not accuracy. Misleading health content thrives there because it confirms fears, validates instincts, and travels faster than any correction. And the most damaging content is often not false at all.
"Misleading claims from credible sources can be more damaging than blatant falsehoods."
Van der Linden and Kyrychenko, Science, 2024
This is the first thing any defense must reckon with. The danger is not only outright falsehood, which fact-checking can sometimes catch, but technically accurate content framed to mislead, which it cannot. You cannot correct what is not, strictly speaking, false.
AI arrived as the solution. It made the problem worse.
AI promised to close the gap. Instead it opened a new kind of failure. The knowledge is in the model; the 2026 Oxford trial showed that people interacting with that same model still got the wrong answer about two times in three. The information was not false, and no one framed it to mislead. The failure was in the interaction itself.
That failure operates on two layers. At the model layer, AI tools strip away their own safeguards, absorb misinformation from the data they are trained on, and express confidence they have not earned. At the interface layer, the design of the exchange degrades a person's ability to evaluate what they receive: it accepts false premises, answers questions the person did not know how to ask well, and delivers all of it in fluent, authoritative prose. The model holds the knowledge. The interaction prevents the person from reaching and evaluating it.
Model-layer failures
Interface-layer failures
These failures point in a single direction. If the model already holds the knowledge, the answer is not a better model. It is a better interaction: one that helps a person ask the right question, surfaces the assumptions hidden inside it, and returns an answer calibrated to how certain the evidence actually is. And because the same agreeable, frictionless design that earns a person's trust is what erodes their judgment, that interaction must build independent capacity rather than dependence.
Where this has left us
Put the failures together and a spectrum comes into view. Health information can be outright false. It can be technically accurate yet framed to mislead. And now it can be accurate in the model yet rendered unusable by the interaction that delivers it. No single content-based defense reaches across all three, because what they share is not a property of the content. It is the gap between the information and a person's capacity to evaluate it. That gap does not care which channel the claim arrives through, whether the person went looking for it, or even whether they use AI at all.
The solvable paradox
The failures define the solution. Any approach that actually closes the reasoning gap has to do four things the current tools do not. It must build the capacity to evaluate, not correct one claim at a time. It must interrogate reasoning and framing, not only what is verifiably false. It must restructure the interaction itself, clarifying the question and returning honest uncertainty, not deliver confident answers. And it must build independence, not dependence. A capacity defined this way is general: it applies to any source, any claim, and any person, not only to those seeking a diagnosis and not only to those who already use AI.
This is not theoretical. The same conversational technology that deepens the gap can, designed around these principles, durably shift what people believe and do. The evidence is peer-reviewed, and the effect sizes are large.
"The design of the interaction, not the model's capabilities, determines whether AI deepens the gap or develops the discernment to close it."
Unsafe by Design, VeriHealth NFP, 2026
The VideoBot
VeriHealth helps people develop the discernment to evaluate any health claim, from any source, and helps them apply it in real time, at the moment it is needed. Two components, designed to work together as complementary stages of a single intervention, each deployable independently.
Short-form animated video (30 to 120 seconds) that builds the foundation of health discernment: the reasoning skills and the core knowledge needed to evaluate health claims. Animation is the right medium: a 2024 systematic review found that animated video generally improved health information recall, with the strongest effects for low health literacy populations. Eye-tracking studies show that stylized animated instructors direct attention toward content rather than the presenter, reducing extraneous cognitive load. Modules are available in English and Spanish, designed to work across literacy levels.
Current modules develop the capacity to
The Structured Socratic Interface (SSI) addresses the Interaction Gap identified by Bean et al. (2026), the gap between what AI knows and what users can actually get from it: the chatbot has correct knowledge, but the free-text interface prevents users from reaching it. Users ask imprecise questions, embed false assumptions, and receive confident answers they cannot evaluate. The SSI restructures that interface.
On intake, it clarifies what the user is asking, surfaces embedded assumptions, and formulates a precise clinical query. On output, it renders the response in language the user can act on, replacing false certainty with plain-language estimates of how confident the answer really warrants being.
Core functions
The SSI is designed to work with any medical AI system, connecting to it via API, with or without VeriHealth’s video modules.
Evidence Base
Hansen et al. (JMIR, 2024) systematic review of RCTs; Li, Wang and Mayer (British Journal of Educational Psychology, 2024) eye-tracking study; Meppelink et al. (JMIR, 2015) on attitude change in low-literacy populations.
Costello, Pennycook and Rand (Science, 2024) on durable conspiracy belief reduction; Hou et al. (Nature Medicine, 2025) cluster-RCT on vaccine behavior; Simchon et al. (Current Opinion in Psychology, 2026) meta-analysis of 33 inoculation experiments.
Bean et al. (Nature Medicine, 2026) Oxford interactive trial demonstrating that free-text chat causes users to perform worse than internet search; Sambara et al. (arXiv, 2026) on false premise accommodation in medical AI chatbots.
Why This Approach
The evidence behind the design.
The Opening Argument
Think about the last time someone corrected a piece of health misinformation you had believed. You updated that belief. Then you encountered the next false claim. You had no more tools to evaluate it than you did before. The correction gave you an answer. It did not give you the ability to evaluate the next claim on your own.
That is the central failure of the dominant approach to health misinformation. It corrects answers. It does not develop discernment. That is why VeriHealth is designed differently.
When a false health claim spreads, the standard response is to correct it: flag the article, issue an accurate statement, counter with facts. The approach is intuitive. It also has three structural limits.
It is reactive: it addresses claims after they have spread. It does not scale: there are far more false claims than corrective resources. And it builds no capacity to evaluate the next false claim. It is treatment, not inoculation.
Help people develop the discernment to evaluate any claim, from any source. A single well-designed intervention can produce durable, transferable resistance, persisting for months and extending to claims it never directly addressed.
The goal is not to tell people what is true. It is to help them develop the discernment to evaluate any claim for themselves.
What the Evidence Shows
In a study of 2,190 participants, brief, personalized AI conversations produced a durable reduction of roughly 20 percent in false beliefs, persisting for at least two months. The critical feature was engaging each person's specific reasoning rather than delivering a generic correction, the same person-specific principle the SSI applies through Socratic questioning.
A signal-detection meta-analysis of 33 inoculation experiments (37,025 participants) found that technique-based prebunking, the same approach VeriHealth's modules use, improves people's ability to distinguish reliable from misleading content, without making them indiscriminately skeptical. The gain is in discrimination, not blanket distrust: exactly the discernment a reasoning-based intervention is meant to build.
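The difference between discrimination and blanket distrust can be made concrete with the two standard signal-detection measures: sensitivity (d') and response criterion (c). The sketch below is illustrative only. The rates are hypothetical numbers chosen for the example, not data from the meta-analysis, and the function name `sdt_measures` is an assumption introduced here. A "hit" means flagging misleading content as misleading; a "false alarm" means flagging reliable content as misleading.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) for one set of judgments.

    d_prime   = z(hit) - z(false alarm): ability to tell the two kinds apart.
    criterion = -(z(hit) + z(false alarm)) / 2: overall tendency to flag.
                Negative values mean a bias toward calling everything misleading.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


# Hypothetical rates, for illustration only.
baseline = sdt_measures(hit_rate=0.60, false_alarm_rate=0.40)
discerning = sdt_measures(hit_rate=0.75, false_alarm_rate=0.25)  # better at both
skeptical = sdt_measures(hit_rate=0.75, false_alarm_rate=0.55)   # flags more of everything

for label, (d, c) in [("baseline", baseline), ("discerning", discerning), ("skeptical", skeptical)]:
    print(f"{label:>10}: d' = {d:.2f}, c = {c:.2f}")
```

In this toy comparison the "discerning" profile raises d' while leaving the criterion unchanged, which is the pattern the meta-analysis describes. The "skeptical" profile catches just as much misleading content but does so by flagging more reliable content too, so its criterion shifts while its sensitivity barely moves.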
Why Both Stages Are Necessary
The animated modules and the SSI address different cognitive tasks, and both are necessary.
The modules build the declarative foundation of health discernment: what a confidence interval means, what hallucination looks like in practice, how to recognize misleading framing. The SSI develops the applied capacity: testing that foundation against the user's own specific question, surfacing the assumption embedded in it, and adapting in real time to where the reasoning needs to go.
These are cognitively distinct processes that require different pedagogical modes. A structured lesson can efficiently establish conceptual frameworks. It cannot engage individual misconceptions or surface specific reasoning errors in real time. That is what Socratic dialogue does. The two are designed to work together as complementary stages of a single intervention: the modules build the foundation, and the dialogue puts it to work on the person's own question. Using both is how the design aims to produce discernment that lasts.
Video Library
VeriHealth modules develop health reasoning skills, then apply them to the specific health decisions people face. Available in English and Spanish, produced with a human cultural adaptation review.
Introduction
Reasoning Skills
Modules that build the skill of evaluating any health claim, from any source.
Health Topics
Modules that apply those skills to specific health decisions.
White Paper
Consumer Health AI Failure Modes, Their Solvability, and a Two-Stage Educational Response
About This Paper
This paper documents a health information crisis with a specific mechanism: a reasoning gap between what the information environment demands of users and what they have been equipped to provide. The Interaction Gap is not a technology failure. It is the predictable consequence of delivering answers to people who needed scaffolding for reasoning instead. The paper categorizes AI health tool failures across two distinct layers and presents the evidence that the mechanism producing durable health behavior change is not information delivery but structured reasoning dialogue that changes how people evaluate health information, in a transferable way and at the moment that reasoning is needed.
Abstract
The core finding from the 2026 peer-reviewed literature is that AI chatbots score 95% on standardized clinical scenarios yet give study participants the wrong answer roughly two times in three. This paper explains why that gap exists, and what it will take to close it.
Large Language Models (LLMs) have become a primary source of health information for hundreds of millions of people worldwide, yet the evidence base for their clinical safety reveals a set of failures that constitute a genuine public health crisis. This paper synthesizes findings from a landmark randomized trial published in Nature Medicine in February 2026 demonstrating that study participants using AI health tools perform worse than users of standard web search, alongside a broader evidence base comprising more than 40 peer-reviewed studies from 2023 to 2026, supplemented by preprint evidence and foundational literature in educational psychology and multimedia learning.
AI has become an active structural force in how health information is generated, consumed, trusted, and misunderstood. Its failures operate across two layers: at the model layer, where the technology generates plausible but fabricated content and absorbs misinformation from the environment it inhabits; and at the interface layer, where product design actively degrades users’ capacity to evaluate what they receive, systematically accommodates dangerous false premises, and amplifies existing health misconceptions at scale. The same commercial pressures that created these failures have also systematically eroded the safeguards that once warned users of the technology’s limitations.
Critically, this paper argues that the crisis has a specific mechanism: a reasoning gap between what the health information environment demands of its users and what those users have been equipped to provide. People are not failing to find health information. They are evaluating it without the cognitive tools the task requires, and that evaluation consistently leads them to wrong conclusions. The proliferation of AI-powered health tools has widened this gap rather than closing it, delivering diagnostic conclusions to users who lack the reasoning framework to evaluate them reliably. The Bean et al. finding that study participants using current AI health tools perform worse than users of standard web search is a direct measurement of this dynamic.
We further present evidence that the same technology, redesigned around structured dialogue rather than conversational answer-delivery, can produce durable reductions in health misinformation and measurable improvements in health behavior. Drawing on the multimedia learning and prebunking literatures, we describe a two-stage educational architecture (structured animated video instruction followed by personalized Socratic dialogue) that operationalizes this evidence into a coherent public health tool. We outline a research program to evaluate this architecture in clinical contexts and propose a broader research agenda for the field of consumer health AI.
Evidence Base
A curated registry of peer-reviewed literature informing VeriHealth's design. Every source has been independently verified against the primary publication.
Every design decision VeriHealth has made (the two-stage architecture, the animated format, the Socratic dialogue structure) is traceable to a specific finding in the peer-reviewed literature. Each choice reflects the evidence on what develops genuine health discernment rather than what merely delivers information. This table is that record.
| Authors, Year, and Journal | DOI or Source | Finding | Category |
|---|---|---|---|
| Gong et al., 2025 J Med Internet Res 27:e84120 | 10.2196/84120 | Systematic review of 39 medical LLM benchmarks. Knowledge-based benchmarks: 84%-90% accuracy. Practice-based benchmarks: 45%-69%. Safety assessment accuracy: 40%-50%. Examination scores are insufficient and misleading proxies for clinical readiness. | Interaction |
| Bean et al., 2026 Nature Medicine 32:609–615 | 10.1038/s41591-025-04074-y | RCT of 1,298 participants. LLMs score 94.9% in isolation; study participants achieve under 34.5% condition identification. Internet search outperforms all AI chatbot groups by 1.76 times. AI chatbot users had 36% lower odds of recognizing urgent red-flag symptoms than internet search users (inverse of reported OR 1.57). | Interaction |
| Ramaswamy et al., 2026 Nature Medicine | 10.1038/s41591-026-04297-7 | Stress test of ChatGPT Health: 60 clinician-authored vignettes, 960 total responses across 16 factorial conditions. Prior low-acuity framing increased under-triage probability by OR 11.7 (95% CI: 3.7 to 36.6). Over-triaged 64.8% of non-urgent cases. | Interaction |
| Sambara et al., 2026 arXiv (MedRedFlag) | arXiv:2601.09853 | Even when frontier models detect dangerous false assumptions, they accommodate them in 60 to 74% of cases. GPT-5 detects 88% of false premises but accommodates 73%. | Interaction |
| Goh et al., 2024 JAMA Network Open | 10.1001/jamanetworkopen.2024.40969 | Physicians randomized to GPT-4 showed no improvement over those using conventional resources (76% vs 74%, p=0.60). GPT-4 alone scored 16 points higher than either physician group. | Interaction |
| Omar et al., 2026 Lancet Digital Health 8:100949 | 10.1016/j.landig.2025.100949 | 3.4 million prompts across 20 LLMs. Overall susceptibility to fabricated health claims: 31.7%. Fabricated clinical notes: 46.1% susceptibility. | Hallucination |
| Linardon et al., 2025 JMIR Mental Health | 10.2196/80371 | 19.9% of GPT-4o citations entirely fabricated; 45.4% of real citations contained errors. Fabrication tracks topic familiarity: 6% for major depression but 28 to 29% for less-studied disorders. | Hallucination |
| Griot et al., 2025 Nature Communications | 10.1038/s41467-024-55628-6 | Across 12 models, nearly all scored 0% on unanswerable question identification. Best performer (GPT-4o) achieved only 3.7%. Confident wrongness is the default. | Hallucination |
| Sharma et al., 2025 npj Digital Medicine | 10.1038/s41746-025-01943-1 | Medical disclaimers in AI health outputs dropped from 26.3% in 2022 to 0.97% in 2025. Linear decline (R²=0.944), reduction of 8.1 percentage points per year. | Hallucination |
| Costello, Pennycook and Rand, 2024 Science 385:eadq1814 | 10.1126/science.adq1814 | Over 2,190 conspiracy believers. Brief personalized AI conversation produced a durable reduction of approximately 20% in conspiracy beliefs (relative reduction; 16.8 and 12.3 points across two studies), persisting two months. Effect larger than reflective thinking interventions, which yielded only one- to six-point reductions on comparable scales. | Intervention |
| Hou et al., 2025 Nature Medicine 31:1855–1862 | 10.1038/s41591-025-03618-6 | Cluster-RCT of 2,671 parents. Socratic AI chatbot increased vaccine receipt or appointments 3.85 times vs usual care. Improved vaccine literacy and health discernment. | Intervention |
| Simchon et al., 2026 Current Opinion in Psychology | 10.1016/j.copsyc.2025.102194 | Meta-analysis of 33 inoculation experiments, 37,025 participants. Inoculation improved discrimination without increasing response bias. Participants became more discerning, not uniformly skeptical. | Intervention |
| Hansen et al., 2024 JMIR 26:e58306 | 10.2196/58306 | Systematic review of RCTs. Animated video consistently improved health information recall, with strongest effects for individuals with low health literacy. | Intervention |
| Omar et al., 2025 International Journal for Equity in Health | 10.1186/s12939-025-02419-0 | Systematic review of 24 studies. 91.7% identified biases. Gender bias in 93.7% of studies. Racial or ethnic biases in 90.9%. Bias in medical LLMs is pervasive and systemic. | Equity |
| Chen et al., 2026 Nature Medicine | 10.1038/s41591-026-04229-5 | Review of 4,609 LLM clinical studies. 45.9% from the U.S., 7.6% from the U.K. LLM safety profiles in non-English languages remain largely uncharacterized. | Equity |
| Allen et al., 2024 Science | 10.1126/science.adk3451 | Vaccine-skeptical content from mainstream outlets reduced vaccination intentions 46 times more than flagged misinformation. A single headline reached more than 50 million people. | Landscape |
| Van der Linden and Kyrychenko, 2024 Science 384:959–960 | 10.1126/science.adp9117 | The dominant threat is technically true but misleadingly framed content. Demanding unattainable causal proof before acting on misinformation evidence serves inaction, not public health. | Landscape |
| Montero et al., 2026 KFF Tracking Poll | kff.org, March 25, 2026 | 32% of U.S. adults used AI chatbots for health advice in the past year. 92% report satisfaction. Usage rising fastest among younger adults and those who cannot access or afford a physician. | Landscape |
| ECRI, 2026 Top 10 Health Technology Hazards | ecri.org | Misuse of AI chatbots ranked #1 health technology hazard for 2026 by the independent nonpartisan patient safety organization ECRI. Chatbots produce authoritative-sounding responses that are not regulated as medical devices, not validated for clinical use, and programmed to satisfy users rather than provide accurate answers. | Report |
| Edelman Trust Institute, 2026 Trust Barometer: Trust and Health · N=16,009 · 16 markets | edelman.com, March 2026 | 51% of people globally are confident in their ability to find and evaluate health information, down 10 points in one year. Statistically significant declines in 14 of 16 markets, consistent across age, education, and political affiliation. 70% hold at least one widely debunked health belief at equal rates across education levels. People who hold more of them are most likely to consult AI for health guidance (61% monthly vs. 19%). | Report |
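The Bean et al. row above restates an odds ratio of 1.57 as "36% lower odds." The arithmetic behind that restatement can be checked in a few lines. This is a minimal sketch, assuming the reported odds ratio compares internet search users to AI chatbot users; the function name `percent_lower_odds` is introduced here for illustration.

```python
def percent_lower_odds(odds_ratio: float) -> float:
    """Convert an odds ratio (group A vs. group B, greater than 1) into
    the percentage by which group B's odds fall below group A's."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    inverse = 1 / odds_ratio       # odds of B relative to A
    return (1 - inverse) * 100     # percent reduction


# Assumed direction: OR 1.57 for internet search vs. AI chatbot users.
reduction = percent_lower_odds(1.57)
print(f"Inverse OR: {1 / 1.57:.3f}; odds are {reduction:.1f}% lower")
```

Note that this describes odds, not probabilities. A 36% reduction in odds is not the same as a 36% reduction in the chance of recognizing a red-flag symptom.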
Research Agenda
VeriHealth is not only building a platform. It is defining a research program at the intersection of educational psychology, misinformation science, and consumer health AI design.
VeriHealth's platform is designed to be evaluated, not just deployed. The six questions below are open questions for the field. The answers will matter beyond VeriHealth.
Bean et al. (2026) demonstrated that in-silico benchmarks do not predict real-world safety. Any evaluation framework for consumer health AI must include interactive testing with diverse human populations before deployment, analogous to clinical trials for medication. The field needs standardized protocols for that testing. VeriHealth is building them.
Methodology · Safety evaluation
AI can both worsen and improve health reasoning depending on interaction design. The specific question VeriHealth is built to answer: can a purpose-built Interaction Layer, deployable across any underlying medical LLM, reproducibly close the gap Bean et al. documented? Costello et al. provide proof of concept for the mechanism. What is needed now is a systematic program to identify generalizable design principles across health domains.
Interaction design · RCT
VeriHealth's two-stage design has strong theoretical grounding but lacks direct empirical evaluation in medical misinformation contexts. Key open questions: Does video-first, followed by dialogue, produce better outcomes than either component alone? Does a health reasoning intervention produce transfer to claims not covered in the instructional content? Does population heterogeneity in AI trust levels predict differential response? These questions require randomized trials with interactive human user testing and diverse populations.
Architecture design · Transfer effects
Van der Linden and Kyrychenko (2026) argue for extending psychological inoculation to the models themselves: training LLMs to reject misinformation the way humans can be inoculated against it. The question of whether the same inoculation principle can operate at both the human and the model level is an open frontier for the field.
Model safety · Inoculation
Current studies provide snapshots at single time points. No study has tracked the longitudinal effects of habitual use of AI health tools on a population's health literacy, reasoning capacity, or clinical outcomes. Given the Interaction Gap finding from the Oxford trial, and the emerging preprint evidence that habitual AI use erodes independent reasoning capacity, this is an urgent gap. Additionally, because models update continuously and silently, one-time testing is insufficient. An independent monitoring infrastructure, analogous to pharmacovigilance for medications, should track safety signals from deployed AI health tools on an ongoing basis.
Longitudinal · Surveillance
A researchable problem in its own right: how do you communicate AI health risk to a public that experiences AI as helpful? What vocabulary works? What framing produces behavioral change rather than dismissal? What visual metaphors convey the Interaction Gap without inducing either panic or complacency? These questions are amenable to the experimental methodology used in the inoculation literature, and answering them is a prerequisite for any effective public health response. This is not a communications afterthought. It is a core scientific question.
Health communication · Behavioral science
The Crisis in Numbers
All figures are drawn from peer-reviewed or independently verified sources, each checked against the primary publication. Each number represents a finding with direct implications for how health AI is deployed and regulated.
Behind each figure below is a person making a health decision on information they could not evaluate. Together these numbers show what the crisis looks like when measured: it is real, the failure is structural, and the evidence for the intervention is strong.
Our Team
VeriHealth was founded by people who speak the languages this problem requires: medical science, public health, health communication, and the languages of the communities most affected. The team exists to close the gap between what the evidence shows and what a parent can understand at midnight.
Leadership
Founder of Kilkenny Capital Management, where he raised and invested nearly $500 million in biotechnology assets over fifteen years. His synthesis of recent peer-reviewed evidence on consumer health AI failures is the evidentiary foundation of VeriHealth's platform design.
Former BBC Senior Executive, WHO Media Consultant, and CEO of Medical Aid Films, where she built a library of animated films that taught vital knowledge and skills on women's and children's health to disadvantaged communities worldwide. She brings to VeriHealth the operational infrastructure of health communication at scale.
A practicing anesthesiologist with graduate training in epidemiology and cancer prevention policy, and leadership roles at the American Medical Association and American Society of Anesthesiologists. She brings to VeriHealth fluency in clinical medicine, public health methodology, and the institutional language of health systems: three languages that rarely coexist in one person.
Bilingual behavioral scientist and Director of Primary Care Research at the American Academy of Pediatrics, where she oversees vaccine hesitancy research through a national pediatric practice network. Co-founder of the MRSA Research Center at the University of Chicago, she brings to VeriHealth both the research infrastructure and the lived understanding of what it costs when families cannot access accurate health information.
Senior Medical Advisor
Former President of the University of Chicago Medicine health system, Dean of the Pritzker School of Medicine, and Executive Vice President for Medical Affairs at the University of Chicago. A member of the National Academy of Medicine with more than 250 peer-reviewed publications, he connects VeriHealth to the academic medical center community.
Research Relationships
Michael Walsh conceived VeriHealth's Structured Socratic Interface in early 2025, before the Oxford Internet Institute's Bean et al. study was published. When that study appeared in Nature Medicine in February 2026, it independently confirmed the design principle VeriHealth had already built toward. That convergence is the basis of a developing research relationship with OII investigators Luc Rocher and Adam Mahdi.
VeriHealth's Socratic design approach shares its core premise with the TITAN project (5.7 million euros, 14 partners, 2022 to 2025), a European consortium that built Socratic coaching tools to help people reason about misleading information for themselves rather than telling them what to believe. TITAN is the closest operational precedent for the reasoning-first approach VeriHealth employs, and VeriHealth is pursuing a research relationship with the consortium. Both rest on the same premise: that the failure is cognitive, not informational.
Why This Team
Medical misinformation is not a knowledge problem. It is a structural one: the ability to evaluate health information well has never been part of general education and has never been equally distributed, and nothing has filled that gap. VeriHealth's founding team was assembled specifically around it. Every member is fluent in at least one of the languages the problem requires: medical, public health, health communication, and the languages of the communities most affected.
The founding insight was simple and serious: people who speak medical language and can hear misinformation as distortion have an obligation that those who cannot do so do not share. VeriHealth was founded by people who heard it, and decided that hearing it without acting on it was no longer acceptable.
News and Updates
Organizational updates from VeriHealth alongside key developments in the research and policy landscape we work within.
VeriHealth organizational announcements will appear here. Grant decisions, research milestones, clinical partnerships, and published work.
ECRI, the independent nonpartisan patient safety organization, ranked AI chatbot misuse first in its annual Top 10 Health Technology Hazards report. The finding: chatbots produce authoritative-sounding responses that are not regulated as medical devices, not validated for clinical use, and capable of providing false or misleading information with significant patient harm implications. The report notes that chatbots are programmed to satisfy the user rather than provide accurate answers.
ECRI · Patient Safety
The annual Edelman Trust and Health survey of 16,000 people across 16 markets finds that only 51% of people globally are confident in their ability to find and evaluate health information, a decline of 10 points in a single year. The decline is statistically significant in 14 of 16 markets and is consistent across age groups, education levels, and political leanings. Separately, 70% of respondents hold at least one widely debunked health belief, and those who hold more of them are more likely to turn to AI for health guidance.
Edelman Trust Institute · Global SurveyBBC Inside Health features Oxford's Adam Mahdi explaining why AI scores 95% in isolation but users get the right answer only 35% of the time. England's Chief Medical Officer warns that chatbot answers are "both confident and wrong." Includes real patient accounts of AI advice gone right, and dangerously wrong.
BBCNPR covers both the Bean and Ramaswamy findings. Bean identifies the core problem as a two-way communication breakdown: users don't know what information AI needs, and the responses combine good and poor recommendations in ways that are difficult to distinguish.
NPRThe New York Times Morning Briefing covers the published Bean et al. finding and its implications for the millions of Americans now turning to AI for health advice.
The New York TimesThe paper demonstrating that structured AI dialogue reduces conspiracy beliefs by about 20%, the intervention evidence at the core of VeriHealth's design, wins the oldest award given by the American Association for the Advancement of Science. The prize last went to a social science paper in 1981.
Cornell University · AAASA deeply reported Times feature on Americans using AI chatbots to compensate for a health system that leaves them without answers, and the risks that misplaced trust creates. References Oxford research and Harvard Medical School findings on AI sycophancy and false premise accommodation.
The New York TimesThe Economist covers Costello et al. and the emerging science of inoculation and critical thinking education as tools against misinformation: the two mechanisms at the core of VeriHealth's platform design.
The EconomistA Perspective published in Science alongside Costello et al. Bago and Bonnefon assess the findings and conclude that a scalable intervention to recalibrate misinformed beliefs may be within reach, while raising the question of whether people will voluntarily engage with an AI designed to challenge what they believe.
ScienceGet Involved
VeriHealth is at the stage where the right partnerships, with funders, clinical institutions, and research collaborators, determine whether an evidence-based intervention reaches the populations that need it.
We are seeking philanthropic and institutional funding to move a platform with peer-reviewed evidence and a clinical deployment pathway from design toward validation and scale.
Where we are. VeriHealth has built the evidence base, the platform design, and an early prototype, and assembled the team across the four disciplines the problem requires.
What is next. Completing the Structured Socratic Interface, developing the clinical evaluation plan, and preparing the first pilot of the integrated intervention.
What funding unlocks. The step from a designed, evidence-grounded intervention to a piloted one: from knowing the mechanism works to proving this implementation of it works.
Build and validate
Reach
We are seeking clinical partners at major academic medical centers and community health institutions serving populations with limited access to trusted clinical relationships, across languages and literacy levels. VeriHealth is based in Chicago and is actively developing relationships with Chicago-area health systems.
The research questions VeriHealth is built to answer are open questions for the field. We are seeking partners with expertise in health communication, misinformation science, and AI interaction design to help answer them. The answers will matter beyond VeriHealth.
Chicago-based, nationally oriented. VeriHealth NFP is headquartered in Chicago, Illinois, with developing relationships across the Chicago academic medical center ecosystem. The platform is designed for national distribution through community health worker networks and trusted medical messengers at the point of care, and to be free and available in multiple languages at launch.
Whether you are a funder, a clinical partner, or a research collaborator, we welcome the conversation.