Open Health Data Hub



Data Validation Report

Every query result on this site is validated against published statistics from the CDC and NCHS. This report shows our automated test suite results.

Checks Passed: 38/38
Test Cases: 19
Datasets: 2
Last Run: Mar 6, 2026

Methodology

Each test case compares a result from our data against a published value from an official CDC or NCHS source. We run two independent layers of validation for every test:

Layer 1: Gold SQL

A hand-written SQL query is executed directly against the DuckDB database on Railway. This tests whether the data itself reproduces published statistics, independent of the AI layer. If Layer 1 fails, the data or our understanding of the codebook is wrong.

Layer 2: NL Query

A natural language question is sent through the full production pipeline: the question goes to our API, Claude generates SQL, Railway executes it, and the result is checked. This tests the end-to-end system that users interact with. If Layer 2 fails but Layer 1 passes, the AI is misinterpreting the question or generating incorrect SQL.

BRFSS Results

11 tests

Behavioral Risk Factor Surveillance System — self-reported survey data, 400K+ respondents/year. Values are weighted prevalence percentages using CDC's _LLCPWT survey weights.

| Statistic | Year | Published | Gold SQL | Dev (pp) | NL Query | Dev (pp) | Source |
|---|---|---|---|---|---|---|---|
| Adult obesity (national) | 2017 | 30.1% | 30.1% | 0.0 | 30.1% | 0.0 | CDC Obesity Maps |
| Adult obesity (national) | 2018 | 30.9% | 30.9% | 0.0 | 30.9% | 0.0 | CDC Obesity Maps |
| Adult obesity (West Virginia) | 2018 | 39.5% | 39.5% | 0.0 | 39.5% | 0.0 | CDC State Data |
| Adult obesity (Colorado) | 2018 | 22.9% | 22.9% | 0.0 | 22.9% | 0.0 | CDC State Data |
| Current smoking | 2018 | 15.5% | 15.5% | 0.0 | 15.5% | 0.0 | CDC Tobacco Data |
| Adult obesity (national) | 2020 | 31.9% | 31.9% | 0.0 | 31.9% | 0.0 | CDC Obesity Maps |
| Diagnosed diabetes | 2018 | 10.9% | 11.4% | +0.5 | 11.8% | +0.9 | CDC Diabetes |
| Current asthma | 2018 | 9.2% | 9.2% | 0.0 | 9.2% | 0.0 | CDC Asthma |
| Physical inactivity | 2018 | 24.5% | 24.5% | 0.0 | 24.5% | 0.0 | CDC PCD |
| Adult obesity (national) | 2023 | 34.3% | 32.8% | -1.5 | 32.8% | -1.5 | CDC Newsroom |
| Depressive disorder | 2019 | 19.9% | 18.8% | -1.1 | 18.8% | -1.1 | PLOS ONE |
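
The weighted prevalence percentages reported for BRFSS can be illustrated with a short sketch. It is not the production query: the function name, the toy records, and the choice to drop missing answers from the denominator are all assumptions made for illustration. Each record pairs a survey weight (standing in for _LLCPWT) with a yes/no indicator, and the result is 100 times the weighted share of "yes" answers.

```python
def weighted_prevalence(records):
    """Return the weighted prevalence in percent.

    records: iterable of (weight, has_condition) pairs, where has_condition
    is True, False, or None for a missing or refused answer.
    Assumption: missing answers are excluded from the denominator.
    """
    total_weight = 0.0
    positive_weight = 0.0
    for weight, has_condition in records:
        if has_condition is None:
            continue  # skip respondents without a valid answer
        total_weight += weight
        if has_condition:
            positive_weight += weight
    if total_weight == 0:
        raise ValueError("no respondents with a valid answer")
    return 100.0 * positive_weight / total_weight


# Hypothetical toy data, not real BRFSS records.
sample = [(1200.0, True), (800.0, False), (500.0, None), (2000.0, False)]
print(round(weighted_prevalence(sample), 1))  # 30.0
```

In this toy sample only one of three valid respondents has the condition, yet the weighted estimate is 30.0% rather than 33.3%, because each respondent counts in proportion to their survey weight.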

NHANES Results

8 tests

National Health and Nutrition Examination Survey (2021–2023 cycle) — clinical exams + lab measurements. Values are weighted prevalence percentages using WTMEC2YR exam weights.

| Statistic | Year | Published | Gold SQL | Dev (pp) | NL Query | Dev (pp) | Source |
|---|---|---|---|---|---|---|---|
| Obesity overall (BMI≥30) | 2021–23 | 40.3% | 40.3% | 0.0 | 39.8% | -0.5 | NCHS Brief #508 |
| Obesity, men (BMI≥30) | 2021–23 | 39.2% | 39.2% | 0.0 | 38.7% | -0.5 | NCHS Brief #508 |
| Obesity, women (BMI≥30) | 2021–23 | 41.3% | 41.3% | 0.0 | 40.8% | -0.5 | NCHS Brief #508 |
| Total diabetes (incl. undiagnosed) | 2021–23 | 15.8% | 13.8% | -2.0 | 13.8% | -2.0 | NCHS Brief #516 |
| High cholesterol (≥240 mg/dL) | 2021–23 | 11.3% | 11.4% | +0.1 | 11.1% | -0.2 | NCHS Brief #515 |
| Hypertension (measured + Dx) | 2021–23 | 47.7% | 50.0% | +2.3 | 50.0% | +2.3 | NCHS Brief #511 |
| Severe obesity (BMI≥40) | 2021–23 | 9.4% | 9.4% | 0.0 | 9.3% | -0.1 | NCHS Brief #508 |
| Depression (PHQ-9≥10) | 2021–23 | 13.1% | 12.6% | -0.5 | 12.6% | -0.5 | NCHS Brief #527 |

Notes

Tolerance thresholds

Each test has a pre-defined tolerance (typically 1–2 percentage points for BRFSS and 1.5–5 percentage points for NHANES). These tolerances account for differences in survey weight versions, age cutoffs, and rounding. A test passes when the absolute deviation between our value and the published value is within its tolerance.
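
The pass rule can be sketched in a few lines. This is an illustrative sketch rather than the actual test harness: the function name is hypothetical, and the tolerances passed in the examples are assumed values chosen from the typical BRFSS range, since the per-test tolerances are not listed in this report. The published and observed values are taken from the BRFSS results.

```python
def check(published, observed, tolerance):
    """Return (deviation, passed) for one check.

    deviation is observed minus published, in percentage points,
    rounded to one decimal place as in the results tables.
    """
    deviation = round(observed - published, 1)
    return deviation, abs(deviation) <= tolerance


# Diagnosed diabetes, 2018 (Gold SQL), with an assumed tolerance of 1.0 pp.
print(check(10.9, 11.4, 1.0))  # (0.5, True)

# Adult obesity (national), 2023 (Gold SQL), with an assumed tolerance of 2.0 pp.
print(check(34.3, 32.8, 2.0))  # (-1.5, True)
```

The same deviation of -1.5 would fail under a tighter, equally hypothetical tolerance of 1.0 pp, which is why each test carries its own threshold.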

BRFSS vs NHANES obesity gap

BRFSS reports ~31–33% obesity; NHANES reports ~40%. This is not an error. BRFSS uses self-reported height and weight (respondents tend to underreport weight and overreport height), while NHANES uses clinical measurements. The gap is well documented in the epidemiological literature.

What each layer catches

Layer 1 failures indicate data issues: wrong codebook interpretation, missing survey weights, or incorrect variable coding. Layer 2 failures (with Layer 1 passing) indicate AI issues: the NL-to-SQL model is generating incorrect queries. When both layers pass, the data reproduces the published value within tolerance and the AI can reproduce the result from a plain-English question.
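
The triage logic described here can be written as a small decision function. This is a sketch under assumptions: the function name and the returned labels are hypothetical and are not taken from the actual test suite.

```python
def diagnose(layer1_passed, layer2_passed):
    """Map the two layer outcomes for one test case to a likely cause."""
    if not layer1_passed:
        # Gold SQL disagrees with the published value: suspect the data
        # or the codebook interpretation, whatever Layer 2 did.
        return "data issue"
    if not layer2_passed:
        # Data is fine, but the generated SQL gave a different answer.
        return "AI issue"
    return "pass"


print(diagnose(True, True))    # pass
print(diagnose(True, False))   # AI issue
print(diagnose(False, False))  # data issue
```

Layer 1 is checked first because a data problem makes the Layer 2 result uninformative: the AI could generate perfectly reasonable SQL and still miss the published value.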