The US Health and Human Services DOGE team (I guess DOGE still exists in some form) just released a new aggregated, provider-level Medicaid claims database covering January 2018 through December 2024. With this dataset, you can track the monthly claims for each procedure (by HCPCS Code) and provider over time.

Even if the framing around this dataset’s release is partisan—tied to allegations of Medicaid fraud in Minnesota—it is a genuine advance in transparency for the US’s third largest spending program. No doubt this accomplishment required a lot of work on the backend to harmonize countless fragmented datasets into one tidy schema. These data were difficult to access before, and now they are free for anyone to use. Journalists, policy researchers, and companies working in the US healthcare sector will benefit the most, but every taxpayer benefits from added transparency about where their tax dollars go.

I would say there is the potential for these data to be misused to spark witch hunts, but this is more or less the stated purpose for this data release. Per Elon Musk: “Medicaid data has been open sourced, so the level of fraud is easy to identify.” If you go on Twitter, you will find several people have already plugged in the dataset to Claude Code and trumpeted their ASCII tables of providers flagged for potential fraud. Inevitably, some of these providers targeted by public scrutiny for their unusual billing patterns will have perfectly innocent explanations. But if ProPublica is excited about the release of this new dataset, then so am I.