CMS Provider Utilization Data Challenge

This is the start of a series of posts documenting three strangers 5,000 miles apart attempting to work together for the first time on the Health Data Consortium’s HealthDataPalooza Code-A-Palooza.

Last week (April 9, 2014), CMS continued its unprecedented push for transparency by releasing Medicare payment data for calendar year 2012. The dataset, 9M records in total, includes physician names and addresses along with billing counts for each CPT code performed more than 10 times. To protect patient privacy, it excludes payment data for less frequent treatments. Chances are, if you are a physician who provided care to a Medicare recipient in 2012, your billing information is included.
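To make the shape of the release concrete, here is a minimal sketch of what working with it might look like, using a tiny hypothetical sample in place of the real file (the actual column names in the CMS release may differ; `line_srvc_cnt` and the other fields here are assumptions for illustration). It also shows the privacy suppression rule in action: rows for services performed 10 or fewer times never appear in the public file.

```python
import pandas as pd

# Hypothetical mini-sample mimicking the 2012 CMS provider utilization layout.
# Column names are illustrative assumptions, not the official data dictionary.
rows = [
    {"npi": "1000000001", "provider": "Smith, A", "hcpcs_code": "99213",
     "line_srvc_cnt": 250, "avg_payment": 48.20},
    {"npi": "1000000002", "provider": "Jones, B", "hcpcs_code": "99214",
     "line_srvc_cnt": 8, "avg_payment": 71.10},
    {"npi": "1000000003", "provider": "Lee, C", "hcpcs_code": "99213",
     "line_srvc_cnt": 90, "avg_payment": 45.75},
]
df = pd.DataFrame(rows)

# CMS suppresses services billed 10 or fewer times per provider,
# so the public file only contains rows with counts over 10.
public = df[df["line_srvc_cnt"] > 10]

# A rough estimate of total Medicare payments per CPT/HCPCS code:
# services performed times average payment, summed per code.
totals = (public["line_srvc_cnt"] * public["avg_payment"]) \
    .groupby(public["hcpcs_code"]).sum()
print(totals)
```

The real file is a single ~9M-row flat table, so the same filter-and-aggregate pattern applies once it is loaded with `pd.read_csv`, memory permitting.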

This government data disclosure has raised plenty of eyebrows, particularly from physicians.

“We believe that the broad data dump (Wednesday) by CMS has significant shortcomings regarding the accuracy and value of the medical services rendered by physicians,” said Ardis Dee Hoven, MD, president of the AMA, in a statement. “Releasing the data without context will likely lead to inaccuracies, misinterpretations, false conclusions and other unintended consequences.”

Still others in the industry question what value “normal” people would find in such a data dump without the skillsets needed to process the information. To me, this attitude seems akin to that of doctors who don’t trust patients to understand disease management or to critically evaluate treatment options and diagnostic information.

As our team dug into the CMS data set in earnest for the first time today, its shortcomings became abundantly clear. The data is aggregated by year, which imposes significant limitations: there is no granular census data to overlay against the Medicare population, and no weekly or monthly trend data to correlate with health patterns or CDC data.

As for the competition, the rules were skewed heavily toward pretty data visualizations with little regard for the underlying statistical analysis, and they emphasized consumer needs as if consumers operate in their own bubble outside the greater healthcare ecosystem.

So, with this in mind, I recruited two others to join me in the challenge. Mandi Bishop has years of deep health IT domain and analytical experience, as well as creativity and an intoxicating personality that can pull even worst-case scenarios out of the gutter. On the data science side, Nick Kypreos is all numbers. He did his PhD work at CERN and was previously a data scientist with both Amazon and Microsoft. As for myself, I’m tying everything together; I’m a jack of all trades when it comes to health IT, statistics, platform development, and design. Together, we are TeamFloriduh. More on the name and our approach later.

In addition to addressing the core data challenge, we hope to engage the broader community in health tech innovation. Today’s healthcare conversation is shrouded in negativity, and by participating in this competition, we hope to expose people outside the health data community to the interesting and innovative projects currently being supported by both the private and public sectors.

And to top it off, we want to show others that yes, it is possible for “normal” people to learn a little statistical programming so that they too can make use of future public data sets. We’re doing this by curating a list of reliable, quality resources that address the specific skillsets needed to process the available information.
