Case study: Health data analysis

Article: Community Health Map: A geospatial and multivariate data visualization tool for public health datasets

Year: 2012

Authors: Awalin Sopan a, Angela Song-Ie Noh b, Sohit Karol c, Paul Rosenfeld d, Ginnah Lee d, Ben Shneiderman a

Research positions:

a Human–Computer Interaction Lab, Department of Computer Science, University of Maryland

b Department of Computer Science, University of Maryland

c Department of Kinesiology, University of Maryland

d Department of Electrical and Computer Engineering, University of Maryland


This design study evaluates the usability of “Community Health Map,” a multivariate and geospatial data visualization tool for understanding health care quality, public health outcomes, and access to healthcare. The web application targets policymakers at all levels of government and lets them dynamically query, filter, map, table, and chart multivariate data. The study also reports simple descriptive analytics on health care quality, access, and public health.


The research is driven by the need to understand where Medicare dollars are wasted across multiple states and Hospital Referral Regions (HRRs). Financial waste is explored by filtering on demographic data made available through the Open Government Directive, health performance data from the Department of Health and Human Services (HHS), and other healthcare industry data sets. To that end, this prototyping/design study of the Community Health Map application incorporated 11 health care variables covering quality, accessibility, and public health:

  • Quality variables: life expectancy, infant mortality, self-reported fair/poor overall health, average number of unhealthy days.
  • Access variables: percentage of uninsured, physicians per 100K.
  • Public health variables: smoking rate, lack of physical activity, obesity, nutrition, flu vaccinations (ages 65+).

Methods and techniques used:

This study used descriptive analytics to pilot the Community Health Map. Each of the three categories has its own color: green for quality, orange for access, and blue for public health. Median income, poverty rate, percent over age 65, and percent with a bachelor's degree serve as dynamic filters.

Finally, the study assessed the usability of the Community Health Map by giving participants a four-minute tutorial video and 20 minutes to perform five tasks involving dynamic filtering, mapping, tabling, and charting. Each participant was encouraged to think out loud.


The source data is organized by different geographic units (counties, zip codes, HRRs, etc.), so it first had to be pre-processed into a common unit. This required assumptions and interpolation of the geospatial data, which underlie the results reported in the study.
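
As a rough illustration of what bringing data to a common unit can look like, the sketch below rolls county-level values up to HRRs with a population-weighted mean. The study does not say which method it used, so the weighting scheme, the function name `aggregate_to_hrr`, and all sample values are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical county-level records: (county, HRR code, population, life expectancy).
# All names and numbers are invented for illustration.
county_rows = [
    ("County A", "HRR-1", 10_000, 76.0),
    ("County B", "HRR-1", 30_000, 80.0),
    ("County C", "HRR-2", 20_000, 78.0),
]

def aggregate_to_hrr(rows):
    """Return the population-weighted mean of a county-level value for each HRR."""
    weighted_sum = defaultdict(float)
    population = defaultdict(int)
    for _county, hrr, pop, value in rows:
        weighted_sum[hrr] += pop * value
        population[hrr] += pop
    return {hrr: weighted_sum[hrr] / population[hrr] for hrr in weighted_sum}

hrr_values = aggregate_to_hrr(county_rows)
```

Weighting by population keeps a small county from counting as much as a large one within the same region. Going the other way (from a larger unit down to a smaller one) would need a different interpolation assumption.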

An assumption not explicitly stated in the study is that participants feel comfortable thinking out loud completely and give the researchers unfettered access to their thoughts. Some thoughts may be kept private. The think-aloud approach also assumes that explicitly stated thoughts are valuable to the study and that unspoken ones are not.

Also not explicitly stated is how usability was defined, beyond whether participants could perform the tasks. There was no mention of whether the design was usable by people with disabilities. In general, people with disabilities may be able to offer universal design suggestions that improve a product's usability. The absence of any such mention may suggest that no participant had a disability known to the researchers.

The sample also appears to be a convenience or snowball sample, given that all the subjects were graduate students at the University of Maryland. This implicitly assumes that the software's intended users have some graduate-level education. Thus, the usability results are limited to such a group and cannot be generalized to the greater population.


Descriptive data analytics results

  • The number of unhealthy days in a one-month period was highest in eastern Kentucky and western West Virginia, exceeding the rest of the nation. According to subject matter experts from the University of Maryland, this area has many coal mines, which could contribute to poorer health in that population.
  • Areas of poor health are highly correlated with low life expectancy. With dynamic filtering, areas with above-national-average median income and low poverty rates showed higher life expectancy than areas in the reverse situation. This suggests that people with fewer financial resources could have less access to healthcare, which could lead to lower life expectancy.
  • Areas with higher uninsured rates and higher smoking rates showed lower life expectancy, as seen in four counties in South Dakota: Shannon, Bennett, Jackson, and Todd. All of these counties except Shannon also had higher infant mortality rates. Finally, these counties with higher uninsured rates also had fewer physicians per 100K residents.
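
The income and poverty comparison described above can be sketched as a pair of range filters followed by a difference in group means. This is only an illustration of the idea of dynamic filtering, not the tool's actual implementation: the function `apply_filters`, the field names, the 15% poverty threshold, and every data value are hypothetical.

```python
from statistics import mean

# Hypothetical county records; every number here is invented for illustration.
counties = [
    {"name": "A", "median_income": 62_000, "poverty_rate": 8.0, "life_expectancy": 80.1},
    {"name": "B", "median_income": 58_000, "poverty_rate": 9.5, "life_expectancy": 79.3},
    {"name": "C", "median_income": 34_000, "poverty_rate": 24.0, "life_expectancy": 74.2},
    {"name": "D", "median_income": 31_000, "poverty_rate": 27.5, "life_expectancy": 73.0},
]

def apply_filters(rows, ranges):
    """Keep rows whose value falls inside every (low, high) range, like a set of sliders."""
    return [
        row for row in rows
        if all(low <= row[field] <= high for field, (low, high) in ranges.items())
    ]

# Stand-in for the national average median income (assumption: mean of the sample).
income_cutoff = mean(row["median_income"] for row in counties)
poverty_cutoff = 15.0  # assumed threshold separating "low" from "high" poverty

higher_income = apply_filters(counties, {
    "median_income": (income_cutoff, float("inf")),
    "poverty_rate": (0.0, poverty_cutoff),
})
lower_income = apply_filters(counties, {
    "median_income": (0, income_cutoff),
    "poverty_rate": (poverty_cutoff, 100.0),
})

# Difference in mean life expectancy between the two filtered groups.
gap = (mean(row["life_expectancy"] for row in higher_income)
       - mean(row["life_expectancy"] for row in lower_income))
```

A positive `gap` on real data would match the pattern reported in the study, but it would still be a descriptive comparison and would not by itself show that income causes the difference.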

Design study results

  • Some minor interface usability suggestions from one participant.
  • The short demo session helped participants navigate the application effectively.
  • No major usability challenges.
  • No need for installation of the application given that it is all web-based.

Contributions to the field or topic:

Two contributions were made.

  • The tool has been designed and built to be intuitive for a targeted audience and to facilitate data-driven answers based on demographic and health data.
  • The tool can be extended to other fields that work with multivariate geospatial data.


Some of these conclusions seem obvious, but visualizing the data to see these relationships (though other variables could be contributing to the results) can illustrate some of the tropes and talking points in modern-day politics over healthcare and healthcare reform. Some of these tropes are: poorer areas usually suffer more illness; higher uninsured rates lead to lower life expectancy; and areas with higher uninsured rates have fewer physicians to treat the same number of people. The study aims to inspire further development of the tool and to enrich it with more demographic, outcome, cost, and health data.

Opinion on the validity of the claims and significance of the research:

The results seem to highlight typical political talking points through visualization of the data. The usability study has some issues, given that it used graduate students rather than the political figures the tool is intended for. Thus, the results are not generalizable. Whether the tool is useful depends on who needs it, what they need it for, and how easy it is to enter data from various sources.


  • Sopan, A., Noh, A. S. I., Karol, S., Rosenfeld, P., Lee, G., & Shneiderman, B. (2012). Community Health Map: A geospatial and multivariate data visualization tool for public health datasets. Government Information Quarterly, 29(2), 223-234.