Data Visualization Tools in Healthcare

Data analytics results are only useful when the information they reveal is presented in an understandable fashion. Many tools are currently available on the market for presenting information in this final stage of data analytics.


Purpose and Impact of Data Visualization in the Healthcare Industry

There are many applications of data analytics in the healthcare industry: physician and ambulatory care centers, hospitals and health systems, managed care plans and HMOs, genomic studies, and accountable care organizations (Cyranoski, 2015; eInfochips, n.d.). Visualizing health data can therefore help tell stories from the relevant data so that data-driven decisions and actions can be made (California HealthCare Foundation [CHCF], 2014; eInfochips, n.d.). Cleardata (n.d.) suggested the following best practices for presenting data visualizations: use relevant data; begin by understanding what should be communicated, then design toward that; make visualizations easy for the consumer; ensure HIPAA compliance when showing data; and create visualizations that lead to data-driven decisions and actions. Before selecting the right visualization tool, then, a presentation approach must be considered that takes into account the personal level of expertise, the visualization methods, and the interactivity of the visualization (CHCF, 2014).

It is not enough to analyze the relevant data for data-driven decisions; relevant visualizations of that data must also be selected to enable those decisions (eInfochips, n.d.). There are many ways to visualize data to highlight key facts stylishly and succinctly: tables and rankings, bar charts, line graphs, pie charts, stacked bar charts, tree maps, choropleth maps, cartograms, pinpoint maps, and proportional symbol maps (CHCF, 2014). These plots, charts, maps, and graphs can be animated, static, or interactive, and they can stand alone as a single image or be combined into dashboards, scorecards, or infographics (CHCF, 2014; eInfochips, n.d.).

The CHCF (2014) recommended data visualization tools that everyone can use: Google Charts & Maps, Tableau Public, Mapbox, Infogram, Many Eyes, iCharts, and Datawrapper. CHCF (2014) also recommended some data visualization tools for developers, such as High Charts, TileMill, D3.js, FLOT, Fusion Charts, OpenLayers, and JSMap, whereas eInfochips (n.d.) suggested visualization tools like Tableau, R, and Spotfire. Note that Many Eyes has since been shut down by IBM and replaced by Watson Analytics (Machlis, 2011).

A summary of three data visualization tools used in health care (Machlis, 2011):

Tool: R
Runs on: Linux, Mac OS X, Unix, & Windows
Skill level required: Advanced Beginner
Description: A statistical analysis tool that can not only do simple arithmetic and regression analysis, but can also do complex data preprocessing, data mining, machine learning, and static data visualizations.
Advantages: Its library of code is supported by a community of subject matter experts.
Disadvantages: It runs as a command-line program, so a graphical user interface must be installed separately.

Tool: Tableau, or Tableau Public (free version)
Runs on: Windows and Mac OS X
Skill level required: Beginner to Intermediate
Description: A tool mostly used for interactive visualization that can produce all the visualizations mentioned in this post by dragging and dropping variables.
Advantages: The drag-and-drop interface makes quick work of data analysis that would otherwise take time to code manually.
Disadvantages: Data stored in Tableau Public lives on the web, free for others to use, which may make data privacy hard to control; otherwise, the full software costs over $1K for a single user. Customization of the interface is limited, though possible through code.

Tool: Google Chart Tools
Runs on: Any device with a web browser
Skill level required: Advanced to Expert
Description: A self-contained application for storing data in the cloud and visualizing it anywhere, through the use of JavaScript visualization libraries.
Advantages: It integrates with other Google products like Google Spreadsheets, and its JavaScript library is heavily documented.
Disadvantages: It requires coding to create the visualizations, and you don't have access to the JavaScript code, so you must rely on continued Google support.
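As a small taste of the static visualizations the R entry refers to, here is a minimal sketch using base R's barplot(); the admissions counts and labels are made up purely for illustration.

# Hypothetical monthly hospital admissions, for illustration only
admissions <- c(Jan = 120, Feb = 98, Mar = 143, Apr = 110)
barplot(admissions,
        main = "Hospital Admissions by Month (hypothetical data)",
        xlab = "Month",
        ylab = "Admissions")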



Big Data Analytics: R

R has proven to be one of the most effective software tools for analyzing big data. This post is a quick discussion of my evaluation of this tool and its relevance in the big data arena, with respect to text mining.

R is a powerful statistical tool that can aid in data mining; thus, it has huge relevance in the big data arena. Focusing on my project, I found that R has a text mining package, tm.
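As context for the preprocessing steps below, here is a minimal sketch of loading a corpus with tm; the "texts/" folder of plain-text files is a hypothetical path, and the result is the docs object used throughout the rest of this post.

library(tm)                           # the text mining package
docs <- VCorpus(DirSource("texts/"))  # one document per file in a (hypothetical) texts/ folder
inspect(docs[[1]])                    # preview the first document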

Patal and Donga (2015) and Fayyad, Piatetsky-Shapiro, and Smyth (1996) say that the main techniques in data mining are: anomaly detection (outlier/change/deviation detection), association rule learning (relationships between the variables), clustering (grouping data that are similar to one another), classification (applying a known structure to new data), regression (finding a function to describe the data), and summarization (visualizations, reports, dashboards). According to Ghosh, Roy, and Bandyopadhyay (2012), the main types of text mining techniques are: text categorization (assigning text/documents to pre-defined categories), text clustering (grouping similar text/documents together), concept mining (discovering concept/logic-based ideas), information retrieval (finding the relevant documents for a query), and information extraction (identifying key phrases and relationships within the text). Meanwhile, Agrawal and Batra (2013) add summarization (a compressed representation of the input), assessing document similarity (similarities between different documents), and document retrieval (identifying and retrieving the most relevant documents) to the list of text mining techniques.

We use library(tm) to aid in transforming text, stemming words, building a term-document matrix, etc., mostly for preprocessing the data (RStudio pubs, n.d.). Based on RStudio pubs (n.d.), some text preprocessing steps and code are as follows:

  • To remove punctuation:

docs <- tm_map(docs, removePunctuation)

  • To remove special characters:

# Use a content transformer so the corpus structure is preserved
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")

  • To remove numbers:

docs <- tm_map(docs, removeNumbers)

  • Convert to lowercase:

docs <- tm_map(docs, content_transformer(tolower))  # wrap tolower so documents stay tm text documents

  • Removing “stopwords”/common words

docs <- tm_map(docs, removeWords, stopwords("english"))

  • Removing particular words

docs <- tm_map(docs, removeWords, c("department", "email"))

  • Combining words that should stay together

# Replace multi-word phrases with a single token, again via a content transformer
combineWords <- content_transformer(function(x, from, to) gsub(from, to, x))
docs <- tm_map(docs, combineWords, "qualitative research", "QDA")
docs <- tm_map(docs, combineWords, "qualitative studies", "QDA")
docs <- tm_map(docs, combineWords, "qualitative analysis", "QDA")
docs <- tm_map(docs, combineWords, "research methods", "research_methods")

  • Removing common word endings (stemming)

library(SnowballC)
docs <- tm_map(docs, stemDocument)
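With preprocessing done, the term-document matrix mentioned earlier can be built; a minimal sketch, assuming the docs corpus from above:

dtm <- TermDocumentMatrix(docs)                           # rows are terms, columns are documents
freq <- sort(rowSums(as.matrix(dtm)), decreasing = TRUE)  # total frequency of each term
head(freq, 10)                                            # the ten most frequent terms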

Text mining algorithms can consist of, but are not limited to, the following (Zhao, 2013); a short sketch of the summarization and correlation items appears after this list:

  • Summarization:
    • Word clouds: use library(wordcloud)
    • Word frequencies
  • Regressions:
    • Term correlations: use the findAssocs() function from the tm package
    • Plotting word frequencies and term correlations: use library(ggplot2)
  • Classification models:
    • Decision trees: use library(party) or library(rpart)
  • Association models:
    • Apriori: use library(arules)
  • Clustering models:
    • K-means clustering: use library(fpc)
    • K-medoids clustering: use library(fpc)
    • Hierarchical clustering: use library(cluster)
    • Density-based clustering: use library(fpc)
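Here is the sketch promised above, covering word frequencies, a word cloud, and term correlations. It assumes the dtm matrix built earlier; the frequency cutoff of 25, the query term "research", and the 0.6 correlation limit are arbitrary choices for illustration.

library(wordcloud)                                        # word cloud plotting
freq <- sort(rowSums(as.matrix(dtm)), decreasing = TRUE)  # term frequencies from the term-document matrix
wordcloud(names(freq), freq, min.freq = 25)               # plot terms that appear at least 25 times
findAssocs(dtm, "research", corlimit = 0.6)               # terms correlated with the (hypothetical) term "research"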

As we can see, R and RStudio already provide the libraries and functions needed for data preprocessing, data mining, and data visualization when it comes to text mining.


Big Data Analytics: Installing R

This post provides instructions to download, install, and become familiar with R, the free, open-source big data analytics software tool discussed in this week's post.

I didn't have any problems with the installation, thanks to a video produced by Dr. Webb (2014). The package is bigger than I thought it would be, so it can take a few minutes to download, depending on your download speed and internet connection. Thus:

(1)    For proper installation of R, you need to have administrative access on your computer.

(2)    Watch the video for step-by-step instructions and an online tutorial on installing R and its graphical Integrated Development Environment (IDE).

  1. Note: The R application (32-bit and 64-bit) can be found at http://cran.r-project.org/
  2. Note: The free RStudio "Desktop" graphical IDE can be found at http://www.rstudio.com/

(3)    Once installed, use the manual for this application at this site: http://cran.r-project.org/doc/manuals/R-intro.html

Once I installed the software and the graphical IDE, I continued to follow along with the video, using the prepopulated cars data under the "datasets" package, and I got the same result as shown in the video. I would also like to note that Dr. Webb (2014) checked the packages "datasets," "graphics," "grDevices," "methods," and "stats" in the video, which can be hard to see depending on your video streaming resolution.
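For anyone who wants to repeat that check from the R console rather than the video, a minimal sketch using the built-in cars dataset:

data(cars)     # built-in dataset: car speed vs. stopping distance
summary(cars)  # numeric summary of both columns
plot(cars)     # scatterplot of speed against stopping distance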

Resources:

Webb, J. (2014). Installing and Using the “R” Programming Language and RStudio. Retrieved from https://www.youtube.com/watch?v=77PgrZSHvws&feature=youtu.be