Design Proposal for Healthcare based on the Internet of Things

This post takes the perspective of a big data analyst. It considers a major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah) and would like to incorporate data analytics into its operations. The first official task is to provide a Hadoop solution to the corporation’s business problem of analyzing various data sets, some structured and some unstructured, that have already been saved. Thus, this is a design proposal using a Hadoop environment, with design flow charts and XML examples.

Introduction

This design proposal is for a major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah). The proposed solution is a centralized Healthcare Information Management System (HIMS) that gives key stakeholders access to information derived from data that may be hidden in silos and brings forth the vital information needed to make data-driven decisions. The goal is to collect, process, and analyze data to deliver key insights quickly and accurately and so provide better service to patients. This should allow these hospitals to be more agile, responsive, and competitive in the healthcare industry. Thus, this proposal calls for the use of Hadoop to analyze data derived from the Internet of Things (IoT) and social media for these hospitals, that is, various large and streaming data sets from multiple devices and sensors that deal with sensitive patient data. This design proposal consists of design flow charts that use XML, includes the use of Hadoop, and recommends a suite of data visualization tools currently used in the industry to visualize HIMS data.

Requirements

The HIMS must allow for analysis and visualization of new, functional, and experimental data in the context of old existing data, information, and knowledge (Higdon et al., 2016). Dealing with both structured and unstructured data can present a real-world challenge for most hospitals. Structured data comes from the devices and sensors used to monitor patients, while unstructured data comes from clinical, nursing, and doctors’ notes and diagnoses, all of which is being saved in a centralized data warehouse. Other data sources that help maintain finances, HR, facility management, etc. are out of scope for this proposal. Another source of unstructured data is social media, both posts from people related to the patients and posts sent out from the hospital; both are also out of scope for this design.

The proposed system should be able to integrate internal and external datasets and structured and unstructured data from different sources, so a traditional relational database is not adequate to handle the amount of data and the different data types (Hendler, 2016). Traditional databases rely on tables and arrays, but data from the healthcare industry comes in N-dimensional arrays and from multiple different sources, can vary across time, and can contain text (Gary et al., 2005). Hence, traditional databases are not the best solution for the HIMS.

When Hadoop is delivered as a Platform as a Service (PaaS) cloud solution, the provider administers the runtime, middleware, O/S, virtualization, servers, storage, and networking, while the IT department managing the HIMS deals with the applications, data, and visualization of the results (Lau, 2011). This provides another advantage over traditional relational database systems for the HIMS, which are dependent on the hardware infrastructure (Minelli, Chambers, & Dhiraj, 2013). A benefit of a cloud-based solution is the pay-as-you-go business model, which allows the hospitals in these four states to pay only for what they need at that time, scaling computational and financial resources up and down (Dikaiakos, Katsaros, Mehra, Pallis, & Vakali, 2009; Lau, 2011). Finally, another benefit of a cloud-based Hadoop solution is the reduction in IT infrastructure costs: there will be no need to absorb the cost of upgrading the IT infrastructure every 2 to 3 years (Lau, 2011).

Therefore, to deal with the amount and complexity of the data, a PaaS solution using Hadoop is recommended over relational databases. PaaS tools like Hadoop use distributed databases, which allow for parallel processing of huge amounts of data, parallel searching, metadata management, analysis, the use of tools like MapReduce, and workflow system analysis (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.; Minelli et al., 2013). MapReduce reads data from the Hadoop Distributed File System (HDFS), maps it, and reduces it using parallel processing to discover hidden insights in the healthcare data (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.). HDFS stores the data in smaller blocks, which can be recombined when needed, like Lego blocks, and provides high-throughput access to the data (Hortonworks, 2013; IBM, n.d.).
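
To make the map and reduce steps concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code; it only imitates the split, map, shuffle, and reduce stages on a handful of made-up heart-rate readings. The patient IDs, the values, and every function name are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings: (patient_id, heart_rate_bpm).
READINGS = [
    ("P001", 72), ("P002", 88), ("P001", 76),
    ("P003", 64), ("P002", 92), ("P001", 80),
]


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks, loosely mirroring how HDFS stores a file."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: emit one (key, value) pair per record in a block."""
    return [(patient_id, bpm) for patient_id, bpm in block]


def shuffle(mapped_blocks):
    """Group every emitted value under its key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_group(values):
    """Reduce step: collapse all values for one key into a single result."""
    return mean(values)


def mean_heart_rate(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # On a real cluster, each block could be mapped on a different node.
    mapped = [map_block(block) for block in blocks]
    grouped = shuffle(mapped)
    return {pid: reduce_group(vals) for pid, vals in grouped.items()}


if __name__ == "__main__":
    print(mean_heart_rate(READINGS))
```

Because each block is mapped independently, the map calls are the part that a cluster can spread across nodes; only the shuffle needs to bring values for the same key together.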

Finally, for data visualization, the California HealthCare Foundation ([CHCF], 2014) recommended tools that everyone could use: Google Charts & Maps, Tableau Public, Mapbox, Infogram, Many Eyes, iCharts, and Datawrapper. CHCF (2014) also recommended data visualization tools for developers, such as Highcharts, TileMill, D3.js, Flot, FusionCharts, OpenLayers, and JSMap. Either set of tools is fine, and choosing between them should be left to the IT professionals and key stakeholders. The technological backbone of the HIMS could become complicated, but patients, healthcare providers, healthcare clearinghouses, and health plan staff do not need this knowledge to get their jobs done and meet their data needs. It is the job of the IT department to make this happen and to educate people about using the HIMS to meet their needs.

Education among healthcare providers, healthcare clearinghouses, and health plans on the role of a centralized HIMS in conducting data analysis to improve patient care is necessary (Higdon et al., 2016). These groups will be interfacing with the system, the Hadoop environment, the data, etc. Therefore, an outreach and education program must be implemented to ensure buy-in from all key stakeholders, along with training on how to use the system to meet their current needs. When interacting with the HIMS, patients, healthcare providers, healthcare clearinghouses, and health plans should use a graphical user interface (GUI) to ensure ease of use, with the following access levels:

  • patients should have read-only access, and only to their own data
  • healthcare providers should have read/write/edit access to the data of the patients they are caring for
  • healthcare clearinghouses and health plans should have read-only access to patient medication and services performed and no access to any other patient data, plus read/write/edit access to data that fits their scope of work
  • the IT department should have full read/write/edit privileges on de-identified patient data to ensure data quality, manage data input/output, and support data mining
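
The access levels listed above could be expressed as a small permission check. The sketch below is only an illustration: the role names, the record layout (`patient_id`, `care_team`, `deidentified`), and the field names in `PAYER_FIELDS` are all assumptions, and the clearinghouse/health plan write access to their own scope-of-work data is left out for brevity.

```python
from enum import Enum


class Role(Enum):
    PATIENT = "patient"
    PROVIDER = "provider"
    PAYER = "payer"  # healthcare clearinghouses and health plans
    IT = "it"


# Hypothetical field names for the only patient data a payer may read.
PAYER_FIELDS = {"medications", "services_performed"}

FULL_ACCESS = {"read", "write", "edit"}


def allowed_actions(role, user_id, record, field):
    """Return the set of actions a user may perform on one field of a record."""
    if role is Role.PATIENT:
        # Patients can only read their own data.
        return {"read"} if user_id == record["patient_id"] else set()
    if role is Role.PROVIDER:
        # Providers get full access, but only for patients in their care.
        return set(FULL_ACCESS) if user_id in record["care_team"] else set()
    if role is Role.PAYER:
        # Payers can read medication and services performed, nothing else.
        return {"read"} if field in PAYER_FIELDS else set()
    if role is Role.IT:
        # IT staff get full access only to de-identified records.
        return set(FULL_ACCESS) if record["deidentified"] else set()
    return set()


if __name__ == "__main__":
    record = {"patient_id": "P001", "care_team": {"D42"}, "deidentified": False}
    print(allowed_actions(Role.PROVIDER, "D42", record, "diagnosis"))
```

Returning an empty set by default means that any role or case the rules do not mention is denied, which is the safer failure mode for patient data.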

GUIs are software built on top of the Hadoop environment and on the front-end hardware used to access this data. The GUI could have forms that let providers Extract, Load, and Transform data easily, independent of the operating system (Linux, Mac OS, Windows, Solaris, etc.) and device (laptops, tablets, smartphones, etc.).

Data flow diagrams

Using GUI systems and forms, data that is written or edited can easily be written into an XML format, which should be expandable, consistent across all four states, and compliant with the standards of all four states and the federal government (Font, 2010):

<?xml version="1.0"?>
<!-- File name: HIMSmanualDataEntry.xml -->
<!-- HIMSRecord is an assumed wrapper element, added so the file has a single root -->
<HIMSRecord>
    <State>
        <Statename> Arizona </Statename>
        <HospitalName>…</HospitalName>
        <DepartmentName>…</DepartmentName>
        …
    </State>
    <Patient>
        <PatientFirstName>…</PatientFirstName>
        <PatientMiddleName>…</PatientMiddleName>
        <PatientLastName>…</PatientLastName>
        <PatientID>…</PatientID>
        <PatientDOB>…</PatientDOB>
        <PatientStreetAddress>…</PatientStreetAddress>
        <PatientCity>…</PatientCity>
        <PatientState>…</PatientState>
        <PatientZipCode>…</PatientZipCode>
        <PatientHeight>…</PatientHeight>
        <PatientWeight>…</PatientWeight>
        <PatientOnMedications>…</PatientOnMedications>
        <PatientPrimaryCarePhysician>…</PatientPrimaryCarePhysician>
        …
        <PatientSatisfactionSurveyResultsQ1>…</PatientSatisfactionSurveyResultsQ1>
        …
        <PatientResponseOnSocialMediaPage>…</PatientResponseOnSocialMediaPage>
        …
    </Patient>
</HIMSRecord>
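
As a rough illustration of how a GUI form submission could be turned into this kind of XML, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element names come from the sample above; the function name and the form values are hypothetical, and a real system would also validate the input against an agreed schema.

```python
import xml.etree.ElementTree as ET


def patient_form_to_xml(form):
    """Build a <Patient> element from a dict of form fields and return it as text.

    Keys are used as element names, so they must be valid XML names.
    ElementTree escapes characters such as & and < in the values.
    """
    root = ET.Element("Patient")
    for tag, value in form.items():
        child = ET.SubElement(root, tag)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up form values, for illustration only.
    form = {
        "PatientFirstName": "Jane",
        "PatientLastName": "Doe",
        "PatientID": "P001",
        "PatientState": "Arizona",
    }
    print(patient_form_to_xml(form))
```

Building the document through a library rather than by string concatenation avoids the malformed tags and unescaped characters that hand-written XML tends to accumulate.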

Data from sensors could also be entered into an XML format:

<?xml version="1.0"?>
<!-- File name: HIMSmanualDataEntry.xml -->
<!-- HIMSRecord is an assumed wrapper element, added so the file has a single root -->
<HIMSRecord>
    <Sensor>
        <SensorName>…</SensorName>
        <SensorType>…</SensorType>
        <SensorManufacturer>…</SensorManufacturer>
        <SensorMarginOfError>…</SensorMarginOfError>
        <SensorMinValue>…</SensorMinValue>
        <SensorMaxValue>…</SensorMaxValue>
    </Sensor>
    <State>
        <Statename> Arizona </Statename>
        <HospitalName>…</HospitalName>
        <DepartmentName>…</DepartmentName>
    </State>
    <Patient>
        <PatientID>…</PatientID>
    </Patient>
    <SensorReadings>
        <TimeStamp>…</TimeStamp>
        <HealthIndicator1Value>…</HealthIndicator1Value>
        <HealthIndicator2Value>…</HealthIndicator2Value>
    </SensorReadings>
</HIMSRecord>
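
A sensor message in this shape could be read and sanity-checked before it is loaded into the HIMS. The sketch below is one possible approach using Python's standard `xml.etree.ElementTree`: it assumes a single wrapper root named `SensorMessage` (hypothetical), assumes the indicator tag is spelled `HealthIndicator1Value`, and uses invented sample values. It flags readings that fall outside the sensor's declared minimum and maximum.

```python
import xml.etree.ElementTree as ET

# Invented sample message; the SensorMessage root is an assumption.
SAMPLE = """
<SensorMessage>
  <Sensor>
    <SensorName>Bedside heart-rate monitor</SensorName>
    <SensorMinValue>30</SensorMinValue>
    <SensorMaxValue>220</SensorMaxValue>
  </Sensor>
  <Patient><PatientID>P001</PatientID></Patient>
  <SensorReadings>
    <TimeStamp>2016-01-01T08:00:00</TimeStamp>
    <HealthIndicator1Value>72</HealthIndicator1Value>
  </SensorReadings>
  <SensorReadings>
    <TimeStamp>2016-01-01T08:01:00</TimeStamp>
    <HealthIndicator1Value>400</HealthIndicator1Value>
  </SensorReadings>
</SensorMessage>
"""


def parse_readings(xml_text):
    """Return one dict per reading, with a flag for values outside the sensor range."""
    root = ET.fromstring(xml_text.strip())
    low = float(root.findtext("Sensor/SensorMinValue"))
    high = float(root.findtext("Sensor/SensorMaxValue"))
    patient_id = root.findtext("Patient/PatientID")

    readings = []
    for node in root.findall("SensorReadings"):
        value = float(node.findtext("HealthIndicator1Value"))
        readings.append({
            "patient_id": patient_id,
            "timestamp": node.findtext("TimeStamp"),
            "value": value,
            "in_range": low <= value <= high,
        })
    return readings


if __name__ == "__main__":
    for reading in parse_readings(SAMPLE):
        print(reading)
```

Flagging rather than dropping out-of-range values keeps the raw data available for the data-cleaning step while still marking it as suspect.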

 

All of this patient data comes in from these different data sources, via a GUI or directly in XML format, and is entered into each state’s respective data center. Each state then sends that data to the HIMS data center, where it is processed, and the results are displayed in the GUI to the end user (Figure 1). Parallel processing occurs when the data is split, mapped, and reduced across N nodes, which returns data-driven results much faster than having a single node reduce the data in the HIMS data center.

Figure 1: Data flow diagram from the data source to data processing to results.

Overall system diagram

The entire HIMS is built on HDFS and Hadoop to leverage parallel processing and data analytics programs like MapReduce (Figure 2). Hadoop’s Mahout library, essentially a data analytics library, would allow Hadoop to run classification, filtering, k-means, Dirichlet, parallel pattern, and Bayesian classification in a manner similar to Hadoop’s MapReduce (Wayner, 2013; Lublinsky, Smith, & Yakubovich, 2013). Hadoop’s Avro library should help process the XML data in the HIMS (Wayner, 2013; Lublinsky et al., 2013), while Hadoop’s YARN can set up job scheduling and cluster management (Lublinsky et al., 2013). Finally, Hadoop’s Oozie library, essentially a workflow manager, manages the workflow of a job by allowing the user to break the job into simple steps in a flowchart fashion (Wayner, 2013; Lublinsky et al., 2013). Through the use of Hadoop’s MapReduce function, petabytes or exabytes of data will be split into megabyte- or gigabyte-sized pieces (depending on the number of nodes and the input data) to be processed and analyzed. The analyzed results will provide the end user with useful insights hidden in the large data sets, delivered to their hardware devices via a GUI front end.
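
The splitting described above can be estimated with simple arithmetic. The sketch below assumes the 128 MiB default HDFS block size used by Hadoop 2.x and later (the value is configurable), ignores replication, and assumes work is spread evenly across nodes. The function names are illustrative.

```python
MIB = 2**20
TIB = 2**40

# Default HDFS block size in Hadoop 2.x and later; clusters can override it.
DEFAULT_BLOCK_BYTES = 128 * MIB


def hdfs_block_count(file_bytes, block_bytes=DEFAULT_BLOCK_BYTES):
    """Number of blocks needed to hold a file (the last block may be partly full)."""
    if file_bytes <= 0:
        return 0
    return -(-file_bytes // block_bytes)  # ceiling division without floats


def blocks_per_node(total_blocks, node_count):
    """Upper bound on blocks per node when work is spread evenly (an assumption)."""
    return -(-total_blocks // node_count)


if __name__ == "__main__":
    blocks = hdfs_block_count(1 * TIB)
    print(blocks)                       # 8192
    print(blocks_per_node(blocks, 64))  # 128
```

Under these assumptions, a 1 TiB data set becomes 8,192 blocks, or about 128 blocks per node on a 64-node cluster, which is why adding nodes shortens processing time for large inputs.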

Figure 2: Overall systems diagram for the HIMS, which is based on a Hadoop solution.

Communication flow chart

Each of the hospitals will have patients, healthcare providers, healthcare clearinghouses, and health plan professionals, and each of them talks to the others. Data derived from these conversations is placed in each hospital’s data center, transferred to each state’s data center, and finally placed into the HIMS data center (Figure 3). Figure 3 also shows the hidden communication of regulations, policies, and governance set forth at both the federal and state levels, which governs how data is communicated between the systems. The proposed system must follow all regulations, policies, and governance set forth by local, state, and federal governments, as well as the internal policies and procedures of the healthcare system.

Figure 3: The communications flow chart for the proposed HIMS solution.

Regulations, policies, and governance for the Healthcare industry

With any data solution that involves patient data, the data itself must be recognized as being about a person’s identity. How that identity flows from one IT platform to another is where privacy and the protection of a patient’s identity become important (Richards & Kings, 2014). Because information flows from the patient to the healthcare provider, from the healthcare provider to the IT solution, and from the IT solution back to the healthcare provider, legal regulations, policies, and governance come into play to protect the patient (O’Driscoll, Daugelaite, & Sleator, 2013). The goal of using the proposed HIMS for these four states is to allow for administrative simplification, lower costs, improved data security, and lower error rates, therefore providing better care to patients (HIPAA, n.d.).

This proposed solution must follow the Health Information Technology for Economic and Clinical Health (HITECH) Act, which promotes “meaningful use” reporting standards; the Health Insurance Portability and Accountability Act (HIPAA), which promotes the data privacy of patients; and the International Organization for Standardization (ISO) 9001 standard, which focuses on quality management (HHS, n.d.; HIPAA, n.d.; McEwen, Boyer, & Sun, 2013; Microsoft, 2016; Nolan, 2015). However, Richards and Kings (2014) argued that when patients disclose information to their healthcare provider, or when patient data is collected from IoT solutions, patients lose control of their personal data but expect that the data will remain confidential and not be shared with others. HIPAA (n.d.) allows for disclosure of patient data without patient authorization if the disclosure deals with treatment, payment, operations, or a subpoena. However, for patient data to be entered into a centralized HIMS, patients must authorize it. McEwen et al. (2013) suggested that data disclosure options be provided to all patients to give them the best protection and care: open consent (data can be used in the future for a specific purpose or research project), broad consent (data can be used in all cases), or opt-out consent (broad application of the data, but patients can say no to certain cases).

HIPAA describes how healthcare providers, healthcare clearinghouses, and health plans must de-identify 18 key data points to protect the patient: names, geographic data, dates, telephone numbers, VINs, fax numbers, device IDs and serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric IDs (fingerprints, iris scans, voice prints, etc.), full-face photos, health plan beneficiary numbers, account numbers, any other unique ID number (characteristics, codes, etc.), and certification/license numbers (HHS, n.d.; HIPAA, n.d.). If this data is not de-identified properly following the procedures outlined in HIPAA, cyber-criminals can hack into the centralized HIMS for these Four Corners states, re-identify the data, and leak the information to the world, causing defamation, stolen identities, etc. (HIPAA, n.d.). This risk could be mitigated by implementing ISO 9001, because internal audits would be standard practice: they are conducted to ensure quality management of the data and IT system, to perform constant risk assessments that reduce cost, and to ensure continual service improvement so that the system is proactive rather than reactive to cyber threats (Nolan, 2015). Given the information above, the HIMS would be best suited to a private cloud solution, where the data within it can be seen or used only by the four states (Lau, 2011).
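
To show what de-identification might look like at the record level, here is a deliberately simplified sketch. The field names are hypothetical stand-ins for some of the identifier categories listed above, dates are reduced to the year only, and the function is an illustration of the idea rather than a HIPAA-compliant implementation.

```python
# Hypothetical field names standing in for direct identifiers.
IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "license_number", "vin", "device_serial",
    "url", "ip_address", "biometric_id", "photo",
}


def deidentify(record):
    """Return a copy of the record without direct identifiers.

    Assumes date_of_birth is an ISO 8601 string ("YYYY-MM-DD") and keeps
    only the year. The input record is not modified.
    """
    clean = {key: value for key, value in record.items()
             if key not in IDENTIFIER_FIELDS}
    dob = clean.pop("date_of_birth", None)
    if dob is not None:
        clean["birth_year"] = dob[:4]
    return clean


if __name__ == "__main__":
    # Made-up record, for illustration only.
    record = {
        "name": "Jane Doe",
        "ssn": "000-00-0000",
        "date_of_birth": "1980-05-17",
        "heart_rate": 72,
    }
    print(deidentify(record))
```

An allow-list of permitted fields would be a stricter alternative to this block-list, since a new identifying field added later would then be excluded by default.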

Assumptions and limitations

There is an assumption that many healthcare organizations and bigger hospitals will have their own IT departments, which implement IT solutions that meet the regulations, policies, and governance requirements of the healthcare industry (Microsoft, 2016). As a result, internal performance and quality management can vary drastically between hospitals and across state boundaries (Nolan, 2015). Also, smaller hospitals and medical facilities may not have the resources for their own IT department. Thus, a solution must be devised that is simple, secure, and feasible to implement, to help bring them on board with confidence that the proposed solution will fit their needs and provide substantial benefits (Microsoft, 2016). Nolan (2015) proposed that following ISO 9001 standards would allow for uniformity of objectives and methodologies across all hospitals in the four-state region, reduce the cost of maintaining different training solutions for different hospitals, and allow for greater efficiency in secure, legal data sharing.

Another assumption behind a centralized HIMS is that, with enough data gathered from multiple hospitals, health status changes can become predictable, preventable, or manageable, and that managing them this way would be easier, cheaper, and more humane (Flower, n.d.). If patients give broad consent or even opt-out consent for the use of their data, healthcare providers could monitor a patient’s health and stay in low-intensity, high-volume, lifelong contact with the patient (Flower, n.d.; McEwen et al., 2013). This would allow patients and healthcare providers to be partners in managing the patient’s personal and family health. Finally, a system like the HIMS could improve the overall health of the population through prediction, prevention, and management of the population’s health (Flower, n.d.).

A limitation of this proposal is the assumption that the data is being cleaned so that it is reliable and credible. Poor data quality, when used in data mining, machine learning, and data analytics, will affect the results and therefore any data-driven decisions (Corrales, Ledezma, & Corrales, 2015). Data cleaning and preprocessing must be done before modeling, and this requires that the IT professionals know and understand the data being collected and integrated from heterogeneous datasets (Corrales et al., 2015; Hendler, 2016).

Justification for overall design

This design proposal is for a major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah). The proposed solution is a centralized Healthcare Information Management System (HIMS) that gives key stakeholders access to information derived from data that may be hidden in silos and brings forth the vital information needed to make data-driven decisions. In summary, the HIMS was designed:

  • To allow these hospitals to be more agile, responsive, and competitive in the healthcare industry by centralizing and standardizing the datasets across all four states.
  • To collect, process, and analyze data to deliver key insights quickly and accurately and so provide better service to patients.
  • To allow for the identification of redundant and duplicate data, which can exist if a patient’s data is found in more than one hospital, and for that data to be merged into one record.
  • To allow for analysis and visualization of new, functional, and experimental data in the context of old existing data, information, and knowledge, so that everyone can view the data they need when they need it (Higdon et al., 2016).
  • To allow for the integration of big internal and external datasets and structured and unstructured data from different sources, through a distributed database system provided by Hadoop’s HDFS (Gary et al., 2005; Hendler, 2016; Hortonworks, 2013; IBM, n.d.).
  • To use Hadoop, a Java-based system that, delivered as a Platform as a Service (PaaS) cloud solution, allows for a pay-as-you-go business model in which the hospitals in these four states pay only for what they need at that time (Dikaiakos et al., 2009; Lau, 2011).
  • To allow for a reduction in IT infrastructure costs, with no need to absorb the cost of upgrading the IT infrastructure every 2 to 3 years (Lau, 2011).
  • To allow for the use of MapReduce, which maps and reduces the data using parallel processing to discover hidden insights in the healthcare data (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.).
  • To use GUI systems and forms so that data that is written or edited can easily be written into an XML format, which should be expandable, consistent across all four states, and compliant with the standards of all four states and the federal government (Font, 2010).
  • To allow for the analysis of data derived from the IoT, that is, various large and streaming data sets from multiple devices and sensors that deal with sensitive patient data.
  • To allow for administrative simplification, lower costs, improved data security, and lower error rates, therefore providing better care to patients (HIPAA, n.d.).

Thus, this design proposal recommends the use of a centralized distributed database system and the use of Hadoop, so that insights can be garnered and visualized to drive data-driven healthcare decisions and provide improved care for patients.

References

  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox, VitalBook file.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.

Big Data Analytics: POTUS Report

This has become a data-centric society that relies on real-time data and technology (e.g., cell phones, online shopping, social networking) more than ever. Although there are many advantages associated with the use of this data, there are concerns that the collection of massive amounts of data can lead to an invasion of privacy. In January 2014, President Obama asked his staff to take the next 90 days to prepare a report for him on how big data is affecting people’s privacy. This post revolves around that report.

The aim of big data analytics is for data scientists to fuse data from various sources, of various types, and in huge amounts so that they can find relationships, identify patterns, and find anomalies. Big data analytics can help provide a descriptive, prescriptive, or predictive result for a specific research question. Big data analytics isn’t perfect: sometimes the results are not significant, and we must remember that correlation is not causation. Regardless, big data analytics offers many benefits, and policy has yet to catch up to the field to protect the nation from potential downsides while still promoting and maximizing the benefits.

Policies for maximizing benefits while minimizing risk in public and private sector

In the private sector, companies can create detailed personal profiles that enable personalized services from a company to a consumer. Interpreting personal profile data would allow a company to retain and command more of the market share, but it can also leave room for discrimination in pricing, service quality/type, and opportunities through “filter bubbles” (Podesta, Pritzker, Moniz, Holdren, & Zients, 2014). Policy recommendations should encourage de-identifying personally identifiable information to the point that the data cannot be re-identified. Current policies for promoting privacy in the private sector are (Podesta et al., 2014):

  • Fair Credit Reporting Act, which helps promote fairness and privacy of credit and insurance information
  • Health Insurance Portability and Accountability Act, which enables people to understand and control how personal health data is used
  • Gramm-Leach-Bliley Act, which helps consumers of financial services have privacy
  • Children’s Online Privacy Protection Act, which minimizes the collection and use of data from children under the age of 13
  • Consumer Privacy Bill of Rights, a privacy blueprint that helps people understand what personal data is being collected and ensures it is used in ways consistent with their expectations

In the public sector, issues arise when the government collects information about its citizens for one purpose and eventually uses that same data for a different purpose (Podesta et al., 2014). This gives the government the potential to exert power over certain types of citizens and to hamper civil rights progress in the future. Current policies in the public sector are (Podesta et al., 2014):

  • The Affordable Care Act allows for building a better health care system by moving from a “fee-for-service” program to a “fee-for-better-outcomes” program. This has allowed big data analytics to be used to promote preventative care rather than emergency care, while restricting the use of that data to deny health care coverage for “pre-existing health conditions.”
  • The Family Educational Rights and Privacy Act, the Protection of Pupil Rights Amendment, and the Children’s Online Privacy Protection Act help seal children’s educational records to prevent misuse of that data.

Identifying opportunities for big data in the economy, health, education, safety, energy-efficiency

In the economy, the Internet of Things can be used to equip parts of a product with sensors that monitor and transmit thousands of live data points for sending alerts. These alerts can tell us when maintenance is needed, for which part, and where it is located, saving time across the entire process and improving overall safety (Podesta et al., 2014).

In medicine, predictive analytics could be used to identify instances of insurance fraud, waste, and abuse in real time, saving more than $115M per year (Podesta et al., 2014). Another use of big data is in studying neonatal intensive care, where current data is used to create prescriptive results that determine which newborns are likely to contract which infection and what the outcome would be (Podesta et al., 2014). Monitoring a newborn’s heart rate and temperature along with other health indicators can alert doctors to the onset of an infection and prevent it from getting out of hand. Huge genetic data sets are helping locate the genetic variants behind certain types of genetic diseases that were once hidden in our genetic code (Podesta et al., 2014).

With regard to national safety and foreign interests, data scientists and data visualizers have been using data gathered by the military to help commanders solve real operational challenges on the battlefield (Podesta et al., 2014). Big data analytics on satellite data, surveillance data, and road traffic flow data is making it easier to detect, obtain, and properly dispose of improvised explosive devices (IEDs). The Department of Homeland Security is aiming to use big data analytics to identify threats as they enter the country, as well as people with a higher than normal probability of committing acts of violence within the country (Podesta et al., 2014). Another safety-related use of big data analytics is the identification of human trafficking networks through analysis of the “deep web” (Podesta et al., 2014).

Finally, for energy efficiency, understanding weather patterns and climate change can help us understand our contribution to climate change based on our use of energy and natural resources. By analyzing traffic data, we can improve energy efficiency and public safety in our current lighting infrastructure by dimming lights at appropriate times (Podesta et al., 2014). Companies can maximize energy efficiency by using big data analytics to control their direct and indirect energy use (through optimizing supply chains and monitoring equipment). Another way we are moving toward a more energy-efficient future is through the government partnering with electric utility companies to give businesses and families access to their own energy usage in an easy-to-digest form, so that people and companies can make changes to their current consumption levels (Podesta et al., 2014).

Protecting your own privacy outside of policy recommendation

The report suggests that we can control our own privacy by using the private browsing function in most current internet browsers, which helps prevent the collection of personal data (Podesta et al., 2014). However, private browsing varies from browser to browser. For important decisions like being denied employment, credit, or insurance, consumers should be empowered to know why they were denied and should ask for that information (Podesta et al., 2014). Finding out the reason allows people to address those issues and persevere in the future. We can also encrypt our communications, using the strongest encryption available, to protect our privacy. We need to educate ourselves on how to protect our personal data, build our digital literacy, and know how big data could be used and abused (Podesta et al., 2014). While we wait for current policies to catch up with the times, we actually have more power over our own data and privacy than we know.

 

Reference:

Podesta, J., Pritzker, P., Moniz, E. J., Holdren, J., & Zients, J. (2014). Big Data: Seizing Opportunities, Preserving Values. Executive Office of the President. Retrieved from https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf