Design Proposal for Healthcare based on the Internet of Things

As a big data analyst, this post considers a major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah) and would like to incorporate data analytics into its operations. The first official task is to provide a Hadoop solution to its business problem of analyzing various data sets, some structured and some unstructured. Thus, this is a design proposal for a Hadoop environment, with examples of a design flow expressed in XML.


Introduction

This design proposal is for the major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah). It proposes a centralized Healthcare Information Management System (HIMS) to give key stakeholders access to information that may be hidden in silos and to bring forth the vital information needed to make data-driven decisions.  The goal of this proposal is to collect, process, and analyze data to deliver key insights quickly and accurately, thereby providing better service to patients.  This proposal should allow these hospitals to be more agile, responsive, and competitive in the healthcare industry. Thus, it calls for the use of Hadoop to analyze data derived from the Internet of Things (IoT) and social media for these hospitals: a set of large, streaming data sets from multiple devices/sensors dealing with sensitive patient data.  This design proposal consists of a design flow chart that uses XML, includes the use of Hadoop, and recommends a suite of data visualization tools currently used in the industry to visualize HIMS data.

Requirements

The HIMS must allow for analysis and visualization of new, functional, and experimental data in the context of existing data, information, and knowledge (Higdon et al., 2016). Dealing with both structured and unstructured data presents a real-world challenge for most hospitals. Structured data comes from the devices and sensors used to monitor patients, while unstructured data comes from clinical, nursing, and doctors' notes and diagnoses, which are saved in a centralized data warehouse. Other data sources that help maintain finances, HR, facility management, etc., are out of scope for this proposal. Another source of unstructured data is social media, from those related to the patients and from the hospital itself; both are also out of scope for this design.

This proposed system should be able to integrate internal and external datasets, and structured and unstructured data from different sources, such that a traditional relational database is not adequate to handle the amount of data and the different data types (Hendler, 2016). Traditional databases rely on tables and arrays, but healthcare data comes in N-dimensional arrays, as well as from multiple different sources that can vary across time and contain text (Gary et al., 2005). Hence, traditional databases are not the best solution for the HIMS.

Since Hadoop is a Platform as a Service (PaaS) cloud solution, it manages the runtime, middleware, O/S, virtualization, servers, storage, and networking, while the IT department managing the HIMS deals with the applications, data, and visualization of the results (Lau, 2011).  This provides another advantage over traditional relational database systems for the HIMS, which depend on the hardware infrastructure (Minelli, Chambers, & Dhiraj, 2013).  A benefit of using a cloud-based solution is the pay-as-you-go business model, which allows the hospitals in these four states to pay only for what they need at the time, scaling computational and financial resources up and down (Dikaiakos, Katsaros, Mehra, Pallis, & Vakali, 2009; Lau, 2011). Finally, another benefit of using a cloud-based solution like Hadoop is the reduction in IT infrastructure costs: there is no need to absorb the cost of upgrading the IT infrastructure every 2 to 3 years (Lau, 2011).

Therefore, to deal with the amount and complexity of the data, a PaaS solution using Hadoop is recommended over relational databases.  PaaS tools like Hadoop use distributed databases, which allow for parallel processing of huge amounts of data, parallel searching, metadata management, analysis, the use of tools like MapReduce, and workflow system analysis (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.; Minelli et al., 2013).  MapReduce uses Hadoop’s Distributed File System (HDFS) to map the data and reduce it using parallel processing, discovering hidden insights in the healthcare data (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.).  HDFS stores the data in smaller blocks, which can be recombined when needed into one system, like Lego blocks, and provides high-throughput access to the data (Hortonworks, 2013; IBM, n.d.).
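As an illustration of the map and reduce stages described above, here is a minimal pure-Python sketch; the record fields and hospital names are hypothetical, and a real Hadoop job would run these stages in parallel across many nodes over HDFS:

```python
from collections import defaultdict

# Hypothetical de-identified sensor records; field names are illustrative only.
records = [
    {"hospital": "AZ-Phoenix", "heart_rate": 72},
    {"hospital": "AZ-Phoenix", "heart_rate": 88},
    {"hospital": "CO-Denver", "heart_rate": 64},
]

# Map: emit (key, value) pairs -- here, hospital -> heart rate.
mapped = [(r["hospital"], r["heart_rate"]) for r in records]

# Shuffle: group values by key, as Hadoop does between map and reduce.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce: aggregate each group -- here, an average heart rate per hospital.
averages = {key: sum(vals) / len(vals) for key, vals in grouped.items()}
print(averages)  # {'AZ-Phoenix': 80.0, 'CO-Denver': 64.0}
```

The three stages mirror a Hadoop job's structure; only the scale differs.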

Finally, for data visualization, the California HealthCare Foundation ([CHCF], 2014) recommended data visualization tools that everyone can use: Google Charts & Maps, Tableau Public, Mapbox, Infogram, Many Eyes, iCharts, and Datawrapper. CHCF (2014) also recommended data visualization tools for developers, such as High Charts, TileMill, D3.js, FLOT, Fusion Charts, OpenLayers, and JSMap.  Either set of tools is fine, and choosing between them should be left to the IT professionals and key stakeholders.  The technological backbone of the HIMS can become complicated, but patients, healthcare providers, healthcare clearinghouses, and health plan staff do not need this knowledge to get their jobs done and meet their data needs.  It is the job of the IT department to make this happen and to educate people on using the HIMS to meet their needs.

Education among healthcare providers, healthcare clearinghouses, and health plans on the role of a centralized HIMS in conducting data analysis to improve patient care is necessary (Higdon et al., 2016). These groups will be interfacing with the system, the Hadoop environment, the data, etc.  Therefore, an outreach and education program must be implemented to ensure buy-in from all key stakeholders, along with training on how to use the system to meet their current needs.  When interacting with the HIMS, a graphical user interface (GUI) should be used by patients, healthcare providers, healthcare clearinghouses, and health plans, to ensure ease of use:

  • Patients should have read-only access to their own data
  • Healthcare providers should have read/write/edit access to the data of the patients they are caring for
  • Healthcare clearinghouses and health plans should have read-only access to patient medications and services performed, no access to anything else, and read/write/edit access to data that fits their scope of work
  • The IT department should have full privileges to read/write/edit de-identified patient data, to ensure data quality and data input/output, and for data mining

GUIs are software built on top of the Hadoop environment and on the front-end hardware used to access this data.  The GUI could provide forms for providers to extract, load, and transform data easily, independent of the operating system (Linux, Mac OS, Windows, Solaris, etc.) and device (laptops, tablets, smartphones, etc.).
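As a rough sketch of how such a GUI form submission could be serialized to XML with Python's standard library, independent of the operating system (the field names and values are illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical form fields captured by the GUI; tag names mirror the
# proposal's patient schema.
form = {"PatientFirstName": "Jane", "PatientLastName": "Doe", "PatientID": "12345"}

patient = ET.Element("Patient")
for tag, value in form.items():
    child = ET.SubElement(patient, tag)
    child.text = value

xml_bytes = ET.tostring(patient)  # serialized record, ready to load into HDFS
print(xml_bytes.decode())
```

The same few lines work unchanged on Linux, Mac OS, or Windows, which is the platform-independence point above.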

Data flow diagrams

Using GUI systems and forms, data that is written or edited can easily be captured in an XML format, which should be expandable, consistent across all four states, and meet the standards of all four states and the federal government (Font, 2010):

<?xml version="1.0"?>

<!-- File name: HIMSmanualDataEntry.xml -->

<State>

            <StateName>Arizona</StateName>

            <HospitalName>…</HospitalName>

            <DepartmentName>…</DepartmentName>

            …

</State>

<Patient>

            <PatientFirstName>…</PatientFirstName>

            <PatientMiddleName>…</PatientMiddleName>

            <PatientLastName>…</PatientLastName>

            <PatientID>…</PatientID>

            <PatientDOB>…</PatientDOB>

            <PatientStreetAddress>…</PatientStreetAddress>

            <PatientCity>…</PatientCity>

            <PatientState>…</PatientState>

            <PatientZipCode>…</PatientZipCode>

            <PatientHeight>…</PatientHeight>

            <PatientWeight>…</PatientWeight>

            <PatientOnMedications>…</PatientOnMedications>

            <PatientPrimaryCarePhysician>…</PatientPrimaryCarePhysician>

            …

            <PatientSatisfactionSurveyResultsQ1>…</PatientSatisfactionSurveyResultsQ1>

            …

            <PatientResponseOnSocialMediaPage>…</PatientResponseOnSocialMediaPage>

            …

</Patient>

Data from sensors could also be entered into an XML format:

<?xml version="1.0"?>

<!-- File name: HIMSsensorDataEntry.xml -->

<Sensor>

            <SensorName>…</SensorName>

            <SensorType>…</SensorType>

            <SensorManufacturer>…</SensorManufacturer>

            <SensorMarginOfError>…</SensorMarginOfError>

            <SensorMinValue>…</SensorMinValue>

            <SensorMaxValue>…</SensorMaxValue>

</Sensor>

<State>

            <StateName>Arizona</StateName>

            <HospitalName>…</HospitalName>

            <DepartmentName>…</DepartmentName>

</State>

<Patient>

            <PatientID>…</PatientID>

</Patient>

<SensorReadings>

            <TimeStamp>…</TimeStamp>

            <HealthIndicator1Value>…</HealthIndicator1Value>

            <HealthIndicator2Value>…</HealthIndicator2Value>

</SensorReadings>

 

All of this patient data comes in through these different data sources, via a GUI or directly in XML format, and is entered into each state's respective data center; each state then feeds that data into the HIMS data center, where it is processed, and the results are displayed in the GUI to the end user (Figure 1). Parallel processing occurs when the data is split, mapped, and reduced across N nodes, which returns data-driven results much faster than having one node reduce the data in the HIMS data center.


Figure 1: Data flow diagram from the data source to data processing to results.
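The split-and-reduce idea above can be simulated in miniature with Python's standard library; the worker pool below stands in for the N nodes, and the data values and node count are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_chunk(chunk):
    # Each simulated node reduces its split to a partial (sum, count).
    return sum(chunk), len(chunk)

readings = list(range(1, 101))  # stand-in for a large stream of sensor values
n_nodes = 4

# Split the data into one chunk per node, as HDFS splits files into blocks.
chunks = [readings[i::n_nodes] for i in range(n_nodes)]
with ThreadPoolExecutor(max_workers=n_nodes) as pool:
    partials = list(pool.map(reduce_chunk, chunks))

# Final reduce: combine the partial results from all nodes into one answer.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
mean = total / count
print(mean)  # 50.5
```

Because each chunk is reduced independently, adding nodes shortens the wall-clock time without changing the final answer.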

Overall system diagram

The entire HIMS is built on HDFS and Hadoop to leverage parallel processing and data analytics programs like MapReduce (Figure 2). Hadoop's Mahout library, essentially a data analytics library, allows Hadoop to use classification, filtering, k-means, Dirichlet, parallel pattern, and Bayesian classification, similar to Hadoop's MapReduce (Wayner, 2013; Lublinsky, Smith, & Yakubovich, 2013). Hadoop's Avro library should help process the XML data in the HIMS (Wayner, 2013; Lublinsky et al., 2013), while Hadoop's YARN handles job scheduling and cluster management (Lublinsky et al., 2013).  Finally, Hadoop's Oozie library, essentially a workflow manager, manages the workflow of a job by allowing the user to break it into simple steps in a flowchart fashion (Wayner, 2013; Lublinsky et al., 2013).  Through the use of Hadoop's MapReduce function, petabytes or exabytes of data will be split into megabyte- or gigabyte-sized files (depending on the number of nodes and the input data) to be processed and analyzed. The analyzed results will surface useful data and insights hidden in the large data sets to the end user, on their hardware devices, via a GUI front end.


Figure 2: Overall systems diagram for HIMS, which is based on Hadoop solution.
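Oozie itself defines workflows in XML; purely as an illustration of breaking a job into simple, ordered steps, the idea can be sketched as chained functions (the step names and thresholds are hypothetical):

```python
# Hypothetical workflow steps, chained flowchart-style: each step's output
# feeds the next, as in an Oozie job definition.
def ingest(raw):      # step 1: parse raw sensor lines into numbers
    return [float(x) for x in raw]

def clean(values):    # step 2: drop readings outside a plausible range
    return [v for v in values if 30 <= v <= 220]

def analyze(values):  # step 3: reduce to a summary statistic
    return sum(values) / len(values)

workflow = [ingest, clean, analyze]
data = ["72", "88", "300", "64"]  # "300" is an implausible heart rate
for step in workflow:
    data = step(data)
print(data)  # average of the three valid readings
```

Each step is independent and testable, which is what makes the flowchart decomposition attractive for managing long-running jobs.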

Communication flow chart

Each of the hospitals will have patients, healthcare providers, healthcare clearinghouses, and health plan professionals, and each of them talks to the others.  Data derived from these conversations is placed in each hospital's data center, then transmitted to each state's data center, and finally placed in the HIMS data center (Figure 3).  Figure 3 also shows the hidden communication of the regulations, policies, and governance set forth between the hospitals at both the federal and state level, which governs how data is communicated between the systems. The proposed system must follow all regulations, policies, and governance set forth by local, state, and federal governments, as well as the internal policies and procedures of the healthcare system.


Figure 3. The communications flow chart for the proposed HIMS solution.

Regulations, policies, and governance for the Healthcare industry

With any data solution that involves patient data, the data must be recognized as part of a person's identity, and how that identity flows from one IT platform to another is where the concept of privacy and the protection of a patient's identity becomes important (Richards & Kings, 2014).  It is in protecting information as it flows from the patient to the healthcare provider, from the healthcare provider to the IT solution, and from the IT solution back to the healthcare provider that legal regulations, policies, and governance come into play to protect the patient (O'Driscoll, Daugelaite, & Sleator, 2013). The goal of the proposed HIMS for these four states is to allow for administrative simplification, lower costs, improved data security, and lower error rates, thereby providing better care to the patients (HIPAA, n.d.).

This proposed solution must follow the Health Information Technology for Economic and Clinical Health (HITECH) Act, which promotes "meaningful use" reporting standards; the Health Insurance Portability and Accountability Act (HIPAA), which promotes patient data privacy; and ISO 9001, which focuses on quality management standards (HHS, n.d.; HIPAA, n.d.; McEwen, Boyer, & Sun, 2016; Microsoft, 2016; Nolan, 2015).  However, Richards and Kings (2014) argued that in the act of patients disclosing information to their healthcare provider, or in the collection of patient data from IoT solutions, patients lose control of their personal data, yet there is an expectation that the data will remain confidential and not be shared with others.  HIPAA (n.d.) allows for disclosure of patient data without authorization if the disclosure deals with treatment, payment, operations, or a subpoena.  However, for patient data to be entered into a centralized HIMS, patients must authorize it.  McEwen et al. (2016) suggested that data disclosure options be provided to all patients to ensure the best protection and care: open-consent (data can be used in the future for a specific purpose or research project), broad-consent (data can be used in all cases), or opt-out consent (broad application of the data, but patients can say no in certain cases).

HIPAA describes how healthcare providers, healthcare clearinghouses, and health plans must de-identify 18 key data points to protect the patient: names, geographic data, dates, telephone numbers, VINs, fax numbers, device IDs and serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric IDs (fingerprints, iris scans, voice prints, etc.), full-face photos, health plan beneficiary numbers, account numbers, any other unique ID numbers (characteristics, codes, etc.), and certification/license numbers (HHS, n.d.; HIPAA, n.d.). If this data is not de-identified properly, following the procedures outlined in HIPAA, cyber-criminals can hack into the centralized HIMS for these Four Corners states, re-identify the data, and leak the information to the world, causing defamation, identity theft, etc. (HIPAA, n.d.).  This could be mitigated if ISO 9001 were implemented, because internal audits would be standard practice, conducted to ensure quality management of the data and the IT system; constant risk assessments would reduce cost; and continual service improvement would drive the system to be proactive rather than reactive to cyber threats (Nolan, 2015).  Given the information above, the HIMS would be best suited as a private cloud solution, where the data within it can be seen or used only by the four states (Lau, 2011).
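As an illustrative sketch only, a de-identification pass over a record might look like the following; the field names are hypothetical, the list covers only a few of the 18 identifiers, and a production system must implement every HIPAA Safe Harbor category and its exceptions:

```python
# Subset of HIPAA's 18 identifier categories, for illustration only.
IDENTIFIER_FIELDS = {"Name", "SSN", "Telephone", "Email", "MedicalRecordNumber"}

def deidentify(record):
    # Drop direct identifiers, then generalize the ZIP code to its first
    # three digits, a common Safe Harbor treatment of geographic data.
    clean = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    if "ZipCode" in clean:
        clean["ZipCode"] = clean["ZipCode"][:3] + "00"
    return clean

record = {"Name": "Jane Doe", "SSN": "123-45-6789",
          "ZipCode": "86001", "HeartRate": 72}
print(deidentify(record))  # {'ZipCode': '86000', 'HeartRate': 72}
```

The analytically useful field (the heart rate) survives while the directly identifying fields are stripped before the record enters the HIMS.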

Assumptions and limitations

There is an assumption that many healthcare organizations and bigger hospitals will have their own IT departments, which implement IT solutions that meet the regulations, policies, and governance of the healthcare industry (Microsoft, 2016).  Therefore, internal performance and quality management can vary drastically between hospitals and across state boundaries (Nolan, 2015). Also, smaller hospitals and medical facilities may not have the resources for their own IT departments; thus, a solution must be devised that is simple, secure, and feasible enough to implement, to bring them on board with confidence that the proposed solution will fit their needs and provide substantial benefits (Microsoft, 2016).  Nolan (2015) proposed that following ISO 9001 standards would allow for uniformity of objectives and methodologies across all hospitals in the four-state region, a reduction in the cost of different training solutions for different hospitals, and greater efficiency in secure, legal data sharing.

Another assumption of a centralized HIMS is that with enough data gathered from multiple hospitals, health status changes can become predictable, preventable, or manageable, and doing so would be easier, cheaper, and more humane (Flower, n.d.).  If patients give broad consent, or even opt-out consent, to the use of their data, then healthcare providers could monitor a patient's health and be in low-intensity, high-volume, lifelong contact with the patient (Flower, n.d.; McEwen et al., 2016). This will allow patients and healthcare providers to be partners in managing the patient's personal and family health.  Finally, a system like the HIMS could improve the overall health of the population through prediction, prevention, and management of the population's health (Flower, n.d.).

A limitation of this proposal is the assumption that the data is cleaned, such that it is reliable and credible.  Poor data quality, when used in data mining, machine learning, and data analytics, will impact the results and therefore any data-driven decisions (Corrales, Ledezma, & Corrales, 2015).  Data cleaning and preprocessing must be done before modeling, and it requires that the IT professionals know and understand the data being collected and integrated from heterogeneous datasets (Corrales et al., 2015; Hendler, 2016).
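A minimal sketch of such a cleaning step, assuming hypothetical field names and illustrative plausibility limits:

```python
def clean_records(records, min_hr=30, max_hr=220):
    """Drop records with missing or implausible values before analysis.
    The thresholds are illustrative; real limits come from clinical guidance."""
    cleaned = []
    for r in records:
        hr = r.get("heart_rate")
        if hr is None:
            continue  # missing value: exclude rather than guess
        if not (min_hr <= hr <= max_hr):
            continue  # implausible reading, likely a sensor fault
        cleaned.append(r)
    return cleaned

raw = [{"heart_rate": 72}, {"heart_rate": None}, {"heart_rate": 999}]
print(clean_records(raw))  # [{'heart_rate': 72}]
```

Choosing what to do with bad records (drop, flag, or impute) is itself a decision the IT professionals must make with knowledge of the source data.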

Justification for overall design

This design proposal is for the major corporation that owns a string of state-of-the-art hospitals in the Four Corners area of the United States (Arizona, Colorado, New Mexico, and Utah). It proposes a centralized Healthcare Information Management System (HIMS) to give key stakeholders access to information that may be hidden in silos and to bring forth the vital information needed to make data-driven decisions.  In summary, the HIMS is designed:

  • To allow these hospitals to be more agile, responsive, and competitive in the healthcare industry, by centralizing and standardizing the datasets across all four states.
  • To collect, process, and analyze data to deliver key insights quickly and accurately, providing better service to the patients.
  • To allow identification of redundant and duplicate data, which can exist when a patient's data is found in more than one hospital, and allow that data to be merged into one record.
  • To allow analysis and visualization of new, functional, and experimental data in the context of existing data, information, and knowledge, so that everyone can view the data they need at the time they need it (Higdon et al., 2016).
  • To allow integration of big internal and external datasets, and structured and unstructured data from different sources, through the distributed file system provided by Hadoop's HDFS (Gary et al., 2005; Hendler, 2016; Hortonworks, 2013; IBM, n.d.).
  • To use Hadoop, a Java-based Platform as a Service (PaaS) cloud solution, which allows for a pay-as-you-go business model in which the hospitals in these four states pay only for what they need at the time (Dikaiakos et al., 2009; Lau, 2011).
  • To reduce IT infrastructure costs, removing the need to absorb the cost of upgrading the IT infrastructure every 2 to 3 years (Lau, 2011).
  • To utilize MapReduce, which maps the data and reduces it using parallel processing to discover hidden insights in the healthcare data (Gary et al., 2005; Hortonworks, 2013; IBM, n.d.).
  • To use GUI systems and forms, so that data that is written or edited can easily be captured in an XML format that is expandable, consistent across all four states, and meets the standards of all four states and the federal government (Font, 2010).
  • To allow analysis of data derived from the IoT: a set of large, streaming data sets from multiple devices/sensors dealing with sensitive patient data.
  • To allow for administrative simplification, lower costs, improved data security, and lower error rates, thereby providing better care to the patients (HIPAA, n.d.).

Thus, this design proposal recommends a centralized, distributed database system and the use of Hadoop, such that insights can be garnered and visualized to drive data-driven healthcare decisions and provide improved care for the patients.

References

  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop solutions. Wrox. VitalBook file.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.

Data Tools: Use of XML

Many industries use XML. Some see advantages in it, and others see challenges or disadvantages.

XML advantages

+ You can write your own markup language and are not limited to tags defined by other people (UK Web Design Company, n.d.)

+ You can create your tags at your own pace rather than waiting for a standards body to approve a tag structure (UK Web Design Company, n.d.)

+ It allows a specific industry or person to design and create a set of tags that meets their unique problem, context, and needs (Brewton, Yuan, & Akowuah, 2012; UK Web Design Company, n.d.)

+ It is a format that is both human- and machine-readable (Hiroshi, 2007)

+ Used for data storage and processing both online and offline (Hiroshi, 2007)

+ It is platform independent, with forward and backward compatibility (Brewton et al., 2012; Hiroshi, 2007)

XML disadvantages

– Searching for information in the data is tough and time-consuming without a processing application (UK Web Design Company, n.d.)

– Data is tied to logic and language similar to HTML, but without a ready-made browser to simply explore the data; it may therefore require HTML or other software to process (Brewton et al., 2012; UK Web Design Company, n.d.)

– Syntax and tags are redundant, which can consume huge amounts of bytes, and slow down processing speeds (Hiroshi, 2007)

– Limited to relational models and object-oriented graphs (Hiroshi, 2007)

– Tags are chosen by their creator; thus, there is no standard set of tags to use (Brewton et al., 2012)
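The human- and machine-readable advantage listed above can be demonstrated with a few lines of standard-library Python; the fragment mirrors the kind of patient tags shown elsewhere in this post and is illustrative only:

```python
import xml.etree.ElementTree as ET

# A fragment in the spirit of the HIMS schema; tag names are illustrative.
doc = """<Patient>
  <PatientID>12345</PatientID>
  <PatientState>Arizona</PatientState>
</Patient>"""

root = ET.fromstring(doc)
# Machine-readable: any element is reachable by tag name.
print(root.find("PatientID").text)     # 12345
print(root.find("PatientState").text)  # Arizona
```

The same document a person can read in a text editor is parsed by the machine with no extra schema information, which is the core trade-off against XML's verbosity.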

XML use in Healthcare Industry

Thanks to the American National Standards Institute, Health Level 7 (HL7) was created, with standards for healthcare XML, and is now in use by 90% of all large hospitals (Brewton et al., 2012; Institute of Medicine, 2004). The Institute of Medicine (2004) stated that healthcare data can consist of: allergies, immunizations, social histories, histories, vital signs, physical examinations, physicians' and nurses' notes, laboratory tests, diagnostic tests, radiology tests, diagnoses, medications, procedures, clinical documentation, clinical measures for specific clinical conditions, patient instructions, dispositions, health maintenance schedules, etc.  More complex datasets, like images, sounds, and other types of multimedia, are yet to be included (Brewton et al., 2012).  Also, the terminologies within the data elements are not a systematized nomenclature, and HL7 does not support web protocols for more advanced communication of health data (Institute of Medicine, 2004). HL7 V3 should resolve many of these issues and should also account for a wide variety of healthcare scenarios (Brewton et al., 2012).

XML use in Astronomy

The Flexible Image Transport System (FITS), currently used by the NASA/Goddard Space Flight Center, holds images, spectra, tables, and sky atlas data, and has been in use for 30 years (NASA, 2016; Pence et al., 2010). The newest version adds a definition of time coordinates, support for long string keywords, multiple keywords, checksum keywords, and image and table compression standards (NASA, 2016); support for mandatory keywords existed previously (Pence et al., 2010).  Besides the differences in data entities, and therefore in the tags needed to describe the data, between healthcare's XML and astronomy's FITS, the use of a standard format over a much longer period has allowed for a more robust solution that has evolved with technology.  FITS is also widely used, as it is endorsed by the International Astronomical Union (NASA, 2016; Pence et al., 2010).  Based on the maturity of FITS, created in the late 1970s and still in use, heavily endorsed, and a standard today, the healthcare industry could learn something from this system.  The only problem with FITS is that it removes some of the benefits of XML, including the flexibility to create your own tags, due to the heavy standardization and the standardization body.


Business Intelligence: Predictions Followup

The last post discussed the future of data mining. In this post, I will expand my opinion on where business intelligence (BI) is moving in the future.

  • Potential Opportunities:

o    Health monitoring.  Currently, smartwatches track our heart rate, steps, standing time, stairs climbed, sitting time, workouts, biking, sleep, etc.  But what if we had a device that measured the chemicals in our blood daily, without being as painful as pricking your finger if you are diabetic?  Such technology could not only measure your blood's chemical makeup but also send alerts to EMTs and doctors if there is a dangerous imbalance of chemicals in your blood (Carter et al., 2014).  This would require a strong BI program across emergency responders, individuals, and doctors.

o    As Moore's law of computational speed moves forward in time, companies have more chances to interpret real-time data and produce leading information that can drive actionable, data-driven decisions. Companies can finally get answers to strategic business questions in minutes as well (Carter et al., 2014).

o    Both internal data (corporate data) and external data (competitor analysis, customer analysis, social media, affinity and sentiment analysis) will be reported frequently to senior leaders and executives who have the authority to make decisions on behalf of the company.  These may show up in a dashboard, with a set number of indicators/metrics, as successfully implemented in a case study of a hospital (Topaloglou & Barone, 2015).

  • Potential Pitfalls:

o    Tools for threat detection, like those being piloted in New York City, could have an increased level of discrimination (Carter, Farmer, & Siegel, 2014). As big data analytics is used for facial recognition on photographs and live video to identify threats, it can lead to more racial profiling if the knowledge fed into the system a priori has elements of racial profiling.  This could lead to bias in reporting and tracking higher levels of a particular demographic, and the fact remains that past performance doesn't predict the future.

o    Data must be validated before it is published to a data warehouse.  Due to the low data volatility of data warehouses, we need to ensure that the data we receive is correct; thus, expected-value thresholds must be set to capture errors before they are entered.  Wrong data in means wrong data analysis and wrong data-driven decisions.  An example of an expected-value threshold could be that Earth's surface temperature cannot exceed 500 K.
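A sketch of such an expected-value threshold check, with illustrative measurement names and limits (the 500 K cap mirrors the example above):

```python
# Hypothetical expected-value thresholds, keyed by measurement name.
THRESHOLDS = {"surface_temp_K": (0, 500), "heart_rate": (30, 220)}

def validate(measure, value):
    # Accept a reading only if it falls inside its plausible range.
    low, high = THRESHOLDS[measure]
    return low <= value <= high

# Reject out-of-range readings before they reach the data warehouse.
print(validate("surface_temp_K", 288))   # True  (a plausible Earth reading)
print(validate("surface_temp_K", 1200))  # False (captured as an error)
```

Rejected readings would be quarantined for review rather than silently dropped, so the error source can be traced.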

o    Amplified customer experience.  As BI incorporates social media to gauge what is going on in the minds of customers, if something that could hurt the company were to go viral, it could be devastating.  Essentially, we are giving the customer an amplified voice.  This can range from rumors of software and hardware leaks, as happens with every Apple iPhone generation/release (which can put proprietary information into the hands of competitors), to a nasty comment or post that gets out of control on a social media platform, to celebrity boycotts.  The opportunity here, though, lies in receiving key information on how to improve products, identifying leakers of information, and settling nasty rumors, issues, or comments.

  • Potential Threats:

o    Loss of data through hackers aiming to steal someone's identity.  Firewalls must be tighter than ever, and networks more secure than ever, as a company moves to a centralized data warehouse.  Data warehouses are vital for BI initiatives, but if HR data is located in the warehouse (for example, to help HR compute likelihood measures of disgruntled employees to aid retention efforts), then if a hacker were to get hold of that data, thousands of people's information could be compromised.  This is nothing new, but it is a potential threat that must be mitigated as we proceed into BI systems.  This applies not only to people data but also to company proprietary data.

o    Consumer advertisement blitz. Companies use BI to blast their customers with ads, in hopes of marketing to people better, and use item affinity analysis to send coupons and attract more sales and higher revenue.  There is a personal example here for me: XYZ is a clothing store; when I moved into my first house, the previous owner never updated their information in XYZ's database.  Since the previous owner was a frequent buyer, and those magazines, coupons, flyers, and sales were working on them, the address kept getting blasted with marketing ads.  When I moved in, I got a magazine every two days.  It was a waste of paper and made me less likely to shop there.  Eventually, I had enough and called customer service.  They resolved the issue, but it took six weeks after that call for my address to be removed from their marketing and customer database.  I haven't shopped there since.

o    Informational overload.  As companies go forward with implementing BI systems, they must meet with the entire multi-level organization to find out its data needs.  Just because we have the data doesn't mean we should display it.  The goal is to find the right number of key success factors, key performance indicators, and metrics to help decision-makers at all levels.  Getting this part wrong can compromise the adoption of BI in the organization, and the system will be seen as a waste of money rather than a tool that could help in today's competitive market.  This is a hard line to walk, but it is one of the biggest threats.  It was recognized in the hospital case study (Topaloglou & Barone, 2015) and therefore mitigated through extensive planning, buy-in, and documentation.

 


Business Intelligence: Predictions

According to the Association of Professional Futurists (n.d.), "A professional futurist is a person who studies the future in order to help people understand, anticipate, prepare for and gain advantage from coming changes. It is not the goal of a futurist to predict what will happen in the future. The futurist uses foresight to describe what could happen in the future and, in some cases, what should happen in the future." Below, I will give my opinion on what the future might hold for data mining, knowledge management, and a comprehensive BI program and strategy.

The future of …

  • Data mining:

o    Web structure mining (studying the link structure of web pages) and web usage mining (studying how web pages are used) will become more prominent in the future.  Victor and Rex (2016) stated that web mining differs from traditional data mining in scale (web information is far larger in volume, making 10M web pages seem small), access (web information is mostly public, whereas traditional data may be private), and structure (web pages contain unstructured and semi-structured data, whereas traditional data mining assumes some explicit level of structure).  The structure of a website can be described by its PageRank, page number, damping factor, number of pages, out-links, in-links, etc.  A page is considered an authority if it has many in-links, or a hub if it has many out-links, and this helps define the page rank and structure of the website (Victor & Rex, 2016).  But page rank alone is too trivial a calculation.  One day we will be able not only to know a page's rank but also to learn its domain authority, page authority, and domain validity, which will help define how much value a particular site can bring to a person.  If Google were to adopt these measures, we could see search results that better reflect the real value each site offers.
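The in-link/out-link idea behind page rank can be sketched in a few lines. This is a minimal, self-contained illustration of the classic PageRank power iteration, not code from Victor and Rex (2016); the graph, node names, and damping factor value are my own assumptions.

```python
# Minimal PageRank sketch via power iteration on a tiny link graph.
# The graph and the 0.85 damping factor are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, out_links in links.items():
            if out_links:
                # Distribute this page's rank evenly across its out-links.
                share = rank[page] / len(out_links)
                for target in out_links:
                    new_rank[target] += damping * share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# "C" receives in-links from both A and B (an authority in the
# hub/authority sense), so it ends up with the highest rank.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

Here the many-in-links page ("C") dominates, which is exactly the authority intuition described above; richer measures like domain authority would layer additional signals on top of this basic score.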

  • Data mining’s link to knowledge management (KM):

o    A move away from KM tools and toolsets, toward seeing knowledge as embedded in as many processes and people as possible (Ferguson, 2016). KM relies on sharing, and as we move away from tools, processes will be set up to let this sharing happen.  Sharing occurs more frequently in interactive and social environments (Ferguson, 2016).  Thus, internal corporate social media platforms may become the central data warehouse, hosting all kinds of knowledge.  The open issue, and where further research needs to go, is how to get more people engaged on a new social media platform so that knowledge sharing can eventually take hold. Currently, forums, YouTube, and blogs are inviting, highly inclusive environments that share knowledge, such as how to solve a particular issue (evident in YouTube video tutorials).  In my opinion, these social platforms and methods of sharing show that the more social, inclusive, and interactive the environment, the more organically knowledge sharing happens.

o    IBM (2013) gives us a glimpse of how powerful the combination of veteran police officers' knowledge, crime data stored in a crime data warehouse, and IBM data mining can be in identifying criminals.  Most criminals commit similar crimes with similar patterns and motives.  The IBM tools augment officers' knowledge by narrowing the list of possible suspects for a crime down to about 20 people and ranking them by how likely each is to have committed the new crime.  This has been used in Miami-Dade County, the 7th largest county in the US, and tools like this will become more widespread with time.
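To make the "narrow and rank" idea concrete, here is a hypothetical sketch of ranking known offenders by how closely their past crime patterns match a new case. This is my own illustration using Jaccard similarity over categorical crime attributes; it is not IBM's actual algorithm, and the offender names and attributes are invented.

```python
# Hypothetical suspect ranking (not IBM's method): score each known
# offender by Jaccard similarity between the new crime's attributes
# and the attributes of their past crimes, then keep the top matches.

def jaccard(a, b):
    """Similarity of two attribute sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_suspects(new_crime, history, top_n=20):
    """history: dict of offender -> set of attributes from past crimes."""
    scores = {who: jaccard(new_crime, attrs) for who, attrs in history.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

new_crime = {"burglary", "night", "forced_entry", "residential"}
history = {
    "offender_1": {"burglary", "night", "forced_entry", "residential"},
    "offender_2": {"robbery", "day", "commercial"},
    "offender_3": {"burglary", "night", "residential", "window_entry"},
}
ranked = rank_suspects(new_crime, history)
# offender_1 matches all four attributes and tops the ranking.
```

A production system would obviously weight attributes, use richer models, and incorporate officer knowledge, but the ranking-by-pattern-similarity shape is the same.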

  • Business Intelligence (BI) program and strategy:

o    Potential applications of BI and strategy will extend into the healthcare industry.  Thanks to ObamaCare (not being political here), there will be more data coming in due to an increase in patients having coverage, and thus more opportunities to integrate hospital data, insurance data, doctor diagnoses, patient care, patient flow, research data, financial data, etc. into a data warehouse and run analytics on it to support beneficial data-driven decisions (Yeoh & Popovič, 2016; Topaloglou & Barone, 2015).

o    Potential applications of BI and strategy will also affect supply chain management.  The Boeing 787 Dreamliner outsources about 30% of its parts and components, compared with only about 5% for the Boeing 747 (Yeoh & Popovič, 2016).  As more companies increase the outsourcing percentages across their product mix, it becomes ever more crucial to capture data on the fault tolerances of each outsourced part to make sure it meets regulation standards and provides sufficient reliability, utility, and warranty to the end customer.  This is where a great deal of money and R&D will be spent in the next few years.
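The fault-tolerance monitoring described above could be as simple as comparing observed fault rates against a regulatory limit. The sketch below is a minimal illustration under assumed data; the part names, suppliers, counts, and the 0.1% tolerance are all invented for the example.

```python
# Hypothetical sketch: flag outsourced parts whose observed fault rate
# exceeds a regulatory tolerance. All names and limits are assumptions.

TOLERANCE = 0.001  # assumed maximum allowable fault rate (0.1%)

parts = [
    {"part": "wing_fastener", "supplier": "A", "faults": 2, "tested": 5000},
    {"part": "cabin_panel",   "supplier": "B", "faults": 9, "tested": 4000},
]

def out_of_spec(parts, tolerance=TOLERANCE):
    """Return the names of parts whose fault rate exceeds the tolerance."""
    return [p["part"] for p in parts if p["faults"] / p["tested"] > tolerance]

# cabin_panel: 9/4000 = 0.00225 > 0.001, so it is flagged;
# wing_fastener: 2/5000 = 0.0004 is within spec.
```

In a real BI system this check would run continuously over supplier quality feeds, with the flagged parts surfacing on a dashboard rather than in a list.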

References

  • Ferguson, J. E. (2016). Inclusive perspectives or in-depth learning? A longitudinal case study of past debates and future directions in knowledge management for development. Journal of Knowledge Management, 20(1).
  • IBM (2013). Miami-Dade Police Department: New patterns offer breakthroughs for cold cases. Smarter Planet Leadership Series.  Retrieved from http://www.ibm.com/smarterplanet/global/files/us__en_us__leadership__miami_dade.pdf
  • Topaloglou, T., & Barone, D. (2015). Lessons from a Hospital Business Intelligence Implementation. Retrieved from http://www.idi.ntnu.no/~krogstie/test/ceur/paper2.pdf
  • Victor, S. P., & Rex, M. M. X. (2016). Analytical Implementation of Web Structure Mining Using Data Analysis in Educational Domain. International Journal of Applied Engineering Research, 11(4), 2552-2556.
  • Yeoh, W., & Popovič, A. (2016). Extending the understanding of critical success factors for implementing business intelligence systems. Journal of the Association for Information Science and Technology, 67(1), 134-147.

Business Intelligence: Effectiveness

This post discusses the structure of an effective business intelligence program. It also explains the key components of that structure. Finally, it explains how knowledge management systems fit into the structure.

Non-profit hospitals are in a constant state of trying to improve their services and drive down costs. One of the ways they do this is by turning to Lean Six Sigma techniques and IT to identify opportunities to save money and improve the overall patient experience. Six Sigma relies on data and measurements to determine opportunities for continuous improvement; to support these goals, a Business Intelligence (BI) program was developed at one such hospital (Topaloglou & Barone, 2015).

Key Components of the structure

For an effective BI program, the responsible people and stakeholders (Actors) are identified, so we define who is responsible for setting the business strategies (Goals).  Each strategy must be supported by the right business processes (Objects), and the right people must be assigned as accountable for each process.  Each process has to be measured (Indicators) to inform the right stakeholders of how the business strategy is doing.  All of this is captured in a key document (called AGIO), which is essentially a data definition dictionary that serves as a common core solution (Topaloglou & Barone, 2015).  This means there is one set of variable names and definitions.

Implementation of the above structure has to take into account the multi-level business and its needs.  Once the implementation is complete and buy-in from all stakeholders has occurred, the business can experience the benefits (Topaloglou & Barone, 2015): end users can make strategic, data-based decisions and act on them; attitudes shift toward the use and usefulness of information; data scientists come to be seen as problem solvers rather than developers; data prompts immediate action; continuous improvement becomes a byproduct of the BI system; real-time views with drill-down into data details enable more data-driven decisions and actions; and meaningful dashboards are developed that support business queries.

Knowledge management systems fit into the structure

“Healthcare delivery is a distributed process,” where patients can receive care from family doctors, clinicians, ER staff, specialists, acute care, etc. (Topaloglou & Barone, 2015).  Each of the people involved in healthcare delivery has vital knowledge about the patient that needs to be captured and transferred correctly; hospital reports help capture that knowledge.  Knowledge also lies in how patients flow in and out of sections of the hospital, and executives need to see metrics on how all of these systems work together.  Generating a knowledge management distributed database system (KMDBS) ties all this data together from these different sources to provide the best care for patients, identify areas for continual improvement, and present the results in a neat little portal (with dashboards) for ease of use and ease of knowledge extraction (Topaloglou & Barone, 2015).  The goal is to unify all the knowledge from multiple sources into one system built on a common core set of definitions, variables, and metrics, so that everyone can understand the data in the KMDBS and look up information if there are any questions.  The development team took this into account, and after meeting with the different business levels, the in-house solution they developed gave all staff a system that used their collective knowledge to draw out key metrics, aiding data-driven decisions for continuous improvement in the services they provide to their patients.

An example

Topaloglou and Barone (2015) present the following example:

  • Actor: Emergency Department Manager
  • Goal: Reduce the percentage of patients leaving without being seen
  • Indicator: Percentage of patients left without being seen
  • Object: Physician initial assessment process
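The AGIO entry above is essentially a structured record, which can be sketched as a small data structure. The class below is my own illustration of the Actor/Goal/Indicator/Object fields described by Topaloglou and Barone (2015), not code from their paper.

```python
# Hypothetical sketch of one AGIO data-definition entry. The field names
# mirror the Actor/Goal/Indicator/Object structure; the class itself is
# an illustration, not taken from Topaloglou & Barone (2015).
from dataclasses import dataclass

@dataclass
class AgioEntry:
    actor: str      # who is accountable for the process
    goal: str       # the business strategy being pursued
    indicator: str  # the measurement that tracks progress toward the goal
    object: str     # the business process being measured

entry = AgioEntry(
    actor="Emergency Department Manager",
    goal="Reduce the percentage of patients leaving without being seen",
    indicator="Percentage of patients left without being seen",
    object="Physician initial assessment process",
)
```

A collection of such entries, with one agreed set of names and definitions, is what makes the AGIO document function as a data definition dictionary everyone in the hospital can share.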

 

Resources