Fraud detection in the health care industry using analytics

Fraud is deception. Fraud detection is sorely needed because, even as fraud detection algorithms improve, the rate of fraud is increasing (Minelli, Chambers, & Dhiraj, 2013). Hadoop and the HFlame distribution can be used to help identify fraudulent data in near-real-time in other industries, such as banking (Lublinsky, Smith, & Yakubovich, 2013).

Data mining has allowed for fraud detection via multi-attribute monitoring, which tries to find hidden anomalies by identifying hidden patterns through the use of class description and class discrimination (Brookshear & Brylow, 2014; Minelli et al., 2013). Class description identifies patterns that define a group of data, and class discrimination identifies patterns that divide groups of data (Brookshear & Brylow, 2014). As data flows in, it is monitored against validity checks and detection rules and given a score, such that if the score surpasses a threshold, that data point is flagged as potentially suspicious (Minelli et al., 2013).

This is a form of outlier data mining analysis, where data that does not fit any of the groups that have been described and discriminated can be used to identify fraudulent records (Brookshear & Brylow, 2014; Connolly & Begg, 2014). Minelli et al. (2013) stated that using historical data to build up the validity checks and detection rules, and applying them to real-time data, can help identify outliers in near-real time. However, what about predicting fraud? In the future, companies will be using Hadoop's machine learning capability paired with its fraud detection algorithms to provide predictive modeling of fraud events (Lublinsky, Smith, & Yakubovich, 2013).
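To make the scoring-and-threshold idea concrete, below is a minimal R sketch that learns simple "detection rules" (a per-procedure mean and standard deviation) from historical claims and flags incoming claims whose scores pass an assumed 3-sigma threshold. The data, column names, and cutoff are hypothetical illustrations, not the actual rules or algorithms from the cited sources.

# Hypothetical historical claims used to build simple detection rules
set.seed(42)
historical <- data.frame(procedure = rep(c("A", "B"), each = 100),
                         amount = c(rnorm(100, 200, 20), rnorm(100, 500, 50)))

# "Detection rules" learned from history: per-procedure mean and standard deviation
mu    <- tapply(historical$amount, historical$procedure, mean)
sigma <- tapply(historical$amount, historical$procedure, sd)

# Score incoming claims (data-in-motion) and flag anything beyond the assumed threshold
incoming <- data.frame(procedure = c("A", "A", "B"), amount = c(210, 480, 515))
incoming$score      <- abs(incoming$amount - mu[incoming$procedure]) / sigma[incoming$procedure]
incoming$suspicious <- incoming$score > 3   # assumed threshold
incoming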

A process mining framework for the detection of healthcare fraud and abuse case study (Yang & Hwang, 2006)

Fraud exists in processing health insurance claims because there are more opportunities to commit it: there are more channels of communication between service providers, insurance agencies, and patients. Any one of these three parties can commit fraud, and the highest risk occurs where service providers can perform unnecessary procedures, putting patients at risk. This case study therefore provided a framework for how to conduct automated fraud detection. The study collected data on 2,543 gynecology patients from 2001-2002 from a hospital, filtered out noisy data, identified activities based on medical expertise, and identified fraud in about 906 of the cases.

Before data mining and machine learning, the process was heavily reliant on medical professionals with subject matter expertise to detect fraud, which was costly across multiple resources. Machine learning is also not subject to the manual errors that are common with human review. Machine learning algorithms for fraud detection rely on clinical pathways, which are defined as the right people giving the right care services in the right order, with the aim of reducing waste and implementing best practices. Any abnormal deviation from a pathway can be flagged by the machine learning algorithm.
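As a toy illustration of the clinical-pathway idea (not the actual process-mining algorithm from Yang and Hwang, 2006), the R sketch below checks whether a claimed sequence of activities follows an assumed, expected ordering and flags anything that deviates. The pathway and activity names are made up.

# Hypothetical expected clinical pathway (ordered activities)
pathway <- c("registration", "exam", "lab_test", "diagnosis", "treatment")

# A sequence is consistent if every activity is known and appears in pathway order
follows_pathway <- function(activities, pathway) {
  pos <- match(activities, pathway)
  !any(is.na(pos)) && all(diff(pos) > 0)
}

case_ok    <- c("registration", "exam", "diagnosis", "treatment")
case_fraud <- c("registration", "treatment", "treatment", "exam")  # out of order / repeated

follows_pathway(case_ok, pathway)     # TRUE  -> consistent with the pathway
follows_pathway(case_fraud, pathway)  # FALSE -> flag for review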

References

  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview (12th ed.). Pearson Learning Solutions. VitalBook file.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox. VitalBook file.
  • Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.
  • Yang, W. S., & Hwang, S. Y. (2006). A process-mining framework for the detection of healthcare fraud and abuse. Expert Systems with Applications, 31(1), 56-68.

Data Tools: Data-In-Motion

How is data-in-motion analysis performed, and why is it important to apply data analytics to data-in-motion?

Definition of terms

Data-in-motion: a part of data velocity, which deals with the speed of data coming in from multiple sources as well as the speed of data traveling between systems (Katal, Wazid, & Goudar, 2013). Essentially, data-in-motion can encompass data streaming, data transfer, or real-time data. However, there are challenges and issues that have to be addressed when conducting real-time analysis on data streams (Katal et al., 2013; Tsinoremas et al., n.d.).

Data complexity: consists of the joining, cleaning, and transformation of data from multiple systems to find relationships that are highly correlated (Katal et al., 2013).  Complexity increases as the velocity of data coming in or transferred increases (Katal et al., 2013; Tsinoremas et al., n.d.).

Data-in-motion analytics performed in case study (Blount et al., 2010)

Artemis was designed, built, and deployed in 2009 through a coalition of the University of Ontario Institute of Technology, SickKids, the Department of Pediatrics, and the University of Toronto, to help read in data from multiple sensors in neonatal intensive care units (NICU).  The goal is for Artemis to read in data from multiple physiological instruments, like electrocardiogram (ECG), heart rate, blood oxygen saturation, and respiratory state, to find key patterns and relationships in the data streams (data-in-motion) and provide the best care for infants in the NICU.  To make Artemis a success, the coalition had to analyze huge amounts of data from a large group of patients.  Artemis had to interface with multiple medical devices, be scalable to add more medical devices, and store raw physiological data while at the same time de-identifying the data per U.S. and Canadian health privacy laws.  From these multiple medical devices, new rules could be created through unsupervised machine learning techniques, and through supervised machine learning techniques seeded with medically/clinically derived rules.  The Artemis system has to read in the data in real time to sort, join, clean, and transform it; evaluate it against certain rules; and decide whether to send an alert to medical staff about one of the NICU patients, while at the same time de-identifying the data and storing it in a database for future analysis and tests.
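As a highly simplified illustration of rule evaluation on data-in-motion (not the actual Artemis implementation), the R sketch below applies one hypothetical alert rule to a simulated stream of vital-sign readings; the variable names and thresholds are assumptions for illustration only.

# Hypothetical alert rule evaluated as each new reading arrives
alert_rule <- function(reading) {
  # assumed thresholds for illustration only
  reading["heart_rate"] > 180 || reading["spo2"] < 85
}

# Simulated stream of physiological readings
stream <- list(
  c(heart_rate = 150, spo2 = 97),
  c(heart_rate = 185, spo2 = 96),
  c(heart_rate = 160, spo2 = 82)
)

for (reading in stream) {
  if (alert_rule(reading)) {
    message("ALERT: reading outside expected range: ",
            paste(names(reading), reading, sep = "=", collapse = ", "))
  }
}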

In the test phase, 5 infants were enrolled, and in the deployed state, 19 infants were enrolled in the study. The study also had to take into account that the cables from all the sensors and the equipment used to collect the streaming data must not get in the way of the medical/clinical staff when they need to help the infant. In some cases, when the Artemis system was deployed, some of the sensors were not attached, and thus the information management teams had to work with medical/clinical staff to train the models on fewer data streams when not all the ideal sensors needed to send out alerts for certain situations were available.  Therefore, this system provides a way for medical/clinical staff to have constant, real-time data on NICU patients from multiple sensors and allows the machine to alert them when certain markers and key performance indicators are met.

Importance of applying data analytics to data-in-motion

It can easily be seen that analyzing infant NICU data is important, and it is especially important to leverage analytics on the data streams of the key medical sensors needed for each infant in the NICU.  What is not always easily seen is how important all of the data really is, since the real-life deployment showed that not all the medical sensors were available to provide the model with enough information to be of use to the medical/clinical staff (Blount et al., 2010).

Also, the use of data streams in a university setting would allow for a different perspective that could be used in the NICU case study above.  At the University of Miami, data is triaged into a four-tiered system (Tsinoremas et al., n.d.):

  • High-speed storage – for data that is currently being processed, data-in-motion is at its highest (has 300TB of space and costs $2000/TB)
  • Mid-range speed storage – for data that is currently being looked at (costs $600-$700/TB)
  • Deep storage – long-term data storage, data that is looked at every so often, but not regularly, usually old data (costs $300/TB)
  • Archived – data stored offline; perfect for data at rest

This tiered system could be applied to Artemis, such that the team could decide which of the medical device streams should be processed first when resources are limited.  It could also be applied differently, such that there is a window of data that is currently available, e.g., a 1-hour record of NICU stats saved locally, with longer records still accessible but not stored in vital processing space.  Data windows were discussed, but depending on the situation, data windows could be adjusted to provide the best care for the infants (Blount et al., 2010).
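A minimal sketch of the data-window idea, assuming timestamped readings in a data frame: only the most recent hour stays in the fast, local working set, while older readings become candidates for the cheaper storage tiers described above. The column names and the one-hour cutoff are illustrative assumptions.

# Keep only the last hour of readings in the fast, local working set
keep_recent_window <- function(readings, now = Sys.time(), window_secs = 3600) {
  in_window <- as.numeric(difftime(now, readings$timestamp, units = "secs")) <= window_secs
  list(hot  = readings[in_window, ],   # stays in high-speed storage
       cold = readings[!in_window, ])  # candidate for mid-range/deep storage
}

# Example with hypothetical NICU stats
readings <- data.frame(
  timestamp  = Sys.time() - c(30, 600, 5400),  # 30 s, 10 min, 90 min ago
  heart_rate = c(150, 155, 148)
)
split_sets <- keep_recent_window(readings)
nrow(split_sets$hot)   # 2 readings within the last hour
nrow(split_sets$cold)  # 1 older reading to move to cheaper storage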

Also, the quality of the sensor data must be taken into account.  If more data is needed or preferred to make informed decisions on infant patients in the NICU (Blount et al., 2010), then there should be a focus on collecting and analyzing high-quality data and the right types of data.  This would lead the designers of Artemis and the medical and clinical staff to think deeply about which data is relevant and how much data is enough to make the decisions needed to tend to the infants (Katal et al., 2013).

Resources

  • Blount, M., Ebling, M. R., Eklund, J. M., James, A. G., McGregor, C., Percival, N., … & Sow, D. (2010). Real-time analysis for intensive care: development and deployment of the Artemis analytic system. IEEE Engineering in Medicine and Biology Magazine, 29(2), 110-118.
  • Katal, A., Wazid, M., & Goudar, R. H. (2013, August). Big data: issues, challenges, tools and good practices. In Contemporary Computing (IC3), 2013 Sixth International Conference on (pp. 404-409). IEEE.
  • Tsinoremas, N. F., Zysman, J., Mader, C., Kirtma, B., & Blaire, J. (n.d.). Data in motion: A new paradigm in research data lifecycle management. Center for Computational Science: University of Miami.

Business Intelligence: Predictions

According to the Association of Professional Futurists (n.d.), “A professional futurist is a person who studies the future in order to help people understand, anticipate, prepare for and gain advantage from coming changes. It is not the goal of a futurist to predict what will happen in the future. The futurist uses foresight to describe what could happen in the future and, in some cases, what should happen in the future.” With that in mind, I will discuss what the future might hold for data mining, knowledge management, and a comprehensive BI program and strategy.

The future of …

  • Data mining:

o    Web structure mining (studying the structure of web pages) and web usage analysis (studying the usage of web pages) will become more prominent in the future.  Victor and Rex (2016) stated that web mining differs from traditional data mining in scale (web information is much larger in number, making 10M web pages seem small), access (web information is mostly public, whereas traditional data can be private), and structure (web pages contain unstructured and semi-structured data, whereas traditional data mining has some explicit level of structure).  The structure of a website can be described by page rank, page number, damping factor, number of pages, out-links, in-links, etc.  A page is considered an authority if it has many in-links, or a hub if it has many out-links, and this helps define the page rank and structure of the website (Victor & Rex, 2016; see the PageRank sketch after this list).  But page rank is too trivial a calculation.  One day we will be able to not only know the page rank of a website but also learn its domain authority, page authority, and domain validity, which will help define how much value a particular site can bring to a person.  If Google were to adopt these measures, search rankings could reflect that value more directly.

  • Data mining’s link to knowledge management (KM):

o    A move away from KM tools and toolsets, toward seeing knowledge as embedded into as many processes and people as possible (Ferguson, 2016). KM relies on sharing, and as we move away from tools, processes will be set up to allow this sharing to happen.  Sharing occurs more frequently with an increase in interactive and social environments (Ferguson, 2016).  Thus, internal corporate social media platforms may become the central data warehouse, hosting all kinds of knowledge.  The issue, and where further research needs to go, is how we get more people engaged on a new social media platform to eventually enable knowledge sharing. Currently, forums, YouTube, and blogs are inviting, highly inclusive environments that share knowledge, like how to solve a particular issue (evident in YouTube video tutorials).  In my opinion, these social platforms and methods of sharing show how social, inclusive, and interactive an environment needs to be for knowledge sharing to happen more organically.

o    IBM (2013) shows us a glimpse of how the knowledge of veteran police officers, crime data stored in a crime data warehouse, and the power of IBM data mining can be combined to identify criminals.  Most criminals commit similar crimes with similar patterns and motives.  The IBM tools augment officers’ knowledge by narrowing a list of possible suspects of a crime down to about 20 people and ranking them on how likely they are to have committed this new crime.  This has been used in Miami-Dade County, the 7th largest county in the US, and a tool like this will become more widespread with time.

  • Business Intelligence (BI) program and strategy:

o    Potential applications of BI and strategy will go into the health care industry.  Thanks to ObamaCare (not being political here), there will be more data coming in due to an increase in patients having coverage, and thus more chances to integrate hospital data, insurance data, doctor diagnoses, patient care, patient flow, research data, financial data, etc. into a data warehouse and run analytics on it to create beneficial data-driven decisions (Yeoh & Popovič, 2016; Topaloglou & Barone, 2015).

o    Potential applications of BI and strategy will affect supply chain management.  The Boeing 787 Dreamliner has outsourced 30% of its parts and components, which is different from the Boeing 747, which is only 5% outsourced (Yeoh & Popovič, 2016).  As more and more companies increase the outsourcing percentages for their product mix, the more crucial it becomes to capture data on fault tolerances for each of those outsourced parts, to make sure they meet regulation standards and provide sufficient reliability, utility, and warranty to the end customer.  This is where a great deal of money and R&D will be spent in the next few years.
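Returning to the web structure mining item above, here is a small R sketch of the standard PageRank power iteration on a toy four-page link graph; the adjacency matrix and the damping factor of 0.85 are the usual textbook illustrations, not values from Victor and Rex (2016).

# Toy link graph: adj[i, j] = 1 means page i links to page j
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)
n <- nrow(adj)
d <- 0.85                              # damping factor

# Column-stochastic transition matrix (each row divided by its out-link count)
M <- t(adj / rowSums(adj))

# Power iteration until the rank vector stops changing
rank <- rep(1 / n, n)
for (i in 1:100) {
  new_rank <- (1 - d) / n + d * (M %*% rank)
  if (max(abs(new_rank - rank)) < 1e-8) break
  rank <- new_rank
}
round(as.vector(rank), 3)

Pages with many in-links (page 3 in this toy graph) end up with the highest rank, which is exactly the "authority" intuition described in the bullet above.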

References

  • Ferguson, J. E. (2016). Inclusive perspectives or in-depth learning? A longitudinal case study of past debates and future directions in knowledge management for development. Journal of Knowledge Management, 20(1).
  • IBM (2013). Miami-Dade Police Department: New patterns offer breakthroughs for cold cases. Smarter Planet Leadership Series.  Retrieved from http://www.ibm.com/smarterplanet/global/files/us__en_us__leadership__miami_dade.pdf
  • Topaloglou, T., & Barone, D. (2015) Lessons from a Hospital Business Intelligence Implementation. Retrieved from http://www.idi.ntnu.no/~krogstie/test/ceur/paper2.pdf
  • Victor, S. P., & Rex, M. M. X. (2016). Analytical Implementation of Web Structure Mining Using Data Analysis in Educational Domain. International Journal of Applied Engineering Research, 11(4), 2552-2556.
  • Yeoh, W., & Popovič, A. (2016). Extending the understanding of critical success factors for implementing business intelligence systems. Journal of the Association for Information Science and Technology, 67(1), 134-147.

Business Intelligence: Data Mining

When you think about business intelligence (BI), the first thing that probably comes to mind is data. However, all of those BI solutions use technology. This post discusses how the data mining approach and its concepts flow into BI solutions and into the enterprise-level information technology (IT) effort of an organization.

Data mining is just a subset of the knowledge discovery process (or the concept flow of business intelligence), where data mining provides the algorithms/math that aid in developing actionable, data-driven results (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). It should be noted that success has as much to do with the events that lead up to the main event as it does with the main event itself.  When incorporating data mining processes into business intelligence, one must understand the business task/question behind the problem, properly process all the required data, analyze the data, evaluate and validate the data while analyzing it, apply the results, and finally learn from the experience (Ahlemeyer-Stubbe & Coleman, 2014). Connolly and Begg (2014) stated that there are four operations of data mining: predictive modeling, database segmentation, link analysis, and deviation detection.  Fayyad et al. (1996) classify data mining operations by their outcomes: predictive and descriptive.

It is crucial to understand the business task/question behind the problem you are trying to solve, because some types of business applications are associated with particular operations; for example, marketing strategies use database segmentation (Connolly & Begg, 2014).  However, any of the data mining operations can be implemented for any business application, and many business applications can use multiple operations.  Customer profiling can use database segmentation first and then predictive modeling (Connolly & Begg, 2014). Thinking outside of the box about which combination of operations and algorithms to use, rather than defaulting to previously used operations and algorithms, can generate even better results against the business objectives (Minelli, Chambers, & Dhiraj, 2013).

A consolidated list (Ahlemeyer-Stubbe & Coleman, 2014; Berson, Smith, & Thearling, 1999; Connolly & Begg, 2014; Fayyad et al., 1996) of the different types of data mining operations, algorithms, and purposes is given below, followed by a short R sketch illustrating one predictive and one descriptive operation.

  • Prediction – “What could happen?”
    • Classification – data is classified into different predefined classes
      • C4.5
      • Chi-Square Automatic Interaction Detection (CHAID)
      • Support Vector Machines
      • Decision Trees
      • Neural Networks (also called Neural Nets)
      • Naïve Bayes
      • Classification and Regression Trees (CART)
      • Bayesian Network
      • Rough Set Theory
      • AdaBoost
    • Regression (Value Prediction) – data is mapped to a prediction formula
      • Linear Regression
      • Logistic Regression
      • Nonlinear Regression
      • Multiple linear regression
      • Discriminant Analysis
      • Log-Linear Regression
      • Poisson Regression
    • Anomaly Detection (Deviation Detection) – identifies significant changes in the data
      • Statistics (outliers)
  • Descriptive – “What has happened?”
    • Clustering (database segmentation) – identifies a set of categories to describe the data
      • Nearest Neighbor
      • K-Nearest Neighbor
      • Expectation-Maximization (EM)
      • K-means
      • Principal Component Analysis
      • Kolmogorov-Smirnov Test
      • Kohonen Networks
      • Self-Organizing Maps
      • Quartile Range Test
      • Polar Ordination
      • Hierarchical Analysis
    • Association Rule Learning (Link Analysis) – builds a model that describes the data dependencies
      • Apriori
      • Sequential Pattern Analysis
      • Similar Time Sequence
      • PageRank
    • Summarization – smaller description of the data
      • Basic probability
      • Histograms
      • Summary Statistics (max, min, mean, median, mode, variance, ANOVA)
  • Prescriptive – “What should we do?” (an extension of predictive analytics)
    • Optimization
      • Decision Analysis
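As a small, self-contained illustration of one predictive and one descriptive operation from the list above, the R sketch below fits a decision tree (classification) and runs k-means (database segmentation) on the built-in iris data set; the package and data choices are common defaults, not prescriptions from the cited authors.

library(rpart)   # decision trees (CART-style), shipped with R

data(iris)

# Predictive operation: classification with a decision tree
tree_model <- rpart(Species ~ ., data = iris, method = "class")
table(predicted = predict(tree_model, iris, type = "class"),
      actual    = iris$Species)

# Descriptive operation: database segmentation with k-means clustering
set.seed(123)
clusters <- kmeans(iris[, 1:4], centers = 3)
table(cluster = clusters$cluster, species = iris$Species)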

Finally, Ahlemeyer-Stubbe and Coleman (2014) stated that even though there is a ton of versatile data mining software available that can do any of the abovementioned operations and algorithms, good data mining software should be deployable across different environments and include tools for data preparation and transformation.

References

Business Intelligence: Data Mining Success

This post discusses the necessary steps for data mining success. There are many process options out there; this post describes the pros and cons of those steps.

For data mining success one must follow a data mining process. There are many processes out there, and here are two:

  • From Fayyad, Piatetsky-Shapiro, and Smyth (1996)
    1. Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Transformation -> Transformed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
  • From Padhy, Mishra, and Panigrahi (2012)
    1. Business understanding -> Data understanding -> Data Preparation -> Modeling -> Evaluation -> Deployment

Success has as much to do with the events that lead up to the main event as it does with the main event itself.  Thus, what is done to the data before data mining determines whether data mining can proceed successfully. Fayyad et al. (1996) note that data mining is just a subset of the knowledge discovery process, where data mining provides the algorithms/math that help reach the final goal.  Looking at each of the individual processes, we can see that they are slightly different, yet much the same.  Another key thing to note is that we can move back and forth (i.e., iterate) between the steps in these processes.  Both presuppose that data is being pulled from a knowledge database or data warehouse, where the data should be cleaned (uniformly represented, with missing data, noise, and errors handled) and accessible (with access paths to the data provided).

Pros/Challenges

If we remove the pre-processing stage or data preparation phase, we will never be able to reduce the high dimensionality in the data sets (Fayyad et al., 1996).  High dimensionality increases the size of the data and thus the processing time needed, which may not be advantageous for a real-time data feed into the model derived through data mining.  Also, with all this data, you run the risk that the derived model will pick up spurious patterns, which will not be easily generalizable or even understandable for descriptive purposes (Fayyad et al., 1996).  Descriptive purposes are data mining for the sake of understanding the data, whereas predictive purposes are data mining for the sake of predicting the next result given an input of variables from a data source (Fayyad et al., 1996; Padhy et al., 2012).  Thus, to avoid this high-dimensionality problem, we must understand the problem, understand why we have the data we have and what data is needed, and reduce the dimensions to the bare essentials.
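As one concrete and commonly used way to reduce high dimensionality (an illustration, not the specific remedy Fayyad et al. prescribe), here is a quick R sketch using principal component analysis on a built-in data set:

# Reduce a wide, correlated data set to a few principal components
data(mtcars)

pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)
summary(pca)                 # proportion of variance explained per component

# Keep only the first few components as the lower-dimensional representation
reduced <- pca$x[, 1:3]
dim(mtcars)    # 32 x 11 original dimensions
dim(reduced)   # 32 x 3 after reduction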

Another challenge that arises if we skip the selection, data understanding, or data mining algorithm selection steps is overfitting.  Fayyad et al. (1996) define selection as selecting the key data you need to feed into the model and selecting the right data mining algorithm, both of which influence the results.  Understanding the problem will allow you to select the right data dimensions, as mentioned above, as well as the right data mining algorithm (Padhy et al., 2012).  Overfitting is when a data mining algorithm tries not only to derive general patterns in the data but also to describe it with noisy data (Fayyad et al., 1996).  Through the selection process, you can pick data with reduced noise to avoid an overfitting problem.  Also, Fayyad et al. (1996) suggest that solutions should include cross-validation, regularization, and other statistical analysis.  Understanding what you are looking for before using data mining also helps address overfitting and will aid in the evaluation/interpretation process (Padhy et al., 2012).
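Below is a minimal sketch of the cross-validation idea mentioned above, using base R and an assumed 5-fold split on a built-in data set; in practice, packages such as caret automate this.

# Simple k-fold cross-validation for a linear model (illustrative only)
set.seed(1)
data(mtcars)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

cv_errors <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)   # held-out error, not training error
})
mean(cv_errors)   # estimate of out-of-sample performance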

Cons/Opportunities

Variety in big data changes with time, so applying the same mined model will, at some point, leave it either outdated (no longer relevant) or invalid.  This is the case in social media: if we try to read posts without focusing on one type of post, it would be hard to say that one particular data pattern model derived from data mining is valid.  Thus, previously defined patterns become invalid as data changes rapidly with respect to time (Fayyad et al., 1996).  We would have to solve this by incrementally modifying, deleting, or augmenting the defined patterns in the data mining process, but as data can vary in real time, at the drop of a hat, this can be quite hard to do (Fayyad et al., 1996).

Missing data and noisy data are very prevalent in meteorology; we cannot sample the entire atmosphere at every point at every time.  We send up weather balloons 2-4 times a day from two points in a US state at a time.  We then try to feed that into a model for predictive purposes.  However, we have a bunch of gaps in the data.  What happens if a weather balloon is a dud and we get no data? Then we have missing data, and that is a problem with the data itself.  How are we supposed to rely on the solution derived through data mining if the data is either missing or noisy? Fayyad et al. (1996) said that missing values are “not designed with discovery in mind,” but we must include statistical strategies to define what these values should be.  One strategy that meteorologists use is data interpolation.  There are many types of interpolation, ranging from simple nearest-neighbor approaches to complex Gaussian types.
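As a minimal sketch of gap-filling by interpolation, the base R code below fills a made-up missing temperature observation with approx(); real meteorological interpolation (e.g., spatial or Gaussian-process methods) is far more sophisticated.

# Hypothetical temperature observations with a missing sounding at hour 12
hours <- c(0, 6, 12, 18, 24)
temps <- c(15.2, 18.4, NA, 19.1, 14.8)

observed <- !is.na(temps)

# Linear interpolation over the gap
filled_linear <- approx(hours[observed], temps[observed], xout = hours)$y

# Constant ("carry the last observation forward") interpolation
filled_constant <- approx(hours[observed], temps[observed], xout = hours,
                          method = "constant", rule = 2)$y

data.frame(hours, temps, filled_linear, filled_constant)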

Resources:


Big Data Analytics: R

R has proven to be the most effective software tool for analyzing big data. This post is a quick discussion of my evaluation of this tool and its relevance in the big data arena, with respect to text mining.

R is a powerful statistical tool that can aid in data mining, and thus it has huge relevance in the big data arena.  Focusing on my project, I have found that R has a text mining package, tm.

Patal and Donga (2015) and Fayyad, Piatetsky-Shapiro, and Smyth (1996) say that the main techniques in data mining are: anomaly detection (outlier/change/deviation detection), association rule learning (relationships between the variables), clustering (grouping data that are similar to one another), classification (applying a known structure to new data), regression (finding a function to describe the data), and summarization (visualizations, reports, dashboards). According to Ghosh, Roy, and Bandyopadhyay (2012), the main types of text mining techniques are: text categorization (assigning text/documents to pre-defined categories), text clustering (grouping similar text/documents together), concept mining (discovering concept/logic-based ideas), information retrieval (finding the relevant documents for a query), and information extraction (identifying key phrases and relationships within the text). Meanwhile, Agrawal and Batra (2013) add summarization (a compressed representation of the input), assessing document similarity (similarities between different documents), and document retrieval (identifying and grabbing the most relevant documents) to the list of text mining techniques.

We use library(tm) to aid in transforming text, stemming words, building a term-document matrix, etc., mostly for preprocessing the data (RStudio Pubs, n.d.). Based on RStudio Pubs (n.d.), some text preprocessing steps and code are as follows:

  • To remove punctuation:

docs <- tm_map(docs, removePunctuation)

  • To remove special characters:

for (j in seq(docs)) {
  docs[[j]] <- gsub("/", " ", docs[[j]])
  docs[[j]] <- gsub("@", " ", docs[[j]])
  docs[[j]] <- gsub("\\|", " ", docs[[j]])
}

  • To remove numbers:

docs <- tm_map(docs, removeNumbers)

  • Convert to lowercase:

docs <- tm_map(docs, content_transformer(tolower))

  • Removing “stopwords”/common words

docs <- tm_map(docs, removeWords, stopwords("english"))

  • Removing particular words

docs <- tm_map(docs, removeWords, c("department", "email"))

  • Combining words that should stay together

for (j in seq(docs)) {
  docs[[j]] <- gsub("qualitative research", "QDA", docs[[j]])
  docs[[j]] <- gsub("qualitative studies", "QDA", docs[[j]])
  docs[[j]] <- gsub("qualitative analysis", "QDA", docs[[j]])
  docs[[j]] <- gsub("research methods", "research_methods", docs[[j]])
}

  • Removing common word endings (stemming)

library(SnowballC)
docs <- tm_map(docs, stemDocument)

Text mining algorithms could consist of but are not limited to (Zhao, 2013):

  • Summarization:
    • Word clouds use library(wordcloud)
    • Word frequencies
  • Regressions
    • Term correlations use the findAssocs() function and plot with library(ggplot2)
    • Plot word frequencies with library(ggplot2)
  • Classification models:
    • Decision trees use library(party) or library(rpart)
  • Association models:
    • Apriori uses library(arules)
  • Clustering models:
    • K-means clustering uses library(fpc)
    • K-medoids clustering uses library(fpc)
    • Hierarchical clustering uses library(cluster)
    • Density-based clustering uses library(fpc)

As we can see, there are current libraries, functions, etc. to help with data preprocessing, data mining, and data visualization when it comes to text mining with R and RStudio.
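Putting a few of these pieces together, here is a short, hypothetical end-to-end sketch (assuming the tm and wordcloud packages are installed): build a corpus from a handful of made-up strings, preprocess it, and summarize term frequencies with an optional word cloud.

library(tm)
library(wordcloud)

# Tiny, made-up corpus
texts <- c("Qualitative research methods in education",
           "Data mining and text mining with R",
           "Text mining research for big data analytics")
docs <- VCorpus(VectorSource(texts))

# Preprocessing (as discussed above)
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))

# Term-document matrix and term frequencies
tdm   <- TermDocumentMatrix(docs)
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(freqs)

# Optional visualization
wordcloud(names(freqs), freqs, min.freq = 1)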

Resources: