Fraud detection in the health care industry using analytics

Fraud is deception, fraud detection is really needed, because as fraud detection algorithms are improving, the rate of fraud is increasing (Minelli, Chambers, &, Dhiraj, 2013). Hadoop and the HFlame distribution have to be used to help identify fraudulent data in other companies like banking in near-real-time (Lublinsky, Smith, & Yakubovich, 2013).

Data mining has allowed for fraud detection via multi-attribute monitoring, where it tries to find hidden anomalies by identifying hidden patterns through the use of class description and class discrimination (Brookshear & Brylow, 2014; Minellli et al., 2013). Class Descriptions identify patterns that define a group of data, and class discrimination identifies patterns that divide groups of data (Brookshear & Brylow, 2014). As data flows in, data is monitored through validity check and detection rules and gives them a score, such that if the validity and detection score surpasses a threshold, that data point is flagged as potentially suspicious (Minelli et al., 2013).

This is a form of outlier data mining analysis, where data that doesn’t fit any of the above groups of data that has been described and discriminated can be used to identify fraudulent data (Brookshear & Brylow, 2014; Connolly & Begg, 2014). Minelli et al. (2013), stated that using historical data to build up the validity check and detection rules with real-time data can help identify outliers in near-real time. However, what about predicting fraud?  In the future, companies will be using Hadoop’s machine learning capability paired with its fraud detection algorithms to provided predictive modeling of fraud events (Lublinsky, Smith, & Yakubovich, 2013).

A process mining framework for the detection of healthcare fraud and abuse case study (Yang & Hwang, 2006)

Fraud exists in processing health insurance claims because there are more opportunities to commit fraud because there are more channels of communication: service providers, insurance agencies, and patients. Any one of these three people can commit fraud, and the highest chance of fraud occurs where service providers can do unnecessary procedures putting patients at risk. Thus this case study provided the framework on how to conduct automated fraud detection. The study collected data from 2543 gynecology patients from 2001-2002 from a hospital, where they filtered out noisy data, identified activities based on medical expertise, identified fraud in about 906.

Before data mining and machine learning, the process was heavily reliant on medical professional with subject matter expertise to detect fraud, which was costly for multiple resources.  Also, machine learning is not subject to human and manual error that is common with humans.  Machine learning algorithms for fraud detection relies on clinical pathways, which are defined as the right people giving the right care services in the right order, with the aim at the reduction of waste and implementing best practices.  Any deviation from this that is abnormal can be flagged by the machine learning algorithm.


  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview, (12th). Pearson Learning Solutions. VitalBook file.
  • Connolly, T., Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, (6th). Pearson Learning Solutions. VitalBook file.
  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox. VitalBook file.
  • Minelli, M., Chambers, M., &, Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.
  • Yang, W. S., & Hwang, S. Y. (2006). A process-mining framework for the detection of healthcare fraud and abuse.Expert Systems with Applications31(1), 56-68.