Business Intelligence: Data Mining

When you think about business intelligence (BI), the first thing that probably comes to mind is data. However, every BI solution also depends on technology. This post discusses how the data mining approach and concept flow feed into BI solutions and into the enterprise level of an organization’s information technology (IT) effort.

Data mining is just a subset of the knowledge discovery process (or the concept flow of Business Intelligence), in which data mining provides the algorithms and math that aid in developing actionable, data-driven results (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Success has as much to do with the events that lead up to the main event as it does with the main event itself. To incorporate data mining processes into Business Intelligence, one must understand the business task or question behind the problem, properly process all the required data, analyze the data, evaluate and validate the data during the analysis, apply the results, and finally learn from the experience (Ahlemeyer-Stubbe & Coleman, 2014). Connolly and Begg (2014) stated that there are four operations of data mining: predictive modeling, database segmentation, link analysis, and deviation detection. Fayyad et al. (1996) classify data mining operations by their outcomes: predictive and descriptive.

It is crucial to understand the business task or question behind the problem you are trying to solve, because some types of business applications are associated with particular operations; for example, marketing strategies use database segmentation (Connolly & Begg, 2014). However, any of the data mining operations can be implemented for any business application, and many business applications can use multiple operations. Customer profiling, for example, can use database segmentation first and predictive modeling next (Connolly & Begg, 2014). Thinking outside of the box about which combination of operations and algorithms to use, rather than relying on previously used operations and algorithms, could generate even better results toward the business objectives (Minelli, Chambers, & Dhiraj, 2013).
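
To make the idea of chaining operations concrete, here is a minimal sketch in Python of customer profiling done in two steps: database segmentation with a small hand-written k-means, followed by a very simple predictive step that assigns a new customer to the nearest segment. The customer data, the feature names, the choice of two segments, and the nearest-centroid predictor are all assumptions made for illustration; none of them come from the cited sources.

```python
from statistics import mean

# Hypothetical customers: (annual_spend, visits_per_month)
customers = [
    (200.0, 1.0), (220.0, 2.0), (250.0, 1.0),     # looks like a low-spend group
    (900.0, 8.0), (950.0, 9.0), (1000.0, 10.0),   # looks like a high-spend group
]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centroid if a segment ends up empty
            updated.append(tuple(mean(dim) for dim in zip(*members)) if members else old)
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

# Step 1: database segmentation (starting centroids picked by hand for repeatability)
centroids, segments = k_means(customers, centroids=[customers[0], customers[-1]])

# Step 2: predictive modeling, here just "which segment is this new customer closest to?"
new_customer = (880.0, 7.0)
predicted_segment = nearest(new_customer, centroids)

print(segments, predicted_segment)
```

In practice the features would normally be scaled first (annual spend dominates the distance here), and the predictive step would more likely be one of the classification algorithms listed below, trained on the segment labels.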

A consolidated list (Ahlemeyer-Stubbe & Coleman, 2014; Berson, Smith, & Thearling, 1999; Connolly & Begg, 2014; Fayyad et al., 1996) of the different types of data mining operations, their algorithms, and their purposes is given below.

  • Prediction – “What could happen?”
    • Classification – data is classified into different predefined classes
      • C4.5
      • Chi-Square Automatic Interaction Detection (CHAID)
      • Support Vector Machines
      • Decision Trees
      • Neural Networks (also called Neural Nets)
      • Naïve Bayes
      • Classification and Regression Trees (CART)
      • Bayesian Network
      • Rough Set Theory
      • AdaBoost
    • Regression (Value Prediction) – data is mapped to a prediction formula
      • Linear Regression
      • Logistic Regression
      • Nonlinear Regression
      • Multiple linear regression
      • Discriminant Analysis
      • Log-Linear Regression
      • Poisson Regression
    • Anomaly Detection (Deviation Detection) – identifies significant changes in the data
      • Statistics (outliers)
  • Descriptive – “What has happened?”
    • Clustering (database segmentation) – identifies a set of categories to describe the data
      • Nearest Neighbor
      • K-Nearest Neighbor
      • Expectation-Maximization (EM)
      • K-means
      • Principal Component Analysis
      • Kolmogorov-Smirnov Test
      • Kohonen Networks
      • Self-Organizing Maps
      • Quartile Range Test
      • Polar Ordination
      • Hierarchical Analysis
    • Association Rule Learning (Link Analysis) – builds a model that describes the data dependencies
      • Apriori
      • Sequential Pattern Analysis
      • Similar Time Sequence
      • PageRank
    • Summarization – smaller description of the data
      • Basic probability
      • Histograms
      • Summary Statistics (max, min, mean, median, mode, variance, ANOVA)
  • Prescriptive – “What should we do?” (an extension of predictive analytics)
    • Optimization
      • Decision Analysis
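
As a small illustration of two of the simpler entries in the list above, the sketch below computes summary statistics (summarization) and then flags outliers with a quartile-based range check (deviation detection). The order counts are made up, and the 1.5 × IQR cutoff is one common convention that I am assuming here; it is not prescribed by the cited sources.

```python
from statistics import mean, median, quantiles

# Hypothetical daily order counts; the last value is deliberately unusual
daily_orders = [102, 98, 105, 97, 101, 99, 103, 100, 250]

# Summarization: a smaller description of the data
summary = {
    "min": min(daily_orders),
    "max": max(daily_orders),
    "mean": mean(daily_orders),
    "median": median(daily_orders),
}

# Deviation detection: flag values far outside the interquartile range (IQR)
q1, _, q3 = quantiles(daily_orders, n=4)   # quartile cut points
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # assumed 1.5 x IQR convention
outliers = [x for x in daily_orders if x < lower or x > upper]

print(summary)
print(outliers)
```

Note how the single unusual day pulls the mean well above the median, which is one reason a summary alone can hide a deviation that a range check makes obvious.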

Finally, Ahlemeyer-Stubbe and Coleman (2014) stated that even though plenty of versatile data mining software is available that can perform any of the abovementioned operations and algorithms, good data mining software should be deployable across different environments and include tools for data preparation and transformation.

References
