Adv Quant: Compelling Topics

Compelling topics summary/definitions

  • Supervised machine learning algorithms: is a model that needs training and testing data set. However it does need to validate its model on some predetermined output value (Ahlemeyer-Stubbe & Coleman, 2014, Conolly & Begg, 2014).
  • Unsupervised machine learning algorithms: is a model that needs training and testing data set, but unlike supervised learning, it doesn’t need to validate its model on some predetermined output value (Ahlemeyer-Stubbe & Coleman, 2014, Conolly & Begg, 2014). Therefore, unsupervised learning tries to find the natural relationships in the input data (Ahlemeyer-Stubbe & Coleman, 2014).
  • General Least Squares Model (GLM): is the line of best fit, for linear regressions modeling along with its corresponding correlations (Smith, 2015). There are five assumptions to a linear regression model: additivity, linearity, independent errors, homoscedasticity, and normally distributed errors.
  • Overfitting: is stuffing a regression model with so many variables that have little contributional weight to help predict the dependent variable (Field, 2013; Vandekerckhove, Matzke, & Wagenmakers, 2014). Thus, to avoid the over-fitting problem, the use of parsimony is important in big data analytics.
  • Parsimony: is describing a dependent variable with the fewest independent variables as possible (Field, 2013; Huck, 2013; Smith, 2015). The concept of parsimony could be attributed to Occam’s Razor, which states “plurality out never be posited without necessity” (Duignan, 2015).  Vandekerckhove et al. (2014) describe parsimony as a way of removing the noise from the signal to create better predictive regression models.
  • Hierarchical Regression: When the researcher builds a multivariate regression model, they build it in stages, as they tend to add known independent variables first, and add newer independent variables in order to avoid overfitting in a technique called hierarchical regression (Austin, Goel & van Walraven, 2001; Field, 2013; Huck 2013).
  • Logistic Regression: multi-variable regression, where one or more independent variables are continuous or categorical which are used to predict a dichotomous/ binary/ categorical dependent variable (Ahlemeyer-Stubbe, & Coleman, 2014; Field, 2013; Gall, Gall, & Borg, 2006; Huck, 2011).
  • Nearest Neighbor Methods: K-nearest neighbor (i.e. K =5) is when a data point is clustered into a group, by having 5 of the nearest neighbors vote on that data point, and it is particularly useful if the data is a binary or categorical (Berson, Smith, & Thearling, 1999).
  • Classification Trees: aid in data abstraction and finding patterns in an intuitive way (Ahlemeyer-Stubbe & Coleman, 2014; Brookshear & Brylow, 2014; Conolly & Begg, 2014) and aid the decision-making process by mapping out all the paths, solutions, or options available to the decision maker to decide upon.
  • Bayesian Analysis: can be reduced to a conditional probability that aims to take into account prior knowledge, but updates itself when new data becomes available (Hubbard, 2010; Smith, 2015; Spiegelhalter & Rice, 2009; Yudkowsky, 2003).
  • Discriminate Analysis: how should data be best separated into several groups based on several independent variables that create the largest separation of the prediction (Ahlemeyer-Stubbe, & Coleman, 2014; Field, 2013).
  • Ensemble Models: can perform better than a single classifier, since they are created as a combination of classifiers that have a weight attached to them to properly classify new data points (Bauer & Kohavi, 1999; Dietterich, 2000), through techniques like Bagging and Boosting. Boosting procedures help reduce both bias and variance of the different methods, and bagging procedures reduce just the variance of the different methods (Bauer & Kohavi, 1999; Liaw & Wiener, 2002).



  • Ahlemeyer-Stubbe, Andrea, Shirley Coleman. (2014). A Practical Guide to Data Mining for Business and Industry, 1st Edition. [VitalSource Bookshelf Online].
  • Austin, P. C., Goel, V., & van Walraven, C. (2001). An introduction to multilevel regression models. Canadian Journal of Public Health92(2), 150.
  • Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine learning,36(1-2), 105-139.
  • Berson, A. Smith, S. & Thearling K. (1999). Building Data Mining Applications for CRM. McGraw-Hill. Retrieved from
  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview, 12th Edition. [VitalSource Bookshelf Online].
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. [VitalSource Bookshelf Online].
  • Dietterich, T. G. (2000). Ensemble methods in machine learning. International workshop on multiple classifier systems (pp. 1-15). Springer Berlin Heidelberg.
  • Duignan, B. (2015). Occam’s razor. Encyclopaedia Britannica. Retrieved from
  • Field, Andy. (2013). Discovering Statistics Using IBM SPSS Statistics, 4th Edition. [VitalSource Bookshelf Online].
  • Gall, M. D., Gall, J. P., Borg, W. R. (2006). Educational Research: An Introduction, 8th Edition. [VitalSource Bookshelf Online].
  • Hubbard, D. W. (2010). How to measure anything: Finding the values of “intangibles” in business. (2nd e.d.) New Jersey, John Wiley & Sons, Inc.
  • Huck, Schuyler W. (2011). Reading Statistics and Research, 6th Edition. [VitalSource Bookshelf Online].
  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R news, 2(3), 18-22.
  • Smith, M. (2015). Statistical analysis handbook. Retrieved from
  • Spiegelhalter, D. & Rice, K. (2009) Bayesian statistics. Retrieved from
  • Vandekerckhove, J., Matzke, D., & Wagenmakers, E. J. (2014). Model comparison and the principle of parsimony.
  • Yudkowsky, E.S. (2003). An intuitive explanation of Bayesian reasoning. Retrieved from

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s