## Adv Quant: Logistic Vs Linear Regression

To generalize research results, the insights gained from a sample of data must be analyzed with the correct mathematical procedures for working with probabilities and information, that is, statistical inference (Gall et al., 2006; Smith, 2015). Gall et al. (2006) stated that statistical inference dictates the order of procedures: for instance, a hypothesis and a null hypothesis must be defined before a statistical significance level is chosen, and the significance level must in turn be defined before a z or t statistic is calculated. Essentially, statistical inference allows quantitative researchers to make inferences about a population. Researchers must remember which population the data was generated and collected from during the quantitative research process. The order of procedures is important when applying statistical inference to regressions; if it is not followed, the prediction formula will not be generalizable.
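The ordering of steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the hypothesized mean `mu0`, and the known population standard deviation `sigma` are all made up, and a two-sided one-sample z-test is used as the simplest case.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, written in the order the steps are defined."""
    # Step 1: hypotheses. H0: population mean == mu0; H1: population mean != mu0.
    # Step 2: the significance level alpha is fixed before the statistic is computed.
    # Step 3: compute the z statistic from the sample (sigma is assumed known).
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Step 4: two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: decision rule. Reject H0 only if the p-value falls below alpha.
    return z, p_value, p_value < alpha


# Hypothetical data, for illustration only.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
z, p_value, reject_h0 = one_sample_z_test(sample, mu0=50, sigma=2.5)
print(f"z = {z:.2f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers the z statistic is about 1.70 and the p-value is about 0.09, so the null hypothesis is not rejected at the 0.05 level.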

Logistic regression is another flavor of multi-variable regression, in which one or more continuous or categorical independent variables are used to predict a dichotomous, binary, or categorical dependent variable (Ahlemeyer-Stubbe & Coleman, 2014; Field, 2013; Gall et al., 2006; Huck, 2011). Logistic regression is an alternative to linear regression, which assumes all variables are continuous (Ahlemeyer-Stubbe & Coleman, 2014). Multi-variable linear regression and logistic regression share the same general formula (Field, 2013; Schumacker, 2014):

$$Y = a + b_1 X_1 + b_2 X_2 + \dots \tag{1}$$

The main difference between these two regressions is that the variables in equation (1) represent different types of dependent ($Y$) and independent ($X_i$) variables. These different types of variables may have to undergo a transformation before the regression analysis begins (Field, 2013; Schumacker, 2014). Because the types of variables differ between logistic and linear regression, the assumptions governing when to use each regression also differ (Table 1).
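One common way to read equation (1) for both models is sketched below. It assumes the usual formulation: in linear regression the right-hand side is the predicted value itself, while in logistic regression it is the log-odds of the outcome, which the logistic function maps to a probability between 0 and 1. The function names and coefficient values are hypothetical and chosen only for illustration.

```python
from math import exp


def linear_predictor(a, coefs, xs):
    """Right-hand side of equation (1): a + b1*X1 + b2*X2 + ..."""
    return a + sum(b * x for b, x in zip(coefs, xs))


def predict_linear(a, coefs, xs):
    """Linear regression: the linear predictor is the predicted continuous Y."""
    return linear_predictor(a, coefs, xs)


def predict_logistic(a, coefs, xs):
    """Logistic regression: the linear predictor is treated as log-odds and
    converted to a probability with the logistic (inverse-logit) function."""
    return 1.0 / (1.0 + exp(-linear_predictor(a, coefs, xs)))


# Hypothetical intercept and slopes, not fitted to any data.
a, coefs = 1.0, [2.0, -1.0]
xs = [3.0, 4.0]
y_hat = predict_linear(a, coefs, xs)      # continuous prediction
p_hat = predict_logistic(a, coefs, xs)    # probability that Y = 1
label = 1 if p_hat >= 0.5 else 0          # 0.5 cutoff is an illustrative choice
```

The same intercept and slopes feed both functions; only the interpretation of the output changes, which is why the dependent variable may need a transformation before the analysis.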

Table 1: Assumptions and variable types for logistic and linear regression, compiled from Ahlemeyer-Stubbe & Coleman (2014), Field (2013), Gall et al. (2006), Huck (2011), and Schumacker (2014).

| Assumptions of Logistic Regression | Assumptions of Linear Regression |
| --- | --- |
| Multicollinearity should be minimized between the independent variables | Multicollinearity should be minimized between the multiple independent variables |
| There is no need for linearity between the dependent and independent variables | Linearity exists between all variables |
| Normality only on the continuous independent variables | Additivity (for multi-variable linear regression) |
| No need for homogeneity of variance within the categorical variables | Errors in the dependent variable and its predicted values are independent and uncorrelated |
| Error terms are not normally distributed | All variables are continuous |
| Independent variables don't have to be continuous | Normality on all variables |
| There are no missing data (no null values) | Normality on the error values |
| Variance that is not zero | Homogeneity of variance |
| | Homoscedasticity: variance of the residuals is constant |
| | Variance that is not zero |
| **Variable Types of Logistic Regression** | **Variable Types of Linear Regression** |
| 2 or more independent variables | 1 or more independent variables |
| Independent variables: continuous, dichotomous, binary, or categorical | Independent variables: continuous |
| Dependent variable: dichotomous, binary | Dependent variable: continuous |
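Three of the assumptions shared by both columns (no missing data, non-zero variance, and minimal multicollinearity) lend themselves to a quick screening pass before any model is fitted. The sketch below is one possible way to do that with the Python standard library. The function names, the example columns, and the 0.8 correlation cutoff are assumptions made for illustration and do not come from the cited sources. Pairwise correlation is also only a rough screen for multicollinearity.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_predictors(columns, r_threshold=0.8):
    """Return warnings for a dict of {name: list of values}.

    r_threshold is an illustrative cutoff, not a value from the sources.
    """
    warnings, flagged = [], set()
    for name, values in columns.items():
        if any(v is None for v in values):       # missing data
            warnings.append(f"{name}: missing values")
            flagged.add(name)
        elif pvariance(values) == 0:             # variance must not be zero
            warnings.append(f"{name}: zero variance")
            flagged.add(name)
    # Only compare columns that passed the first two checks.
    usable = [name for name in columns if name not in flagged]
    for a, b in combinations(usable, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > r_threshold:
            warnings.append(f"{a} and {b}: |r| = {abs(r):.2f}, possible multicollinearity")
    return warnings


# Hypothetical predictors: age_months duplicates age, constant never varies.
columns = {
    "age": [23, 35, 47, 51, 62],
    "age_months": [276, 420, 564, 612, 744],
    "income": [40, 42, 39, 41, 40],
    "constant": [1, 1, 1, 1, 1],
}
warnings = screen_predictors(columns)
for w in warnings:
    print(w)
```

Here the screen flags `constant` for zero variance and the `age` / `age_months` pair for near-perfect correlation, while `income` passes all three checks.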

References

• Ahlemeyer-Stubbe, A., & Coleman, S. (2014). A Practical Guide to Data Mining for Business and Industry (1st ed.). [VitalSource Bookshelf Online].
• Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). [VitalSource Bookshelf Online].
• Gall, M. D., Gall, J. P., & Borg, W. R. (2006). Educational Research: An Introduction (8th ed.). [VitalSource Bookshelf Online].
• Huck, S. W. (2011). Reading Statistics and Research (6th ed.). [VitalSource Bookshelf Online].
• Schumacker, R. E. (2014). Learning Statistics Using R (1st ed.). [VitalSource Bookshelf Online].