Quant: Compelling topics

A discussion of the most compelling topics learned in the subject of Quantitative Analysis.

Most Compelling Topics

Field (2013) states that quantitative and qualitative methods are, at best, complementary rather than competing approaches to solving the world’s problems, although the two methods are quite different from each other. Simply put, quantitative methods are used when the research involves numerical variables, and qualitative methods are used when the research involves variables based on language (Field, 2013). Thus, central to quantitative research and methods is understanding a numerical, ordinal, or categorical dataset and what the data represent. This can be done through descriptive statistics, where the researcher uses statistics to describe a dataset, or through inferential statistics, where conclusions that go beyond the dataset itself are drawn from it (Miller, n.d.).

Field (2013) and Schumacker (2014) defined central tendency as an all-encompassing term for describing the “center of a frequency distribution” through the commonly used measures of mean, median, and mode. Outliers, missing values, multiplying by a constant, and adding a constant are all factors that affect central tendency (Schumacker, 2014). Besides looking at a single measure of central tendency, researchers can also compare the mean and median to understand how skewed the data are and in which direction. The more heavily skewed a distribution, the greater the distance between these two values; if the mean is less than the median, the distribution is negatively skewed (Field, 2013). To better understand the distribution, other measures such as variance and standard deviation can be used.
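
To make the effects above concrete, here is a minimal sketch using Python’s standard statistics module on small hypothetical datasets. It shows that adding or multiplying by a constant shifts or scales all three measures, that a single outlier moves the mean far more than the median, and how the mean-median gap points to the direction of skew. The helper names describe_center and skew_direction are illustrative only, and the mean-median comparison is a rough heuristic rather than a formal skewness test.

```python
from statistics import mean, median, mode


def describe_center(data):
    """Return the three common measures of central tendency for a list of numbers."""
    return {"mean": mean(data), "median": median(data), "mode": mode(data)}


def skew_direction(data):
    """Rough skew check from the mean-median gap (a heuristic, not a formal test)."""
    gap = mean(data) - median(data)
    if gap < 0:
        return "negative"  # mean below median: the tail stretches to the left
    if gap > 0:
        return "positive"  # mean above median: the tail stretches to the right
    return "symmetric"


scores = [2, 3, 3, 4, 5]                 # hypothetical data
print(describe_center(scores))           # {'mean': 3.4, 'median': 3, 'mode': 3}

# Adding a constant shifts every measure by that constant.
shifted = [x + 10 for x in scores]
print(describe_center(shifted))          # {'mean': 13.4, 'median': 13, 'mode': 13}

# Multiplying by a constant scales every measure by that constant.
scaled = [x * 2 for x in scores]
print(describe_center(scaled))           # {'mean': 6.8, 'median': 6, 'mode': 6}

# A single outlier pulls the mean much further than the median.
with_outlier = scores + [50]
print(mean(with_outlier), median(with_outlier))  # mean about 11.17, median 3.5
print(skew_direction(with_outlier))      # positive

left_tail = [1, 8, 9, 9, 10]             # mean 7.4 is less than median 9
print(skew_direction(left_tail))         # negative
```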

Variance and standard deviation are measures of dispersion, with variance serving as a measure of average dispersion (Field, 2013; Schumacker, 2014). Variance is a numerical value that describes how the observed data values are spread across the distribution and how much they differ from the mean on average (Huck, 2011; Field, 2013; Schumacker, 2014); the standard deviation is its square root and is expressed in the same units as the data. A smaller variance indicates that the observed values lie close to the mean, and a larger variance indicates that they are spread farther from it (Field, 2013).
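
A minimal sketch of the calculation, assuming sample data (so the sum of squared deviations is divided by n − 1; a population variance would divide by N instead). Both hypothetical datasets have a mean of 10, yet their variances differ widely, which is exactly the spread information the mean alone cannot show. The result is cross-checked against Python’s statistics.variance and statistics.stdev.

```python
import math
from statistics import variance, stdev


def sample_variance(data):
    """Sum of squared deviations from the mean divided by n - 1 (sample estimate)."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)


def sample_sd(data):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(data))


tight = [9, 10, 10, 11]   # hypothetical scores clustered near the mean of 10
spread = [1, 5, 15, 19]   # hypothetical scores with the same mean, far more dispersed

for name, data in (("tight", tight), ("spread", spread)):
    print(name, round(sample_variance(data), 2), round(sample_sd(data), 2))
# tight 0.67 0.82
# spread 70.67 8.41

# Cross-check against the standard library's sample variance and SD.
assert math.isclose(sample_variance(spread), variance(spread))
assert math.isclose(sample_sd(spread), stdev(spread))
```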

Rarely is every member of a population studied; instead, a random sample is taken to represent that population in quantitative research (Gall, Gall, & Borg, 2006). Ultimately, the insights gained from this type of research should be impersonal, objective, and generalizable. To generalize the results, the insights gained from a sample must be produced with the correct mathematical procedures for working with probabilities and information, that is, statistical inference (Gall et al., 2006). Gall et al. (2006) stated that statistical inference dictates the order of procedures: for instance, a hypothesis and a null hypothesis must be defined before a statistical significance level is set, and the significance level must be set before a z or t statistic is calculated. Essentially, statistical inference allows quantitative researchers to make inferences about a population, so researchers must remember which population the data were generated and collected from during the quantitative research process.
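
The order of procedures described by Gall et al. (2006) can be sketched as a one-sample t-test on hypothetical data. The hypotheses and significance level are fixed in the code before the statistic is computed. A t statistic is used rather than z on the assumption that the population standard deviation is unknown, and the critical value (2.262 for df = 9, two-tailed α = .05) is hard-coded from a standard t-table rather than computed.

```python
import math
from statistics import mean, stdev

# Step 1: state the hypotheses first (hypothetical example).
#   H0: the population mean is 50.   H1: the population mean is not 50.
mu_0 = 50

# Step 2: fix the significance level before computing any test statistic.
alpha = 0.05
# Two-tailed critical t for alpha = .05 with df = 9, taken from a standard t-table.
t_critical = 2.262

# Step 3: only now compute the t statistic from the sample.
sample = [52, 55, 48, 51, 57, 53, 49, 54, 56, 50]   # hypothetical scores, n = 10
n = len(sample)
standard_error = stdev(sample) / math.sqrt(n)
t_stat = (mean(sample) - mu_0) / standard_error

# Step 4: compare the statistic with the critical value and decide.
decision = "reject H0" if abs(t_stat) > t_critical else "fail to reject H0"
print(round(t_stat, 3), decision)   # 2.611 reject H0
```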

Most flaws in research methodology exist because validity and reliability were not established (Gall et al., 2006). Thus, it is important to ensure that the assessment instrument is valid and reliable. When using an existing survey as an assessment instrument, one should report the instrument’s development, items, scales, and reported reliability and validity from past uses (Creswell, 2014; Joyner, 2012). Permission must be secured before using any instrument, and the permission should be placed in the appendix (Joyner, 2012). The validity of the assessment instrument is key to drawing meaningful and useful statistical inferences (Creswell, 2014).

By sampling a population and using a valid and reliable survey instrument for assessment, researchers can correctly infer the attitudes and opinions of a population from the sample (Creswell, 2014). Sometimes, however, a survey instrument does not fit those in the target group, and it would then produce neither valid nor reliable inferences for that population. One must therefore select a target population and determine the size of that population and of any strata used in sampling (Creswell, 2014).

Parametric statistics are inferential and based on random sampling from a distinct population; the sample data are used to make strict inferences about the population’s parameters, so tests such as t-tests, chi-square, and F-tests (ANOVA) can be used (Huck, 2011; Schumacker, 2014). Nonparametric statistics, or “assumption-free tests,” are used with ranked data, as in the Mann-Whitney U-test, Wilcoxon signed-rank test, Kruskal-Wallis H-test, and chi-square (Field, 2013; Huck, 2011).

First, the types of data need to be defined. Continuous data are interval/ratio data, and categorical data are nominal/ordinal data. The following table is modified from Schumacker (2014), with data added from Huck (2011):

Statistic                         Dependent Variable    Independent Variable
Analysis of Variance (ANOVA)
     One-way                      Continuous            Categorical
t-tests
     Single sample                Continuous            (none)
     Independent groups           Continuous            Categorical
     Dependent (paired groups)    Continuous            Categorical
Chi-square                        Categorical           Categorical
Mann-Whitney U-test               Ordinal               Ordinal
Wilcoxon signed-rank test         Ordinal               Ordinal
Kruskal-Wallis H-test             Ordinal               Ordinal
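
Because nonparametric tests such as the Mann-Whitney U-test work on ranks rather than raw values, a short sketch of the ranking step may help. The code below ranks the pooled scores (tied values share the average of their ranks), then computes U from the rank sum of the first group. The ratings are hypothetical, and the sketch stops at the U statistic; turning U into a p-value would still require a table or software such as SPSS.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position that holds the same value (a tie).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n1, n2 = len(group_a), len(group_b)
    r1 = sum(ranks[:n1])               # rank sum of group A
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


group_a = [3, 4, 2, 5]   # hypothetical ordinal ratings
group_b = [6, 7, 5, 8]
print(mann_whitney_u(group_a, group_b))   # 0.5
```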

Meaningful results should be reported along with their statistical significance, confidence intervals, and effect sizes (Creswell, 2014). If the result of a statistical test has a low probability of occurring by chance (5% or 1% or less), the result is considered statistically significant (Creswell, 2014; Field, 2014; Huck, 2011). Tests of the same effect can nonetheless yield different significance values (Field, 2014): with large sample sizes, small differences can show up as significant, while with small samples, large differences may be deemed nonsignificant (Field, 2014). Statistically significant results allow the researcher to reject a null hypothesis, but they do not establish the importance of the observations made (Huck, 2011). Huck (2011) stated that two main factors that could influence whether or not a result is statistically significant are the quality of the research question and the research design.
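
The sample-size point can be illustrated with a small sketch, assuming two independent groups of equal size that share one standard deviation. The hypothetical mean difference (2 points) and standard deviation (10) are held fixed, so the effect is identical in every case, yet the t statistic grows with the square root of the group size. At n = 20 per group it sits far below the conventional critical values of about 2, while at n = 2000 it sits far above them.

```python
import math


def independent_t(mean_difference, sd, n_per_group):
    """t for two independent groups of equal size sharing one (pooled) standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error


# Hypothetical: the same 2-point difference between group means, SD = 10 in both groups.
for n in (20, 200, 2000):
    print(n, round(independent_t(2, 10, n), 2))
# 20 0.63
# 200 2.0
# 2000 6.32
```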

Huck (2011) suggested that after statistical significance is calculated and the researcher either rejects or fails to reject the null hypothesis, an effect size analysis should be conducted. The effect size allows researchers to objectively measure the magnitude, or practical significance, of the research findings by looking at the differential impact of the variables (Huck, 2011; Field, 2014). Field (2014) defines one way of measuring effect size as Cohen’s d: d = (mean(x1) − mean(x2)) / s, where s is a standard deviation (for example, the pooled standard deviation of the two groups). A d of about 0.2 indicates a small effect, about 0.5 a moderate effect, and 0.8 or more a large effect (Field, 2014; Huck, 2011). This is why a statistical test can yield a statistically significant result while further analysis of the effect size shows that the result explains little of what is happening in the overall relationship.
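
A minimal sketch of Cohen’s d follows, assuming the pooled sample standard deviation as the denominator (other choices, such as the control group’s standard deviation, are also used). The group scores are hypothetical, and the "below small" label for |d| under 0.2 is an illustrative addition beyond the 0.2 / 0.5 / 0.8 benchmarks cited above.

```python
import math
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Cohen's d using the pooled sample standard deviation as the denominator."""
    n1, n2 = len(group_1), len(group_2)
    pooled_variance = ((n1 - 1) * variance(group_1)
                       + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_variance)


def effect_label(d):
    """Map |d| onto the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


treatment = [5, 6, 7, 8, 9]   # hypothetical scores, mean 7
control = [4, 5, 6, 7, 8]     # hypothetical scores, mean 6
d = cohens_d(treatment, control)
print(round(d, 2), effect_label(d))   # 0.63 moderate
```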

In regression analysis, it should be possible to predict the dependent variable from the independent variables, provided two conditions hold: (1) the productivity assessment tool is valid and reliable (Creswell, 2014), and (2) the sample is large enough to conduct the analysis and to draw statistical inferences about the population from the collected sample data (Huck, 2011). Assuming these two conditions are met, regression analysis can be performed on the data to create a prediction formula. Regression formulas are useful for summarizing the relationship between the variables in question (Huck, 2011).

When modeling to predict the dependent variable from the independent variable, the regression model with the strongest correlation should be used, because that regression formula explains the most variance in the dependent variable. However, even when a regression formula predicts some or most of the variance, it never implies causation (Field, 2013). The correlation coefficient describes how strongly the regression formula captures the relationship between the variables, and it can range from −1 to +1. The closer the coefficient is to −1 or +1, the better the regression formula predicts the dependent variable; the closer it is to zero, the weaker the linear relationship between the variables (Field, 2013; Huck, 2011; Schumacker, 2014). Correlation does not imply causation, but squaring the correlation coefficient (r²) gives the proportion of variance in the dependent variable that the regression formula explains (Field, 2013).
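
As a sketch of how r, r², and the prediction formula relate, the code below fits a least-squares line for predicting a hypothetical performance rating from years of service (echoing the variable examples in the next post). The data are invented for illustration. An r of about 0.77 squares to 0.6, meaning roughly 60% of the variance in the ratings is accounted for by the line in this toy dataset, which still says nothing about causation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


years = [1, 2, 3, 4, 5]          # hypothetical years of service
performance = [2, 4, 5, 4, 5]    # hypothetical performance ratings

r = pearson_r(years, performance)
slope, intercept = fit_line(years, performance)
print(round(r, 3), round(r ** 2, 2))          # 0.775 0.6
print(round(slope, 2), round(intercept, 2))   # 0.6 2.2
# Prediction formula: rating = 2.2 + 0.6 * years
print(round(intercept + slope * 4.5, 2))      # 4.9
```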


References:

  • Creswell, J. W. (2014). Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). California: SAGE Publications, Inc. VitalBook file.
  • Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). UK: Sage Publications Ltd. VitalBook file.
  • Gall, M. D., Gall, J., & Borg, W. (2006). Educational research: An introduction (8th ed.). Pearson Learning Solutions. VitalBook file.
  • Huck, S. W. (2011). Reading statistics and research (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Joyner, R. L. (2012). Writing the winning thesis or dissertation: A step-by-step guide (3rd ed.). Corwin. VitalBook file.
  • Miller, R. (n.d.). Week 1: Central tendency [Video file]. Retrieved from http://breeze.careeredonline.com/p9fynztexn6/?launcher=false&fcsContent=true&pbMode=normal
  • Schumacker, R. E. (2014). Learning statistics using R. California: SAGE Publications, Inc. VitalBook file.

Quant: Independent and Dependent Variables

Remember that variables have to be clearly definable and measurable. Remember also that variables have more than one value or level.

Below are example scenarios for each of the following sets of variables:

  • One independent variable and one dependent variable
    • Example 1:
      • Independent variable: Gender (Male, Female, Other)
      • Dependent variable: Management-reported job performance level
    • Example 2:
      • Independent variable: Job satisfaction level (5-point Likert scale response)
      • Dependent variable: Management-reported job performance level
    • Example 3:
      • Independent variable: Years of service at the company (number of years)
      • Dependent variable: Management-reported job performance level
  • Two independent variables and one dependent variable
    • Example 1:
      • Independent variable #1: Gender (Male, Female, Other)
      • Independent variable #2: Years of service at the company (number of years)
      • Dependent variable: Management-reported job performance level
    • Example 2:
      • Independent variable #1: Job satisfaction level (5-point Likert scale response)
      • Independent variable #2: Years of service at the company (number of years)
      • Dependent variable: Management-reported job performance level

Quant: Introduction to SPSS

SPSS is at the mercy of your input. What are variables, and in what ways can you enter data into SPSS? Once you have entered numeric data into SPSS, what steps are required to define the meanings of the numbers for SPSS? (This requires explaining the components of Variable View in SPSS.) Why is it important to SPSS that you define these meanings?

IBM SPSS supports the entire quantitative analytical process, helping users gain insights from their data and make better data-driven decisions (IBM, n.d.). SPSS allows for quick statistical practice and analysis of data without getting too focused on, or bogged down by, the statistical equations (Field, 2013). SPSS lets the end user tell a story about their data graphically, discovering hidden relationships for pattern analysis through tables, graphs, charts, and maps that support pivoting (IBM, n.d.). The tool also provides high accuracy, flexibility, and advanced statistical procedures, which are available through the graphical user interface or through programmable options such as internal command-line syntax and external programming interfaces with R, Python, Java, .NET, etc., for automating procedures (IBM, n.d.). However, Field (2013) warned that software like SPSS, which can automate statistical equations and procedures, should not be used without fully understanding the underlying statistical theory.

Variables and how to insert them into SPSS

A variable is a measurable and observable characteristic, attribute, or object that can differ across time, space, entities, persons, organizations, etc. (Creswell, 2014; Field, 2013). How a variable interacts with other variables helps define what type of variable it is. There are many types of variables, such as dependent, independent, intervening/mediating, moderating, control, confounding, and extraneous variables (Creswell, 2014; Field, 2013). Dependent variables measure the variation in outcomes and are explained and influenced by independent variables (Schumacker, 2014); thus, dependent variables depend on the outcomes of the independent variables (Creswell, 2014). Independent variables are those that can be manipulated to help explain the dependent variable’s variation (Schumacker, 2014); thus, independent variables probably cause, influence, or affect the dependent variable (Creswell, 2014). Intervening/mediating variables stand between the independent and dependent variables as a probable causal link between the two (Creswell, 2014). Moderating variables are a type of independent variable that influences the direction or strength of the relationship between the independent and dependent variables (Creswell, 2014). Control variables are a type of independent variable that is restricted in some way to help identify possible influences on the dependent variable. Confounding variables are not measured or observed, and their influence cannot be directly detected. Finally, extraneous variables are a type of independent variable that is not controlled in quasi-experimental research and can influence the variation of the dependent variable (Schumacker, 2014).

In SPSS, one can enter a variable in the data editor through either the “Data View” window (see Figure 1) or the “Variable View” window (see Figure 2). In “Data View,” data can be entered in the cells below the variable name, and new variables can be added by right-clicking the topmost cell and selecting “Insert Variable,” though this approach should be avoided (Field, 2013; Miller, n.d.). “Variable View,” by contrast, allows the end user not only to add new variables but also to add defining descriptions and characteristics for each variable (Field, 2013; Miller, n.d.). Every row in “Variable View” is a variable; to add a new variable, select the cell below the last variable shown and start typing the variable’s name (Field, 2013).


Figure 1: SPSS “Data View” on a sample dataset called bodyfat.sav.


Figure 2: SPSS “Variable View” on a sample dataset called bodyfat.sav.

Data consist of numbers, and numbers alone do not mean anything. The number 3 by itself means nothing; three apples, three diamonds, or 3 °C, however, means something. Once numerical data have been collected and entered into SPSS, they must be defined. It is good practice to define the data in “Variable View” immediately after collecting them and entering them into SPSS, because memory fades over time, and if a variable is not defined, it is easy to forget what all those numbers mean. Defining the meaning of the data in “Variable View” lets the end user remember what each column in SPSS contains, and tells SPSS how to treat, categorize, analyze, and display the variable. To do that, the end user enters the following for each variable (Field, 2013; Miller, n.d.):

  • Name: the name of the variable
  • Type: the type of variable (numeric, string, dollar/currency, date, etc.)
  • Width: the number of digits and characters in the cell
  • Decimals: how many decimal places are displayed
  • Label: a place to write the full name or description of the variable
  • Values: the numbers assigned to represent groups
  • Missing: which values stand for missing data
  • Columns: the width of the display column
  • Align: the alignment of the data displayed in the cell
  • Measure: nominal, ordinal, or scale
  • Role: input, target, both, split, partition, or none, which is used in analyses such as regression
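
To illustrate why these definitions matter, here is a small sketch that models one row of “Variable View” as a Python data class. The class VariableDefinition, its field names, and its default values are hypothetical and are not an SPSS API; within SPSS itself, the same information is entered in Variable View or through command syntax such as VARIABLE LABELS, VALUE LABELS, and MISSING VALUES. The sketch shows how value labels turn bare codes into meaningful categories and how a user-defined missing code (9 here, an assumption) is set aside instead of being treated as a real response.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """Hypothetical stand-in for one row of SPSS Variable View (not an SPSS API)."""
    name: str
    var_type: str = "Numeric"
    width: int = 8
    decimals: int = 2
    label: str = ""
    values: dict = field(default_factory=dict)   # code -> value label
    missing: set = field(default_factory=set)    # codes that stand for missing data
    columns: int = 8
    align: str = "Right"
    measure: str = "Scale"                       # Nominal, Ordinal, or Scale
    role: str = "Input"

    def describe(self, raw):
        """Translate a raw code into its meaning, or None if it is a missing code."""
        if raw in self.missing:
            return None
        return self.values.get(raw, raw)


gender = VariableDefinition(
    name="gender", decimals=0, label="Participant gender",
    values={1: "Male", 2: "Female", 3: "Other"}, missing={9}, measure="Nominal",
)

raw_column = [1, 2, 2, 3, 9, 1]   # hypothetical raw codes as typed into Data View
print([gender.describe(v) for v in raw_column])
# ['Male', 'Female', 'Female', 'Other', None, 'Male']
```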

References: