Quant: Exploring Data with SPSS

Introduction

The aim of this analysis is to run a distribution analysis on diastolic blood pressure (DBP58), comparing individuals with no history of coronary heart disease (CHD) against individuals with a history of CHD. The variable that records each individual’s history is CHD.

From the SPSS outputs the following questions will be addressed:

  • What can be determined from the measures of skewness and kurtosis about a normal curve? What are the mean and median?
  • Does one seem better than the other to represent the scores?
  • What differences can be seen in the pattern of responses of those with history versus those with no history?
  • What information can be determined from the box plots?

Methodology

For this project, the electric.sav file was loaded into SPSS (Electric, n.d.).  The goal is to look at the relationship between the following variables: DBP58 (Average Diastolic Blood Pressure) and CHD (Incidence of Coronary Heart Disease). To conduct the descriptive analysis, navigate through Analyze > Descriptive Statistics > Explore.  The variable DBP58 was placed in the “Dependent List” box, and CHD was placed in the “Factor List” box.  Then, in the Explore dialog box, the “Statistics” button was clicked, and “Descriptives” with a 95% “Confidence interval for the mean” was selected along with outliers and percentiles.  Back in the Explore dialog box, the “Plots” button was clicked; under the “Boxplot” section only “Factor levels together” was selected, under the “Descriptive” section both options (stem-and-leaf and histogram) were selected, and under the “Spread vs. Level with Levene Test” section “None” was selected.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The output is presented in the following four tables and five figures.

Results

Table 1: Case Processing Summary.

                                          Cases
                                          Valid           Missing         Total
Incidence of Coronary Heart Disease       N    Percent    N    Percent    N    Percent
Average Diast Blood Pressure 58   none    119  99.2%      1    0.8%       120  100.0%
                                  chd     120  100.0%     0    0.0%       120  100.0%

According to Table 1, at least 99.2% of the data is valid for both groups, those with a history of Coronary Heart Disease (CHD) and those without. There is one missing data point in the group with no history of CHD. The data set contains 240 participants, 120 in each group.

Table 2: Descriptive Statistics on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Average Diast Blood Pressure 58, by Incidence of Coronary Heart Disease

Group  Measure                                         Statistic  Std. Error
none   Mean                                            87.66      1.005
       95% Confidence Interval for Mean, Lower Bound   85.66
       95% Confidence Interval for Mean, Upper Bound   89.65
       5% Trimmed Mean                                 87.31
       Median                                          87.00
       Variance                                        120.312
       Std. Deviation                                  10.969
       Minimum                                         65
       Maximum                                         125
       Range                                           60
       Interquartile Range                             15
       Skewness                                        .566       .222
       Kurtosis                                        .671       .440
chd    Mean                                            89.92      1.350
       95% Confidence Interval for Mean, Lower Bound   87.24
       95% Confidence Interval for Mean, Upper Bound   92.59
       5% Trimmed Mean                                 88.89
       Median                                          87.00
       Variance                                        218.732
       Std. Deviation                                  14.790
       Minimum                                         65
       Maximum                                         160
       Range                                           95
       Interquartile Range                             18
       Skewness                                        1.406      .221
       Kurtosis                                        3.620      .438

According to Table 2, mean diastolic blood pressure is about 2.3 points higher (89.92 versus 87.66) and the standard error is 0.345 larger (1.350 versus 1.005) for participants with CHD than for those without.  The median for both groups is 87.  For patients with CHD the mean of 89.92 sits above the median, and the distribution is positively skewed, as shown by a skewness of 1.406 and a kurtosis of 3.620.  For the cases without CHD, the mean blood pressure of 87.66 is close to the median, showing only mild skewness in the data, as evidenced by the skewness of 0.566 and kurtosis of 0.671.  Upon further inspection of Figures 1 & 2, the skewness appears to be the result of a few high outliers, and the box plot in Figure 3 confirms these outliers.  The positive kurtosis values of 0.671 and 3.620 indicate that both distributions are leptokurtic, which means they have higher peaks than a normal distribution, with the CHD group deviating far more from normality.  Because the median is unaffected by these outliers, it represents the typical score better than the mean, particularly for the CHD group.
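One common rule of thumb for reading these shape statistics is to divide each skewness and kurtosis value by its standard error and to treat results beyond roughly ±1.96 as a sign of departure from a normal curve. The Python sketch below applies that screen to the Table 2 values. The 1.96 cutoff and the helper name shape_z_scores are assumptions made for illustration, not part of the SPSS output, and the screen is only a rough guide.

```python
# Skewness and kurtosis with their standard errors, copied from Table 2.
TABLE_2_SHAPE = {
    "none": {"skewness": (0.566, 0.222), "kurtosis": (0.671, 0.440)},
    "chd": {"skewness": (1.406, 0.221), "kurtosis": (3.620, 0.438)},
}

# Assumed cutoff: |z| > 1.96 is a common rule of thumb, not an SPSS setting.
CUTOFF = 1.96


def shape_z_scores(shape_stats):
    """Divide each statistic by its standard error, group by group."""
    return {
        group: {name: stat / se for name, (stat, se) in stats.items()}
        for group, stats in shape_stats.items()
    }


z_scores = shape_z_scores(TABLE_2_SHAPE)

for group, scores in z_scores.items():
    for name, z in scores.items():
        verdict = "beyond cutoff" if abs(z) > CUTOFF else "within cutoff"
        print(f"{group:>4} {name:<8} z = {z:5.2f} ({verdict})")
```

Under these assumptions the CHD group exceeds the cutoff on both statistics (z of about 6.4 and 8.3), while the group without CHD exceeds it only on skewness (about 2.5), which agrees with the reading that the CHD distribution departs more strongly from a normal curve.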


Figure 1: Histogram on the Incidents of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.


Figure 2: Histogram on the Incidents of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.


Figure 3: Box plots on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Comparing the two histograms in Figures 1 & 2, the positive skewness is more pronounced when there is CHD than when there isn’t.  The spread also increases by about 3.8 points (the difference in standard deviation, 14.790 versus 10.969) when there is CHD.  This shows that blood pressure varies more in the sample with CHD, whereas it is somewhat more stable in the sample without CHD.  The range of the average diastolic blood pressure is also wider with CHD (95 versus 60), which is consistent with the greater standard deviation and can be seen in Figure 3.  In the group with no CHD the interquartile range (which represents the middle 50% of the participants) is smaller than in the group with CHD (15 versus 18). Case 120, with a value of 160, lies far above the interquartile range and is flagged as an extreme value.
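To see why some cases fall outside the whiskers, the sketch below computes box plot fences from the Tukey’s hinges reported in Table 3. It assumes the common box plot convention, in which values more than 1.5 box lengths beyond a hinge are outliers and values more than 3 box lengths beyond are extremes; the function names are hypothetical, and the convention is an assumption about how Figure 3 was drawn rather than something stated in the output.

```python
def tukey_fences(lower_hinge, upper_hinge):
    """Return outlier and extreme cutoffs based on the box length."""
    box_length = upper_hinge - lower_hinge
    return {
        "outlier_low": lower_hinge - 1.5 * box_length,
        "outlier_high": upper_hinge + 1.5 * box_length,
        "extreme_low": lower_hinge - 3.0 * box_length,
        "extreme_high": upper_hinge + 3.0 * box_length,
    }


def classify(value, fences):
    """Label a value as 'typical', 'outlier', or 'extreme'."""
    if value < fences["extreme_low"] or value > fences["extreme_high"]:
        return "extreme"
    if value < fences["outlier_low"] or value > fences["outlier_high"]:
        return "outlier"
    return "typical"


# Tukey's hinges (25th and 75th) from Table 3.
HINGES = {"none": (80.0, 94.5), "chd": (80.0, 98.0)}
# Maximum value per group from Table 2.
HIGHEST = {"none": 125, "chd": 160}

fences = {group: tukey_fences(*hinges) for group, hinges in HINGES.items()}
labels = {group: classify(HIGHEST[group], fences[group]) for group in HINGES}

for group in HINGES:
    print(group, fences[group], HIGHEST[group], labels[group])
```

Under this convention the upper outlier fence is 116.25 without CHD and 125 with CHD, so the maximum of 125 in the group without CHD counts as an outlier, while the maximum of 160 in the CHD group lies beyond the extreme fence of 152.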

Table 3: Percentiles on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Percentiles of Average Diast Blood Pressure 58, by Incidence of Coronary Heart Disease

Method                           Group  5      10     25     50     75     90      95
Weighted Average (Definition 1)  none   71.00  75.00  80.00  87.00  95.00  102.00  105.00
                                 chd    70.05  75.00  80.00  87.00  98.00  109.90  117.95
Tukey’s Hinges                   none                 80.00  87.00  94.50
                                 chd                  80.00  87.00  98.00

In Table 3, the percentiles of average diastolic blood pressure are mapped out for each CHD group.  Ninety-five percent of cases fall at or below 105 for those with no history of CHD and at or below 117.95 for those with a history of CHD.  The two groups share the same 25th and 50th percentiles (80 and 87), but the upper percentiles are higher with CHD, which shows that without CHD the diastolic blood pressure values are centered more closely around the median value of 87, as supported by the tables and figures above.

Table 4: Extreme Values on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Extreme values of Average Diast Blood Pressure 58, by Incidence of Coronary Heart Disease

Group  Extreme  Rank  Case Number  Value
none   Highest  1     163          125
                2     232          119
                3     144          115
                4     126          110
                5     131          109
       Lowest   1     157          65
                2     156          65
                3     175          68
                4     153          68
                5     237          69
chd    Highest  1     120          160
                2     56           133
                3     42           125
                4     26           121
                5     111          120
       Lowest   1     73           65
                2     34           68
                3     101          70
                4     33           70
                5     7            70a
a. Only a partial list of cases with the value 70 are shown in the table of lower extremes.

Examining the extreme values in Table 4, the five highest and five lowest cases in each group are listed.  The lowest diastolic blood pressure value is 65 both for those with no CHD and for those with CHD.  However, the highest value among those with CHD (160) is 35 points greater than the highest value among those without CHD (125).

 Frequency    Stem &  Leaf
      .00        6 .
     5.00        6 .  55889
     4.00        7 .  1144
    18.00        7 .  555677777777888899
    21.00        8 .  000000000001122223344
    21.00        8 .  555556666777777888999
    20.00        9 .  00000111111222233334
    14.00        9 .  55666777888899
     8.00       10 .  00012233
     4.00       10 .  5559
     1.00       11 .  0
     1.00       11 .  5
     2.00 Extremes    (>=119)
 Stem width:   10
 Each leaf:        1 case(s)

Figure 4: Stem and leaf plot on the Incidents of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

 Frequency    Stem &  Leaf
      .00        6 .
     2.00        6 .  58
     9.00        7 .  000012233
    14.00        7 .  55555677788899
    23.00        8 .  00000000000111233333344
    24.00        8 .  555556667777777788999999
    11.00        9 .  00001122223
    13.00        9 .  6677788888999
     5.00       10 .  02333
     7.00       10 .  5557789
     4.00       11 .  0003
     3.00       11 .  578
     2.00       12 .  01
     1.00       12 .  5
     2.00 Extremes    (>=133)
 Stem width:   10
 Each leaf:        1 case(s)

Figure 5: Stem and leaf plot on the Incidents of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figures 4 and 5 show more detail than the histograms by stating the actual frequency to the left of the stem values and by stating which values are considered extreme.  With CHD, a diastolic blood pressure of 133 or more is considered an extreme value, and with no CHD a diastolic blood pressure of 119 or more is considered extreme.
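Because each leaf stands for one case, the individual values can be rebuilt from a stem-and-leaf plot and checked against the descriptive statistics. The Python sketch below does this for Figure 4 (no CHD). The two extreme values are taken from Table 4, since the plot itself only reports their count; the function name is hypothetical, and using the "exclusive" quantile method to mirror the weighted average percentiles in Table 3 is an assumption.

```python
import statistics

# Rows of Figure 4 (no CHD) as (stem, leaves); the stem width is 10.
FIGURE_4_ROWS = [
    (6, "55889"),
    (7, "1144"),
    (7, "555677777777888899"),
    (8, "000000000001122223344"),
    (8, "555556666777777888999"),
    (9, "00000111111222233334"),
    (9, "55666777888899"),
    (10, "00012233"),
    (10, "5559"),
    (11, "0"),
    (11, "5"),
]
# The plot only says "2 Extremes (>=119)"; the values come from Table 4.
FIGURE_4_EXTREMES = [119, 125]


def expand_stem_and_leaf(rows, extremes, stem_width=10):
    """Turn (stem, leaves) rows plus listed extremes into sorted values."""
    values = [
        stem * stem_width + int(leaf)
        for stem, leaves in rows
        for leaf in leaves
    ]
    return sorted(values + list(extremes))


dbp_none = expand_stem_and_leaf(FIGURE_4_ROWS, FIGURE_4_EXTREMES)

n_cases = len(dbp_none)
mean_dbp = statistics.mean(dbp_none)
median_dbp = statistics.median(dbp_none)
# Assumption: the 'exclusive' method interpolates at position p * (n + 1).
quartiles = statistics.quantiles(dbp_none, n=4, method="exclusive")

print(n_cases, round(mean_dbp, 2), median_dbp, quartiles)
```

The rebuilt list has 119 cases with a mean of about 87.66 and a median of 87, in line with Tables 1 and 2, and quartiles of 80, 87, and 95, in line with the weighted average percentiles in Table 3.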

Conclusions

There is a difference between the distributions of average diastolic blood pressure for participants who have a history of Coronary Heart Disease (CHD) and those who don’t.  This is reflected in the range, skewness, and spread of both groups.  Both groups have the same median and lowest value, but they differ modestly in the mean and considerably in the standard deviation and highest values of diastolic blood pressure.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.
* Explore dbp58 separately for each level of chd.
EXAMINE VARIABLES=dbp58 BY chd
  /PLOT BOXPLOT STEMLEAF HISTOGRAM
  /COMPARE GROUPS
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.


Quant: Crosstabs in SPSS

Introduction

The aim of this analysis is to answer the question of whether people would continue or stop working if they were rich, and how that choice relates to their highest degree earned, gender, and job satisfaction.

Methodology

For this project, the gss.sav file was loaded into SPSS (GSS, n.d.).  The goal is to look at the relationships between the following variables: richwork (if rich, continue or stop working), sex (gender), satjob (satisfaction level with the job), and degree (highest education degree).  The variable richwork is the dependent variable, and the other three variables are treated as independent variables for this analysis. To conduct a crosstabs analysis, navigate through Analyze > Descriptive Statistics > Crosstabs.  The variable richwork was placed in the “Row(s)” box, and the other three variables were placed in the “Column(s)” box.  Then, in the Crosstabs dialog box, the “Cells” button was clicked; under the “Counts” section “Observed” was selected, and all three boxes (row, column, and total) were selected under the “Percentages” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.).  The output is presented in the following four tables.

Results

Table 1: Cases Processing Summary.

                                                                  Cases
                                                                  Valid           Missing         Total
                                                                  N    Percent    N    Percent    N     Percent
IF RICH, CONTINUE OR STOP WORKING * Respondent’s highest degree   625  44.0%      794  56.0%      1419  100.0%
IF RICH, CONTINUE OR STOP WORKING * Respondent’s sex              628  44.3%      791  55.7%      1419  100.0%
IF RICH, CONTINUE OR STOP WORKING * JOB OR HOUSEWORK              624  44.0%      795  56.0%      1419  100.0%

According to Table 1, about 44% (~625) of all cases are valid in all three scenarios and about 56% (~793) had missing data, from a total of 1419 respondents.

Table 2: If rich do people continue or stop working with respondent’s highest degree cross tabulation.

IF RICH, CONTINUE OR STOP WORKING, by Respondent’s highest degree

                                                              Less than HS  High school  Junior college  Bachelor  Graduate  Total
CONTINUE WORKING  Count                                       52            210          39              84        36        421
                  % within IF RICH, CONTINUE OR STOP WORKING  12.4%         49.9%        9.3%            20.0%     8.6%      100.0%
                  % within Respondent’s highest degree        69.3%         64.6%        81.3%           67.2%     69.2%     67.4%
                  % of Total                                  8.3%          33.6%        6.2%            13.4%     5.8%      67.4%
STOP WORKING      Count                                       23            115          9               41        16        204
                  % within IF RICH, CONTINUE OR STOP WORKING  11.3%         56.4%        4.4%            20.1%     7.8%      100.0%
                  % within Respondent’s highest degree        30.7%         35.4%        18.8%           32.8%     30.8%     32.6%
                  % of Total                                  3.7%          18.4%        1.4%            6.6%      2.6%      32.6%
Total             Count                                       75            325          48              125       52        625
                  % within IF RICH, CONTINUE OR STOP WORKING  12.0%         52.0%        7.7%            20.0%     8.3%      100.0%
                  % within Respondent’s highest degree        100.0%        100.0%       100.0%          100.0%    100.0%    100.0%
                  % of Total                                  12.0%         52.0%        7.7%            20.0%     8.3%      100.0%

According to Table 2, 67.4% of respondents would continue working and 32.6% would stop working if they were rich.  In our data about 12% have less than a high school diploma, 52% have a high school diploma, 7.7% have gone to junior college, 20% have a bachelor’s degree, and 8.3% have a graduate degree. Breaking the decision down by the respondent’s highest degree earned, 35.4% of respondents whose highest degree is a high school diploma would choose to leave work if they were rich, making them the group most likely to leave in this “what if” scenario (they also account for 56.4% of everyone who would stop working, largely because they are the biggest group in the sample).  Conversely, 81.3% of those with a junior college degree would stay at their job if they were rich, making them the group most likely to stay.  In every other education level, approximately 65-69% of respondents would continue working if they were rich.
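The row and column percentages in Table 2 answer different questions, and they are easy to mix up. The Python sketch below recomputes both from the observed counts in Table 2; the function names are hypothetical and chosen for illustration only.

```python
# Observed counts from Table 2, in the table's column order.
DEGREES = ["Less than HS", "High school", "Junior college", "Bachelor", "Graduate"]
COUNTS = {
    "CONTINUE WORKING": [52, 210, 39, 84, 36],
    "STOP WORKING": [23, 115, 9, 41, 16],
}


def row_percentages(counts):
    """Share of each degree within a richwork answer (% within row)."""
    return {
        row: [100 * cell / sum(cells) for cell in cells]
        for row, cells in counts.items()
    }


def column_percentages(counts):
    """Share of each richwork answer within a degree (% within column)."""
    column_totals = [sum(column) for column in zip(*counts.values())]
    return {
        row: [100 * cell / total for cell, total in zip(cells, column_totals)]
        for row, cells in counts.items()
    }


row_pct = row_percentages(COUNTS)
col_pct = column_percentages(COUNTS)

hs = DEGREES.index("High school")
print(f"Of those who would stop, {row_pct['STOP WORKING'][hs]:.1f}% hold a high school diploma")
print(f"Of high school graduates, {col_pct['STOP WORKING'][hs]:.1f}% would stop")
```

The first figure (about 56.4%) describes the make-up of the people who would stop working, while the second (about 35.4%) is the one to use when comparing how likely each education level is to stop.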

Table 3: If rich do people continue or stop working with respondent’s gender cross tabulation.

IF RICH, CONTINUE OR STOP WORKING, by Respondent’s sex

                                                              Male    Female  Total
CONTINUE WORKING  Count                                       214     209     423
                  % within IF RICH, CONTINUE OR STOP WORKING  50.6%   49.4%   100.0%
                  % within Respondent’s sex                   69.3%   65.5%   67.4%
                  % of Total                                  34.1%   33.3%   67.4%
STOP WORKING      Count                                       95      110     205
                  % within IF RICH, CONTINUE OR STOP WORKING  46.3%   53.7%   100.0%
                  % within Respondent’s sex                   30.7%   34.5%   32.6%
                  % of Total                                  15.1%   17.5%   32.6%
Total             Count                                       309     319     628
                  % within IF RICH, CONTINUE OR STOP WORKING  49.2%   50.8%   100.0%
                  % within Respondent’s sex                   100.0%  100.0%  100.0%
                  % of Total                                  49.2%   50.8%   100.0%

In our sample data set about 49.2% were male and 50.8% were female, according to Table 3. Breaking the decision down by the respondent’s gender, 34.5% of women and 30.7% of men would choose to leave work if they were rich.  With a gap of less than four percentage points, gender doesn’t seem to be a strong indicator of whether a respondent is more likely to continue or stop working if they were rich in this “what if” scenario.

Table 4: If rich would people continue or stop working with respondent’s job satisfaction cross tabulation.

IF RICH, CONTINUE OR STOP WORKING, by JOB OR HOUSEWORK

                                                              VERY SATISFIED  MOD. SATISFIED  A LITTLE DISSAT  VERY DISSATISFIED  Total
CONTINUE WORKING  Count                                       199             172             36               14                 421
                  % within IF RICH, CONTINUE OR STOP WORKING  47.3%           40.9%           8.6%             3.3%               100.0%
                  % within JOB OR HOUSEWORK                   71.8%           64.9%           60.0%            63.6%              67.5%
                  % of Total                                  31.9%           27.6%           5.8%             2.2%               67.5%
STOP WORKING      Count                                       78              93              24               8                  203
                  % within IF RICH, CONTINUE OR STOP WORKING  38.4%           45.8%           11.8%            3.9%               100.0%
                  % within JOB OR HOUSEWORK                   28.2%           35.1%           40.0%            36.4%              32.5%
                  % of Total                                  12.5%           14.9%           3.8%             1.3%               32.5%
Total             Count                                       277             265             60               22                 624
                  % within IF RICH, CONTINUE OR STOP WORKING  44.4%           42.5%           9.6%             3.5%               100.0%
                  % within JOB OR HOUSEWORK                   100.0%          100.0%          100.0%           100.0%             100.0%
                  % of Total                                  44.4%           42.5%           9.6%             3.5%               100.0%

Finally, in Table 4, about 44.4% of respondents are very satisfied at work, 42.5% are moderately satisfied, 9.6% are a little dissatisfied, and 3.5% are very dissatisfied. Breaking the decision down by job satisfaction level, 40% of respondents who are a little dissatisfied would choose to leave work if they were rich, making them the group most likely to leave in this “what if” scenario. In fact, respondents who were anything but very satisfied with their job were approximately 7-12 percentage points more likely to want to leave their jobs if rich.  By contrast, 71.8% of those who are very satisfied with their jobs would stay at their job if they were rich, making them the group most likely to stay.
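The size of the gap between satisfaction levels can be worked out directly from the counts in Table 4. The Python sketch below computes the share of each satisfaction level that would stop working and its difference, in percentage points, from the very satisfied group; the variable names are hypothetical.

```python
# Counts from Table 4: respondents who would stop working, and column totals.
STOP_COUNTS = {
    "VERY SATISFIED": 78,
    "MOD. SATISFIED": 93,
    "A LITTLE DISSAT": 24,
    "VERY DISSATISFIED": 8,
}
TOTAL_COUNTS = {
    "VERY SATISFIED": 277,
    "MOD. SATISFIED": 265,
    "A LITTLE DISSAT": 60,
    "VERY DISSATISFIED": 22,
}

# Percentage within each satisfaction level that would stop working.
stop_rate = {
    level: 100 * STOP_COUNTS[level] / TOTAL_COUNTS[level]
    for level in STOP_COUNTS
}

# Difference from the very satisfied group, in percentage points.
baseline = stop_rate["VERY SATISFIED"]
gap_points = {
    level: rate - baseline
    for level, rate in stop_rate.items()
    if level != "VERY SATISFIED"
}

for level, gap in gap_points.items():
    print(f"{level}: {stop_rate[level]:.1f}% would stop ({gap:+.1f} points)")
```

The gaps come out at roughly 6.9, 11.8, and 8.2 percentage points, which is where the range of about 7-12 points comes from.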

Conclusions

Overall, this analysis has shown that, when answering the question of whether someone would continue or stop working if they were rich, their highest degree earned and job satisfaction may be contributing factors to the respondent’s decision in this “what if” scenario.  However, gender may not play an important role in answering this question.


SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.
* Cross-tabulate richwork against degree, sex, and satjob with row, column, and total percentages.
CROSSTABS
  /TABLES=richwork BY degree sex satjob
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW COLUMN TOTAL
  /COUNT ROUND CELL.


Quant: Understanding Variance

If a researcher were to look at a measure of job performance resulting from two different manufacturing processes and found that the mean performance of process A was 82.5 and the mean performance of process B was 78.5, they could not automatically assume that process A will consistently outperform process B.  The researcher cannot come to a conclusion until an analysis of variance has been done on the data.  Performance can vary among the units of work carried out under the same process, for example because of differences in the people conducting the statement of work (within-group variance), and it can vary between process A and process B themselves (between-group variance).  These two types of variance feed into the F-statistic, which allows the researcher to state whether or not they can reject the null hypothesis that the mean performances of the two processes are the same.
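To make this concrete, the Python sketch below runs a one-way analysis of variance by hand on two made-up scenarios. In both, process A averages 82.5 and process B averages 78.5, but the scores are tightly clustered in one scenario and widely spread in the other. All scores and the function name are hypothetical and chosen only to illustrate the point.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, and F for the groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group: how far each score sits from its own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": f_statistic,
    }


# Hypothetical scores: same means (82.5 and 78.5), different spread.
tight = one_way_anova([[80, 85, 82, 83], [76, 81, 78, 79]])
wide = one_way_anova([[62.5, 102.5, 72.5, 92.5], [58.5, 98.5, 68.5, 88.5]])

print(f"tight spread: F = {tight['f']:.2f}")
print(f"wide spread:  F = {wide['f']:.2f}")
```

The between-group sum of squares is identical in both scenarios, yet F is about 7.38 when the scores are tight and about 0.10 when they are widely spread, so the same four-point difference in means can be convincing in one data set and indistinguishable from noise in another.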

Quant: Variances

Variance is a measure of average dispersion (Field, 2013; Schumacker, 2014).  It is a numerical value that describes how the observed data values are spread across the data distribution and how much they differ from the mean on average (Huck, 2011; Field, 2013; Schumacker, 2014).  A smaller variance indicates that the observed data values are close to the mean, and a larger variance indicates that they are spread farther from it (Field, 2013). What happens when researchers want to study whether the difference between the means of two groups of data is statistically significant? Researchers could use ANOVA, an analysis of variance that tests whether or not to reject the null hypothesis that the mean of one group is equal to the mean of another group (Huck, 2011; Schumacker, 2014).  ANOVAs usually test categorical independent variables (groups) and continuous dependent variables (Creswell, 2014).  The results of a one-way analysis of variance are presented in a table showing the variance between groups and within groups (Huck, 2011).  Schumacker (2014) explained that the variance between groups indicates how much the group means vary around the overall grand mean, while the variance within groups indicates how much the individual values vary around their own group mean.  The variance between groups has degrees of freedom equal to the number of groups analyzed – 1, whereas the variance within groups has degrees of freedom equal to the total number of data points – the number of groups (Huck, 2011).  Information from within and between the groups is used to calculate the F-statistic to establish statistical significance, which allows the researcher to reject or fail to reject the null hypothesis (Field, 2013; Huck, 2011; Schumacker, 2014).
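As a small illustration of variance as average squared distance from the mean, the Python sketch below compares two made-up samples that share a mean of 10 but differ in spread. The data and the function name are hypothetical, and the calculation uses the n − 1 (sample) form of the variance.

```python
def sample_variance(values):
    """Average squared deviation from the mean, using n - 1."""
    mean = sum(values) / len(values)
    squared_deviations = [(value - mean) ** 2 for value in values]
    return sum(squared_deviations) / (len(values) - 1)


# Hypothetical samples with the same mean (10) but different spread.
close_to_mean = [9, 10, 10, 11]
far_from_mean = [2, 8, 12, 18]

variance_close = sample_variance(close_to_mean)
variance_far = sample_variance(far_from_mean)

print(f"close to the mean: variance = {variance_close:.2f}")
print(f"far from the mean: variance = {variance_far:.2f}")
```

The first sample has a variance of about 0.67 and the second about 45.33, even though a summary based on the mean alone would treat the two samples as identical.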

References

  • Creswell, J. W. (2014). Research design: Qualitative, quantitative and mixed method approaches (4th ed.). California: SAGE Publications, Inc. VitalBook file.
  • Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). UK: Sage Publications Ltd. VitalBook file.
  • Huck, S. W. (2011). Reading Statistics and Research (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Schumacker, R. E. (2014). Learning statistics using R. California: SAGE Publications, Inc. VitalBook file.

Quant: Independent and Dependent Variables

Below are example scenarios for each of the following sets of variables:

  • One independent variable and one dependent variable
    • Example 1:
      • Independent variable: Demographics of Gender (Male, Female, Other)
      • Dependent variable: management reported job performance level
    • Example 2:
      • Independent variable: Satisfaction level with their job (5 point Likert scale response)
      • Dependent variable: management reported job performance level
    • Example 3:
      • Independent variable: years of service at their company (number of years)
      • Dependent variable: management reported job performance level
  • Two independent variables and one dependent variable
    • Example 1:
      • Independent variable #1: Demographics of Gender (Male, Female, Other)
      • Independent variable #2: years of service at their company (number of years)
      • Dependent variable: management reported job performance level
    • Example 2:
      • Independent variable #1: Satisfaction level with their job (5 point Likert scale response)
      • Independent variable #2: years of service at their company (number of years)
      • Dependent variable: management reported job performance level

Quant: Getting Lost in the Numbers

It is easy to get lost in numbers when you do quantitative research.
These are suggestions that can help keep the focus on people and organizations when you are dealing with numbers representing them.

In quantitative research, the data collected is numerical in nature. Rarely is every member of the population studied; instead, a sample is randomly taken from that population to represent it in the analysis (Gall, Gall, & Borg, 2006). At the end of the day, the insights gained from this type of research should be impersonal, objective, and generalizable.  To generalize the results, the insights gained from a sample of data must rest on the correct mathematical procedures for using probabilities and information, known as statistical inference (Gall et al., 2006).  Gall et al. (2006) stated that statistical inference dictates the order of procedures; for instance, a hypothesis and a null hypothesis must be defined before a statistical significance level, which in turn has to be defined before calculating a z or t statistic value.

Essentially, statistical inference allows quantitative researchers to make inferences about a population.  Throughout the quantitative research process, researchers must remember where and from whom the data was generated and collected.  However, it is easy to get lost in the numbers during quantitative research, so here is a list of some of the ways to keep the focus on the people and organizations when researchers deal with the numbers that represent their population:

  • To design a quantitative research project, researchers must understand the purpose and rationale of their own research designs and research methods (Creswell, 2014).  Knowing the purpose and rationale can help in developing the research question(s) and hypothesis.  With a clear research question and hypothesis, a researcher can design and review their data collection from people, organizations, or instruments.  It is when focusing on the methods section that researchers can keep their focus on the people and organizations by identifying the population, considering whether to stratify the population before sampling, the sampling design and procedures, the selection process for the individuals, and which variables to study (their names, how they relate to the research question, and how they are collected) (Creswell, 2014).
  • The numerical data used in quantitative research was generated and collected from people, a social group, an organizational entity, or an instrument. The numerical value alone does not have any meaning or value to the research. But when the numerical value is paired with contextual information, it provides researchers a wealth of information on which to conduct their statistical analysis (Ahlemeyer-Stubbe & Coleman, 2014; Miller, n.d.a.).
  • Remember that each data point, row, or column represents a person, group, or thing with all its features and bugs. It would be wise to create a metadata file that describes the variables behind the data points to help keep the focus on the people and organizations.  In SPSS, the metadata section is called the “Variable View”, and each person is represented as an entity or row of data in the “Data View” (Field, 2013; Miller, n.d.b.).
  • Data sets are never neutral, theory-free repositories; they require researchers to interpret the data through their personal lenses (Crawford, Miltner, & Gray, 2014). One must gather and analyze data ethically to avoid social and legal concerns. Thus, the researcher must be aware of how their analysis of the data could be used to cause harm to others or to facilitate discrimination against disenfranchised groups of people (Robinson, 2015).

References:

  • Ahlemeyer-Stubbe, A., & Coleman, S. (2014). A practical guide to data mining for business and industry. UK: Wiley-Blackwell. VitalBook file.
  • Crawford, K., Miltner, K., & Gray, M. L. (2014). Critiquing Big Data: Politics, Ethics, Epistemology. Special Section Introduction. International Journal of Communication, 8, 1663–1672.
  • Creswell, J. W. (2014). Research design: Qualitative, quantitative and mixed method approaches (4th ed.). California: SAGE Publications, Inc. VitalBook file.
  • Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). UK: Sage Publications Ltd. VitalBook file.
  • Gall, M. D., Gall, J., & Borg, W. (2006). Educational research: An introduction (8th ed.). Pearson Learning Solutions. VitalBook file.
  • Miller, R. (n.d.a.). Week 1: Central tendency [Video file]. Retrieved from http://breeze.careeredonline.com/p9fynztexn6/?launcher=false&fcsContent=true&pbMode=normal
  • Miller, R. (n.d.b.). Week 2: All about SPSS [Video file]. Retrieved from http://breeze.careeredonline.com/p99kywtldbw/?launcher=false&fcsContent=true&pbMode=normal
  • Robinson, S. C. (2015). The good, the bad, and the ugly: Applying Rawlsian ethics in data mining marketing. Journal of Mass Media Ethics, 30(1), 19–30. http://doi.org/10.1080/08900523.2014.985297

Quant: Introduction to SPSS

IBM SPSS supports the entire quantitative analytical process, helping users gain insights from their data and make better data-driven decisions (IBM, n.d.).  SPSS allows for quick statistical practice and analysis of the data without getting too focused on and bogged down by the statistical equations (Field, 2013). SPSS allows the end user to graphically tell a story about their data by discovering hidden relationships for pattern analysis through tables, graphs, charts, and maps that allow pivoting (IBM, n.d.).  This tool also provides high accuracy, flexibility, and advanced statistical procedures, which are made available through the graphical user interface or through programmable options such as internal command line syntax and external programming interfaces with R, Python, Java, .NET, etc. for automating procedures (IBM, n.d.).  However, Field (2013) warned that software like SPSS, which can automate statistical equations and procedures, should not be used without fully understanding the statistical theory.

Variables and how to insert them into SPSS

A variable is a measurable and observable characteristic, attribute, or object that can differ across time, space, entity, person, organization, etc. (Creswell, 2014; Field, 2013). How a variable interacts with other variables helps define what type of variable it is.  There are many types of variables, such as dependent variables, independent variables, intervening/mediating variables, moderating variables, control variables, confounding variables, and extraneous variables (Creswell, 2014; Field, 2013). Dependent variables measure the outcome variation and are explained and influenced by independent variables (Schumacker, 2014). Thus, the dependent variables depend on the independent variables (Creswell, 2014).  Independent variables are those that can be manipulated to help explain the dependent variable’s variation (Schumacker, 2014). Thus, the independent variables probably cause, influence, or affect the dependent variable (Creswell, 2014).  Intervening/mediating variables stand between the independent and dependent variable as a probable causal link between the two (Creswell, 2014).  Moderating variables are a type of independent variable that influences the direction or strength of the relationship between the independent and dependent variables (Creswell, 2014). Control variables are a type of independent variable that is restricted in some way to help isolate possible influences on the dependent variable.  Confounding variables are not measured or observed in a study; they exist, but their influence cannot be directly detected.  Finally, extraneous variables are a type of independent variable that is not controlled in quasi-experimental research and can influence the variation of the dependent variable (Schumacker, 2014).

In SPSS, one can enter a variable in the data editor through the “Data View” window (see Figure 1) or through the “Variable View” window (see Figure 2).  In the “Data View”, data can be entered in the cells below the variable name, and new variables can be added by right-clicking on the topmost cell and selecting “Insert Variable,” though this should be avoided (Field, 2013; Miller, n.d.).  The “Variable View”, by contrast, allows the end user not only to add new variables but also to add defining descriptions and characteristics of each variable (Field, 2013; Miller, n.d.).  Every row in “Variable View” is a variable, and to add a new one just select the cell below the last variable shown and start typing the variable’s name (Field, 2013).


Figure 1: SPSS “Data View” on a sample dataset called bodyfat.sav.


Figure 2: SPSS “Variable View” on a sample dataset called bodyfat.sav.

Data consists of numbers, and numbers alone do not mean a thing.  The number 3 by itself means nothing; three apples, three diamonds, or 3 °C each mean something. Once numerical data has been collected and entered into SPSS, it must be defined.  It is good practice to define the data in the “Variable View” immediately after collecting it and populating SPSS, because memory fades as time goes on, and if a variable is not defined it is easy to forget what all those numbers mean.  Thus, defining the meaning of the data through the Variable View allows the end user to remember what the data in each column of SPSS is, and tells SPSS how to treat, categorize, analyze, and display the variable. To do that, the end user needs to enter the name of the variable, type of variable (numeric, string, currency, date, Boolean, etc.), width of the variable (number of digits and characters in the cell), decimals (how many decimals are displayed), label (a placeholder for the full name or description of the variable), values (numbers assigned to represent groups), missing (what value stands for missing data), columns (width of the display column), align (cell data display alignment), measure (nominal, ordinal, or scale), and the variable’s role (input, target, both, split, partition, or none, which is used for regression analysis) (Field, 2013; Miller, n.d.).
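As a rough mental model of what the “Variable View” stores, the Python sketch below represents one variable’s definition as a small record and uses its value labels to turn a raw number back into something meaningful. This is not SPSS code: the class name, the defaults, and the codes 0, 1, and 9 are hypothetical and serve only to illustrate how a definition gives numbers their meaning.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """One row of a Variable View-style data dictionary (illustrative)."""
    name: str
    type: str = "numeric"
    width: int = 8
    decimals: int = 2
    label: str = ""
    values: dict = field(default_factory=dict)  # code -> group name
    missing: tuple = ()                         # codes treated as missing
    columns: int = 8
    align: str = "right"
    measure: str = "scale"                      # nominal, ordinal, or scale
    role: str = "input"

    def describe(self, raw_value):
        """Translate a stored number into its defined meaning."""
        if raw_value in self.missing:
            return "missing"
        return self.values.get(raw_value, str(raw_value))


# Hypothetical definition: the codes 0, 1, and 9 are assumptions.
chd = VariableDefinition(
    name="chd",
    decimals=0,
    label="Incidence of Coronary Heart Disease",
    values={0: "none", 1: "chd"},
    missing=(9,),
    measure="nominal",
)

decoded = [chd.describe(code) for code in (0, 1, 9)]
print(chd.label, decoded)
```

Without the definition, the stored values 0, 1, and 9 are just numbers; with it, they read as "none", "chd", and "missing".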

References: