Quant: In-depth Analysis in SPSS


Abstract

This short analysis examines the relationship between marital happiness and a couple's combined income.  It was found that marital happiness levels depend on a couple's combined income, but the happiest couples were happy regardless of how much money they had.  This quantitative analysis of the sample data shows that when happiness levels are low, there is a higher chance of lower levels of combined income.

Introduction

Mulligan (1973) was one of the first to state that arguments about money were among the top reasons for divorce between couples.  Financial arguments can stem from: goals and savings; record keeping; delaying tactics; apparel cost-cutting strategies; controlling expenditures; financial statements; do-it-yourself techniques; and cost-cutting techniques (Lawrence, Thomasson, Wozniak, & Prawitz, 1993). Lawrence et al. (1993) assert that financial arguments are common within families.  However, at what point does money stop being an issue?  Does an increase in combined family income affect marital happiness levels?  This analysis attempts to answer these questions.

Methods

Crosstabulation was conducted to provide a descriptive exploration of the data.  Box plots helped show the spread and distribution of combined income per marital happiness level.  In this analysis of the data, two alternative hypotheses will be tested:

  • There is a difference between the mean values of combined income across marital happiness levels.
  • There is a dependence between combined income and marital happiness level.

To test the hypotheses introduced in the previous section, a one-way analysis of variance and a two-way chi-square test were conducted, respectively.

Results

Table 1: Case processing summary for analyzing happiness level versus family income.

Table 2: Crosstabulation for analyzing happiness level versus family income (<$21,250).

Table 3: Crosstabulation for analyzing happiness level versus family income (>$21,250).

Table 4: Chi-square test for analyzing happiness level versus family income.


Table 5: Analysis of Variance for analyzing happiness level versus family income.



Figure 1: Boxplot diagram per happiness level of a marriage versus the family incomes.


Figure 2: Line diagram per happiness level of a marriage versus the mean of the family incomes.

Discussions and Conclusions

There were 1419 participants, and only 38.5% responded to both the happiness-of-marriage and family-income questions (Table 1).  What may have contributed to this high nonresponse rate is that some participants were not married, making the happiness-of-marriage question not applicable to them.  Thus, it is suggested that in the future an N/A option be added to this survey instrument, to see whether the response rate improves.  Given that there are still 547 responses, there is other information to be gained from analyzing this data.

As a family unit gains more income, its happiness level increases (Tables 2-3).  This can be seen as the dollar value increases: the percentage within family income (ranges recoded to midpoints) for the very happy category rises from roughly the 50% to the 75% level.  The unhappiest couples appear concentrated at combined incomes of $7,500-$9,000 and $27,500-$45,000.  For marriages that are pretty happy, the proportion is roughly stable at 30-40% of respondents at $13,750 or more.

The mean values of family income per happiness level (Figure 2) show that, on average, happier couples make more money together.  However, a closer examination using boxplots (Figure 1) shows that the happiest couples seem to be happy regardless of how much money they make, as the tails of the box plot extend far from the median.  One interesting feature is that the spread of combined family income shrinks as happiness decreases (Figure 1).  This could suggest that although money is not a major factor for couples that are happy, unhappiness could be driven by lower combined incomes.
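The narrowing spread described above can be illustrated outside SPSS. The following is a minimal Python sketch using hypothetical income values (not the GSS data) to show how a boxplot's spread is summarized by quartiles and the interquartile range:

```python
from statistics import quantiles

# Hypothetical combined-income samples (not the GSS data) illustrating how a
# boxplot's spread is summarized: quartiles and the interquartile range (IQR).
very_happy = [5000, 15000, 27500, 45000, 90000]   # wide spread of incomes
not_too_happy = [7500, 8000, 9000, 9500, 10000]   # narrow spread of incomes

def iqr(data):
    """Interquartile range Q3 - Q1 (default exclusive quantile method)."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

print(iqr(very_happy) > iqr(not_too_happy))  # True: wider box for the happiest group
```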

The two-tailed chi-square test shows a statistically significant association between combined family income and marital happiness, allowing us to reject null hypothesis #2, which stated that these two variables are independent of each other (Table 4).  In contrast, the analysis of variance does not allow rejection of null hypothesis #1, which states that the means are equal across the marital happiness groups (Table 5).

Many factors could underlie these results, so future work could analyze other variables that might help define additional factors for marital happiness.  A multivariate analysis may be necessary to examine marital happiness as the dependent variable with combined income as one of many independent variables.

SPSS Code

GET
  FILE='C:\Users\mkher\Desktop\SAV files\gss.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.

CROSSTABS
  /TABLES=hapmar BY incomdol
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ CORR
  /CELLS=COUNT ROW COLUMN
  /COUNT ROUND CELL.

ONEWAY rincome BY hapmar
  /MISSING ANALYSIS.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar incomdol MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: incomdol=col(source(s), name("incomdol"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: schema(position(bin.quantile.letter(hapmar*incomdol)), label(id))
END GPL.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar MEAN(incomdol)[name="MEAN_incomdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: MEAN_incomdol=col(source(s), name("MEAN_incomdol"))
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Mean Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(hapmar*MEAN_incomdol), missing.wings())
END GPL.

References

Quant: Chi-Square Test in SPSS


Introduction

The aim of this analysis is to determine the association strength for the variables agecat and degree as well the major contributing cells through a chi-square analysis. Through the use of standardized residuals, it should aid in determining the cell contributions.

Hypothesis

  • Null: There is no association between agecat and degree.
  • Alternative: There is a real association between agecat and degree.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.).  The goal is to look at the relationships between the following variables: agecat (Age category) and degree (Respondent’s highest degree).

To conduct a chi-square analysis, navigate through Analyze > Descriptive Statistics > Crosstabs.

The variable degree was placed in the "Row(s)" box and agecat in the "Column(s)" box.  Click the "Statistics" button, select "Chi-square", and under the "Nominal" section select "Lambda". Click the "Cells" button and select "Standardized" under the "Residuals" section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.).  The following output was observed in the next four tables.

Results

Table 1: Case processing summary.

Cases
Valid Missing Total
N Percent N Percent N Percent
Degree * Age category 1411 99.4% 8 0.6% 1419 100.0%

From the total sample of 1419 participants, 8 cases are reported missing, yielding a 99.4% response rate (Table 1).   Examining the crosstabulation (Table 2), in the "Less than high school" row the standardized residuals for the 30-39, 40-49, and 50-59 age groups are well below -1.96, while for the 60-89 age group the residual is far above +1.96; the observed and expected frequencies in these cells therefore differ significantly.  Likewise, in the "Junior college or more" row, the 60-89 age group has a standardized residual below -1.96, so those two frequencies also differ significantly.  For all these cells, SPSS identified observed frequencies far from the expected frequencies (Miller, n.d.).  A significant negative standardized residual indicates that the model over-predicts the number of people in that age group with that respective diploma (or lack thereof); a significant positive standardized residual indicates that the model under-predicts that cell.
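The standardized residuals SPSS reports can be reproduced by hand with the usual formula (observed - expected) / sqrt(expected). This Python sketch (an illustration, not the SPSS implementation) uses the "Less than high school" by "18-29" cell from Table 2:

```python
import math

# Standardized residual for the "Less than high school" x "18-29" cell (Table 2).
row_total, col_total, grand_total = 243, 248, 1411
observed = 42

expected = row_total * col_total / grand_total        # count expected under independence
std_residual = (observed - expected) / math.sqrt(expected)

print(round(std_residual, 1))  # -0.1, matching the -.1 SPSS reports
```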

Table 2: Degree by Age category crosstabulation.

Age category Total
18-29 30-39 40-49 50-59 60-89
Degree Less than high school Count 42 33 36 20 112 243
Standardized Residual -.1 -2.8 -2.3 -2.7 7.1
High school Count 138 162 154 113 158 725
Standardized Residual .9 .2 -.2 .4 -1.2
Junior college or more Count 68 115 114 78 68 443
Standardized Residual -1.1 1.8 1.9 1.4 -3.7
Total Count 248 310 304 211 338 1411

Deriving the degrees of freedom from Table 2, df = (5-1)*(3-1) = 8.  None of the expected counts were less than five, as the minimum expected count is 36.34 (Table 3), which is desirable.  The chi-square value is 96.364 and is significant at the 0.05 level. Thus, the null hypothesis is rejected: there is a statistically significant association between a person's age category and diploma level.  This test does not tell us anything about the directionality of the relationship.
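Both the degrees of freedom and the Pearson chi-square value can be checked directly from the observed counts in Table 2. A Python sketch (illustrative, not the SPSS implementation):

```python
# Pearson chi-square and df recomputed from the observed counts in Table 2
# (rows: degree levels; columns: age categories).
observed = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(round(chi_sq, 3), df)  # ~96.364 and 8, as in Table 3
```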

Table 3: Chi-Square Tests

Value df Asymptotic Significance (2-sided)
Pearson Chi-Square 96.364a 8 .000
Likelihood Ratio 90.580 8 .000
Linear-by-Linear Association 23.082 1 .000
N of Valid Cases 1411
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 36.34.

Table 4: Directional Measures

Value Asymptotic Standard Errora Approximate Tb Approximate Significance
Nominal by Nominal Lambda Symmetric .029 .013 2.278 .023
Degree Dependent .000 .000 .c .c
Age category Dependent .048 .020 2.278 .023
Goodman and Kruskal tau Degree Dependent .024 .005 .000d
Age category Dependent .019 .004 .000d
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Cannot be computed because the asymptotic standard error equals zero.
d. Based on chi-square approximation

Although there is a statistically significant association between a person's age category and diploma level, the chi-square test does not show how strongly the variables are related. The symmetric lambda value is 0.029, meaning knowledge of one variable yields only a 2.9% proportional reduction in error when predicting the other. Thus the relationship has a very weak effect (Table 4): only 2.9% of the prediction error is accounted for, so little of practical note is going on here.
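Lambda's proportional-reduction-in-error logic can be made concrete. The following Python sketch computes lambda with age category as the dependent variable from the Table 2 counts (an illustration of the same quantity SPSS reports in Table 4):

```python
# Goodman-Kruskal lambda (proportional reduction in error) with "Age category"
# as the dependent variable, from the observed counts in Table 2.
observed = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(col_totals)

errors_without = grand - max(col_totals)                    # always guess the modal age category
errors_with = sum(sum(row) - max(row) for row in observed)  # guess the modal category per degree row
lam = (errors_without - errors_with) / errors_without

print(round(lam, 3))  # 0.048, matching "Age category Dependent" in Table 4
```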

Conclusions

There is a statistically significant association between a person's age category and diploma level.  According to the crosstabulation, the model significantly over-predicts the number of people with less than a high school education for the 30-59 age groups, as well as those with a junior college degree or more for the 60-89 age group.  These differences in the standardized residuals helped drive a large and statistically significant chi-square value. However, the lambda of 0.029 indicates only a 2.9% proportional reduction in prediction error, so the practical effect is very weak.

SPSS Code

CROSSTABS
  /TABLES=ndegree BY agecat
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ CC LAMBDA
  /CELLS=COUNT SRESID
  /COUNT ROUND CELL.

References:

Quant: Linear Regression in SPSS


Introduction

The aim of this analysis is to look at the relationship between a father's education level (dependent variable) and the mother's education level (independent variable). The variable names are "paeduc" and "maeduc." Thus, the hope is to determine the linear regression equation for predicting the father's education level from the mother's education.

From the SPSS outputs the following questions will be addressed:

  • How much of the total variance have you accounted for with the equation?
  • Based upon your equation, what level of education would you predict for the father when the mother has 16 years of education?

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.).  The goal is to look at the relationships between the following variables: paeduc (HIGHEST YEAR SCHOOL COMPLETED, FATHER) and maeduc (HIGHEST YEAR SCHOOL COMPLETED, MOTHER). To conduct a linear regression analysis navigate through Analyze > Regression > Linear Regression.  The variable paeduc was placed in the “Dependent List” box, and maeduc was placed under “Independent(s)” box.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.).  The following output was observed in the next four tables.

The relationship between paeduc and maeduc are plotted in a scatterplot by using the chart builder.  Code to run the chart builder code is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: Variables Entered/Removed

Model Variables Entered Variables Removed Method
1 HIGHEST YEAR SCHOOL COMPLETED, MOTHERb . Enter
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. All requested variables entered.

Table 1 reports that, for the linear regression analysis, the dependent variable is the highest year of school completed by the father and the independent variable is the highest year of school completed by the mother.  No variables were removed.

Table 2: Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate
1 .639a .408 .407 3.162
a. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER
b. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

For a linear regression predicting the father's highest year of school completed from his wife's, the correlation is positive with a value of 0.639, which explains only 0.408 (40.8%) of the variance (Table 2), leaving 0.592 (59.2%) of the variance unexplained.  The linear regression formula, or line of best fit (Table 4), is: y = 0.760 x + 2.572 years + e.  The line of best fit expresses in equation form the mathematical relationship between the two variables, in this case the father's and mother's highest education levels.  Thus, if the mother has completed her bachelor's degree (16th year), the equation yields y = 2.572 years + 0.760 (16 years) + e = 14.732 years + e.  The e is the error in this prediction formula; it exists because r is not exactly -1.0 or +1.0.  The ANOVA table (Table 3) shows that the relationship between these two variables is statistically significant at the 0.05 level.
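The prediction above can be sketched as a one-line function. The coefficients come from Table 4 and the correlation from Table 2; the function name is ours, not SPSS's:

```python
# The fitted line from Table 4: predicted paeduc = 2.572 + 0.760 * maeduc.
intercept, slope = 2.572, 0.760   # coefficients reported by SPSS (Table 4)
r = 0.639                         # correlation reported by SPSS (Table 2)

def predict_paeduc(maeduc_years):
    """Predicted father's years of schooling (hypothetical helper name)."""
    return intercept + slope * maeduc_years

print(round(predict_paeduc(16), 3))  # 14.732 years for a mother with 16 years
print(round(1 - r ** 2, 3))          # 0.592 of the variance left unexplained
```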

Table 3: ANOVA Table

Model Sum of Squares df Mean Square F Sig.
1 Regression 6231.521 1 6231.521 623.457 .000b
Residual 9045.579 905 9.995
Total 15277.100 906
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER

Table 4: Coefficients

Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
1 (Constant) 2.572 .367 7.009 .000
HIGHEST YEAR SCHOOL COMPLETED, MOTHER .760 .030 .639 24.969 .000
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

The image below (Figure 1) is a scatter plot of the highest year of school completed by the mother versus the father, along with the linear regression line (Table 4) and box plots of each respective distribution.  There are more outliers in the husband's education level than in the wife's, and the husband's education levels are more concentrated about the median.


Figure 1: Highest year of school completed by the mother vs the father scatter plot with regression line and box plot images of each respective distribution.

Conclusion

There is a statistically significant relationship between the husband's and wife's highest year of education completed.  The line of best fit shows a moderately positive correlation and is defined as y = 0.760 x + 2.572 years + e; it explains only 40.8% of the variance, while 59.2% of the variance remains unexplained.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT paeduc
  /METHOD=ENTER maeduc
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).

STATS REGRESS PLOT YVARS=paeduc XVARS=maeduc
  /OPTIONS CATEGORICAL=BARS GROUP=1 BOXPLOTS INDENT=15 YSCALE=75
  /FITLINES LINEAR APPLYTO=TOTAL.

References:

Quant: ANOVA and Multiple Comparisons in SPSS


Introduction

The aim of this analysis is to look at the relationship between the dependent variable of the income level of respondents (rincdol) and the independent variable of their reported level of happiness (happy).  This independent variable has three levels.

From the SPSS outputs the goal is to:

  • Use the ANOVA procedure to determine the overall conclusion, and use the Bonferroni correction as a post-hoc analysis to determine the relationship of specific levels of happiness to income.

Hypothesis

  • Null: There is no difference in mean rincdol across the levels of happy.
  • Alternative: There are real differences in mean rincdol across the levels of happy.
  • Null2: There is no difference in mean rincdol between specific pairs of happy levels.
  • Alternative2: There are real differences in mean rincdol between specific pairs of happy levels.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.).  The goal is to look at the relationships between the following variables: rincdol (Respondent’s income; ranges recoded to midpoints) and happy (General Happiness). To conduct a parametric analysis, navigate to Analyze > Compare Means > One-Way ANOVA.  The variable rincdol was placed in the “Dependent List” box, and happy was placed under “Factor” box.  Select “Post Hoc” and under the “Equal Variances Assumed” select “Bonferroni”.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next two tables.

The relationship between rincdol and happy are plotted by using the chart builder.  Code to run the chart builder code is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: ANOVA

Respondent’s income; ranges recoded to midpoints
Sum of Squares df Mean Square F Sig.
Between Groups 11009722680.000 2 5504861341.000 9.889 .000
Within Groups 499905585000.000 898 556687733.900
Total 510915307700.000 900

The ANOVA in Table 1 shows overall statistical significance, so the first null hypothesis is rejected at the 0.05 level. Thus, there is a statistically significant difference in mean income across the levels of happy.  However, the overall test does not indicate which pairs of group means differ; the post-hoc comparisons address this.
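The F ratio in Table 1 is simply the ratio of the two mean squares, which can be verified from the sums of squares and degrees of freedom. A Python sketch:

```python
# Mean squares and F ratio recomputed from the ANOVA entries in Table 1.
ss_between, df_between = 11009722680.0, 2
ss_within, df_within = 499905585000.0, 898

ms_between = ss_between / df_between   # ~5504861340
ms_within = ss_within / df_within      # ~556687734
f_ratio = ms_between / ms_within

print(round(f_ratio, 3))  # ~9.889, matching Table 1
```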

Table 2: Multiple Comparisons

Dependent Variable:   Respondent’s income; ranges recoded to midpoints
Bonferroni
(I) GENERAL HAPPINESS (J) GENERAL HAPPINESS Mean Difference (I-J) Std. Error Sig. 95% Confidence Interval
Lower Bound Upper Bound
VERY HAPPY PRETTY HAPPY 4093.678 1744.832 .058 -91.26 8278.61
NOT TOO HAPPY 12808.643* 2912.527 .000 5823.02 19794.26
PRETTY HAPPY VERY HAPPY -4093.678 1744.832 .058 -8278.61 91.26
NOT TOO HAPPY 8714.965* 2740.045 .005 2143.04 15286.89
NOT TOO HAPPY VERY HAPPY -12808.643* 2912.527 .000 -19794.26 -5823.02
PRETTY HAPPY -8714.965* 2740.045 .005 -15286.89 -2143.04
*. The mean difference is significant at the 0.05 level.

According to Table 2, the pairing of "Very Happy" and "Pretty Happy" did not allow rejection of Null2 at the 0.05 level (p = .058). However, the other pairings, "Very Happy" versus "Not Too Happy" and "Pretty Happy" versus "Not Too Happy", do allow rejection of Null2 at the 0.05 level.  Thus, there is a significant difference for two of the three pairs.
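The Bonferroni logic behind Table 2 can be sketched as follows. The unadjusted p-values below are hypothetical back-calculations (each adjusted value divided by three) chosen only to be consistent with the table; SPSS reports the already-adjusted values:

```python
from itertools import combinations

# With three happiness groups there are three pairwise comparisons, so the
# Bonferroni correction multiplies each unadjusted p-value by 3 (capped at 1).
groups = ["VERY HAPPY", "PRETTY HAPPY", "NOT TOO HAPPY"]
pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)  # 3

# Hypothetical unadjusted p-values (illustration only; back-calculated from Table 2).
unadjusted = {pairs[0]: 0.0193, pairs[1]: 0.0001, pairs[2]: 0.0017}
adjusted = {pair: min(1.0, p * n_comparisons) for pair, p in unadjusted.items()}

for pair, p in adjusted.items():
    print(pair, round(p, 3), "reject" if p < 0.05 else "fail to reject")
```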


Figure 1: Graphed means of General Happiness versus incomes.

The relationship between general happiness and income is positive (Figure 1): people with a low level of general happiness usually have lower recorded mean incomes, and vice versa.  However, no direction or causality can be inferred from this analysis; it cannot be concluded that high income causes general happiness, or that happy people make more money because of a positive attitude towards life.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

ONEWAY rincdol BY happy
  /MISSING ANALYSIS
  /POSTHOC=BONFERRONI ALPHA(0.05).

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=happy MEAN(rincdol)[name="MEAN_rincdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: happy=col(source(s), name("happy"), unit.category())
  DATA: MEAN_rincdol=col(source(s), name("MEAN_rincdol"))
  GUIDE: axis(dim(1), label("GENERAL HAPPINESS"))
  GUIDE: axis(dim(2), label("Mean Respondent's income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(happy*MEAN_rincdol), missing.wings())
END GPL.

References:

Quant: Group Statistics in SPSS


Introduction

The aim of this analysis is to determine whether a person being alive or dead ten years after a coronary event is reflected in a significant difference in the diastolic blood pressure taken when that event occurred. The variable "DBP58" will be used as the dependent variable and "Vital10" as the independent variable.

From the SPSS outputs the goal is to:

  • Analyze these conditions to determine if there is a significant difference between the DBP levels of those (vital10) who are alive 10 years later compared to those who died within 10 years.

Hypothesis

  • Null: There is no difference in DBP58 between the Vital10 groups.
  • Alternative: There is a real difference in DBP58 between the Vital10 groups.

Methodology

For this project, the electric.sav file is loaded into SPSS (Electric, n.d.).  The goal is to look at the relationships between the following variables: DBP58 (Average Diastolic Blood Pressure) and Vital10 (Status at Ten Years). To conduct a parametric analysis, navigate to Analyze > Compare Means > Independent-Samples T Test.  The variable DBP58 was placed in the "Test Variable(s)" box, and Vital10 in the "Grouping Variable" box.  Then select the "Define Groups" button and enter 0 for "Group 1" and 1 for "Group 2".  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next two tables.

Results

Table 1: Group Statistics

Status at Ten Years N Mean Std. Deviation Std. Error Mean
Average Diast Blood Pressure 58 Alive 178 87.56 11.446 .858
Dead 61 92.38 16.477 2.110

According to the results in Table 1, the mean diastolic blood pressure of those who had passed away within ten years was about 5 points higher, with a considerably larger standard deviation.  Thus, those who were alive ten years later show a smaller variation in their diastolic blood pressure.

Table 2: Independent Samples Test

Levene’s Test for Equality of Variances t-test for Equality of Means
F Sig. t df Sig. (2-tailed) Mean Difference Std. Error Difference 95% Confidence Interval of the Difference
Lower Upper
Average Diast Blood Pressure 58 Equal variances assumed 8.815 .003 -2.515 237 .013 -4.815 1.915 -8.587 -1.043
Equal variances not assumed -2.114 80.735 .038 -4.815 2.277 -9.347 -.284

The independent-samples t-test shows that equality of variances cannot be assumed at the 0.05 level (Levene's test, p = .003).  When equal variances are not assumed, the null hypothesis can be rejected at the 0.05 level because the significance value is 0.038.  Thus, there is a statistically significant difference between the mean diastolic blood pressures of those who were alive and those who had passed away.
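The "equal variances not assumed" row can be approximated from the group summaries in Table 1 using the Welch t statistic. A Python sketch (the small discrepancy from -2.114 comes from the rounded means and standard deviations):

```python
import math

# Welch t statistic from the Table 1 summaries (alive vs. dead at ten years).
mean_alive, sd_alive, n_alive = 87.56, 11.446, 178
mean_dead, sd_dead, n_dead = 92.38, 16.477, 61

se_diff = math.sqrt(sd_alive ** 2 / n_alive + sd_dead ** 2 / n_dead)
t_welch = (mean_alive - mean_dead) / se_diff

print(round(t_welch, 2))  # ~ -2.12, close to the -2.114 SPSS reports
```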

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

T-TEST GROUPS=vital10(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=dbp58
  /CRITERIA=CI(.95).

References:

Quant: Paired Sample Statistics in SPSS


Introduction

The aim of this analysis is to compare productivity under two organizational structures. The data are artificial estimates of productivity, with column 1 representing traditional vertical management and column 2 representing autonomous work teams (ATW). The background is that a company of 100 factory workers had been operating under traditional vertical management and decided to move to ATW. The same employees worked under both systems, first under vertical management and then after conversion to ATW.

From the SPSS outputs the goal is to:

  • Analyze the productivity levels of the two management approaches and decide which is superior.

Hypothesis

  • Null: There is no difference between prodpre and prodpost.
  • Alternative: There is a real difference between prodpre and prodpost.

Methodology

For this project, the atw.sav file is loaded into SPSS (ATW, n.d.).  The goal is to look at the relationships between the following variables: prodpre (productivity level preceding the new process) and prodpost (productivity level following the new process). To conduct a parametric analysis, navigate to Analyze > Compare Means > Paired-Samples T Test.  The variable prodpre was placed in the “Paired Variables” box under “Pair” 1 and “Variable 1”, and prodpost was placed under “Pair” 1 and “Variable 2”.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next three tables.

Results

Table 1: Paired Sample Statistics

Mean N Std. Deviation Std. Error Mean
Pair 1 productivity level preceding the new process 76.43 100 16.820 1.682
productivity level following the new process 84.24 100 9.797 .980

Descriptively, productivity on average increased by about 8 points, and the standard deviation about the mean decreased by about 7 points.  This means the productivity estimates under traditional vertical management are lower and show a wider spread than those under the autonomous work teams.  Essentially, these distributions tell the story that workers obtained higher productivity estimates with less deviation under autonomous work teams.

Table 2: Paired Samples Correlation

N Correlation Sig.
Pair 1 productivity level preceding the new process & productivity level following the new process 100 .040 .695

Based on Table 2, there is only a weak correlation (r = 0.040) between the productivity estimates under traditional vertical management and under the autonomous work teams; the paired scores barely track each other.  In any case, correlation does not imply causation.
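For reference, a Pearson correlation like the one in Table 2 can be computed from first principles. The scores below are hypothetical (not the atw.sav data), chosen only to produce a similarly weak correlation:

```python
import math

# Pearson correlation from first principles (hypothetical scores, not atw.sav).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pre = [70, 75, 80, 85, 90, 95]    # hypothetical pre-change productivity
post = [83, 84, 83, 84, 84, 83]   # hypothetical post-change productivity

print(round(pearson_r(pre, post), 3))  # ~0.098: weak, like the r = .040 in Table 2
```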

Table 3: Paired Samples Test

Paired Differences t df Sig. (2-tailed)
Mean Std. Deviation Std. Error Mean 95% Confidence Interval of the Difference
Lower Upper
Pair 1 productivity level preceding the new process – productivity level following the new process -7.817 19.126 1.913 -11.612 -4.022 -4.087 99 .000

Based on the results of the two-tailed paired t-test (Table 3), the null hypothesis can be rejected: there is a significant difference between the two variables prodpre and prodpost at the 0.05 level.  The data from 100 workers (99 degrees of freedom) show a significant difference between the productivity estimates under traditional vertical management and under the autonomous work teams.
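The paired t statistic in Table 3 follows directly from the mean difference, its standard deviation, and the sample size. A Python sketch:

```python
import math

# Paired t from the Table 3 summaries: t = mean difference / (sd / sqrt(n)).
mean_diff, sd_diff, n = -7.817, 19.126, 100

t_paired = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(round(t_paired, 3), df)  # -4.087 and 99, matching Table 3
```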

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

T-TEST PAIRS=prodpre WITH prodpost (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

References:

Quant: Exploring Data with SPSS


Introduction

The aim of this analysis is to run a distribution analysis on diastolic blood pressure (DBP58) for individuals with no history of cardiovascular heart disease and for individuals with a history of cardiovascular heart disease (CHD). The variable recording individual history is CHD.

From the SPSS outputs the following questions will be addressed:

  • What can be determined from the measures of skewness and kurtosis about a normal curve? What are the mean and median?
  • Does one seem better than the other to represent the scores?
  • What differences can be seen in the pattern of responses of those with history versus those with no history?
  • What information can be determined from the box plots?

Methodology

For this project, the electric.sav file is loaded into SPSS (Electric, n.d.).  The goal is to look at the relationships between the following variables: DBP58 (Average Diastolic Blood Pressure) and CHD (Incidence of Coronary Heart Disease). To conduct a descriptive analysis, navigate through Analyze > Descriptive Statistics > Explore.  The variable DBP58 was placed in the "Dependent List" box, and CHD in the "Factor List" box.  In the Explore dialog, the "Statistics" button was clicked, and "Descriptives" at the 95% "Confidence interval for the mean" was selected along with outliers and percentiles.  Back in the Explore dialog, the "Plots" button was clicked; under "Boxplots" only "Factor levels together" was selected, under "Descriptive" both options were selected, and under "Spread vs. Level with Levene Test", "None" was selected.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next four tables and five figures.

Results

Table 1: Case Processing Summary.

Average Diast Blood Pressure 58, by Incidence of Coronary Heart Disease

                         Cases
              Valid          Missing        Total
              N    Percent   N    Percent   N    Percent
  none        119  99.2%     1    0.8%      120  100.0%
  chd         120  100.0%    0    0.0%      120  100.0%

According to Table 1, 99.2% or more of the data is valid and not missing, both when there is a history of Coronary Heart Disease (CHD) and when there isn’t. There is one missing data point in the group with no history of CHD. Each group contains 120 cases.

Table 2: Descriptive Statistics on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Average Diast Blood Pressure 58, Incidence of Coronary Heart Disease = none:
                                              Statistic   Std. Error
  Mean                                        87.66       1.005
  95% Confidence Interval     Lower Bound     85.66
  for Mean                    Upper Bound     89.65
  5% Trimmed Mean                             87.31
  Median                                      87.00
  Variance                                    120.312
  Std. Deviation                              10.969
  Minimum                                     65
  Maximum                                     125
  Range                                       60
  Interquartile Range                         15
  Skewness                                    .566        .222
  Kurtosis                                    .671        .440

Average Diast Blood Pressure 58, Incidence of Coronary Heart Disease = chd:
                                              Statistic   Std. Error
  Mean                                        89.92       1.350
  95% Confidence Interval     Lower Bound     87.24
  for Mean                    Upper Bound     92.59
  5% Trimmed Mean                             88.89
  Median                                      87.00
  Variance                                    218.732
  Std. Deviation                              14.790
  Minimum                                     65
  Maximum                                     160
  Range                                       95
  Interquartile Range                         18
  Skewness                                    1.406       .221
  Kurtosis                                    3.620       .438

According to Table 2, the mean diastolic blood pressure is about 2.3 points higher (with a standard error 0.345 larger) when there is a history of CHD than when there isn’t.  The median for both groups is 87.00.  The mean for patients with CHD is 89.92, and that group’s distribution is clearly right-skewed, with a skewness of 1.406 and a kurtosis of 3.620.  For the cases without CHD, the mean blood pressure is 87.66, showing little skewness in the data, as evident by the skewness of 0.566 and kurtosis of 0.671.  Upon further inspection of Figures 1 & 2, the skewness appears to be driven by a few high outliers; the box plot in Figure 3 confirms these outliers.  The positive kurtosis values of 0.671 and 3.620 indicate both distributions are leptokurtic, meaning they have higher, sharper peaks than a normal distribution.
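The skewness and kurtosis values in Table 2 come from the bias-corrected sample estimators (the same formulas SPSS and Excel report, to the best of my understanding). A minimal sketch:

```python
import math

def skew_kurtosis(values):
    """Bias-corrected sample skewness and excess kurtosis,
    matching (to my understanding) the estimators SPSS reports."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    m3 = sum(((x - mean) / s) ** 3 for x in values)
    m4 = sum(((x - mean) / s) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * m3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt

# A symmetric sample has skewness 0 and (for a flat shape) negative kurtosis.
print(skew_kurtosis([1, 2, 3, 4, 5]))
```

A kurtosis near 0 indicates a roughly normal peak; large positive values such as the 3.620 for the CHD group indicate a leptokurtic, heavy-tailed shape.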


Figure 1: Histogram on the Incidents of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.


Figure 2: Histogram on the Incidents of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.


Figure 3: Box plots on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Comparing the two histograms in Figures 1 & 2, the data are more strongly positively (right) skewed when there is CHD than when there isn’t.  The spread also increases by about 3.8 points of standard deviation when there is CHD (14.790 versus 10.969).  This shows that blood pressure in the sample can vary greatly when CHD is present, whereas blood pressure is somewhat more stable in the participants without CHD.  The range of average diastolic blood pressure is likewise larger for the CHD group, which is consistent with the greater standard deviation and can be seen in Figure 3.  In the group with no CHD, the interquartile range (which represents the middle 50% of the participants) is smaller than in the group with CHD.  Participant 120 was excluded from the interquartile range due to its extreme value.

Table 3: Percentiles on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Average Diast Blood Pressure 58, by Incidence of Coronary Heart Disease

                                          Percentiles
                            5      10     25     50     75     90      95
Weighted Average    none    71.00  75.00  80.00  87.00  95.00  102.00  105.00
(Definition 1)      chd     70.05  75.00  80.00  87.00  98.00  109.90  117.95
Tukey’s Hinges      none                  80.00  87.00  94.50
                    chd                   80.00  87.00  98.00

Table 3 maps out the percentiles of average diastolic blood pressure by incidence of CHD.  95% of cases fall below a diastolic blood pressure of 105 when there is no history of CHD, and below 117.95 when there is a history of CHD.  These percentiles show that in the no-CHD group the diastolic blood pressure values are centered more tightly around the median value of 87, which is supported by the Tables and Figures above.
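The “Weighted Average (Definition 1)” rows of Table 3 correspond to the HAVERAGE method requested in the syntax at the end of this report: the percentile is interpolated at position (n + 1)p. A sketch, with illustrative data rather than the study values:

```python
def haverage_percentile(values, p):
    """SPSS-style weighted-average percentile (HAVERAGE):
    linear interpolation at position (n + 1) * p / 100."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p / 100
    k = int(pos)          # integer part of the position
    frac = pos - k        # fractional part used for interpolation
    if k <= 0:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

# With four points, the 50th percentile lands halfway between the middle two.
print(haverage_percentile([1, 2, 3, 4], 50))
```

Tukey’s hinges, the second set of rows in Table 3, use medians of the lower and upper halves instead of this interpolation, which is why the two methods differ slightly at the 25th and 75th percentiles.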

Table 4: Extreme Values on the Incidents of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Average Diast Blood Pressure 58                Case Number   Value
Incidence of CHD = none
  Highest   1                                  163           125
            2                                  232           119
            3                                  144           115
            4                                  126           110
            5                                  131           109
  Lowest    1                                  157           65
            2                                  156           65
            3                                  175           68
            4                                  153           68
            5                                  237           69
Incidence of CHD = chd
  Highest   1                                  120           160
            2                                  56            133
            3                                  42            125
            4                                  26            121
            5                                  111           120
  Lowest    1                                  73            65
            2                                  34            68
            3                                  101           70
            4                                  33            70
            5                                  7             70 a
a. Only a partial list of cases with the value 70 is shown in the table of lower extremes.

Examining the extreme values in Table 4, the top 5 and lowest 5 cases are considered.  In the group with no CHD, the lowest diastolic blood pressure value is 65, the same as in the group with CHD.  However, the highest value with CHD (160) is 35 points greater than the highest value without CHD (125).
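The highest/lowest-5 listing in Table 4 is simple to reproduce. This sketch returns values only, whereas SPSS also reports the case numbers; the demo values are the ten chd extremes from Table 4.

```python
def extremes(values, k=5):
    """Return the k lowest and k highest values, SPSS Extreme Values style
    (values only; SPSS additionally lists the case numbers)."""
    xs = sorted(values)
    return {"lowest": xs[:k], "highest": xs[-k:][::-1]}

# The ten extreme values reported for the chd group in Table 4.
chd_extremes = [160, 133, 125, 121, 120, 65, 68, 70, 70, 70]
print(extremes(chd_extremes))
```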

 Frequency    Stem &  Leaf
      .00        6 .
     5.00        6 .  55889
     4.00        7 .  1144
    18.00        7 .  555677777777888899
    21.00        8 .  000000000001122223344
    21.00        8 .  555556666777777888999
    20.00        9 .  00000111111222233334
    14.00        9 .  55666777888899
     8.00       10 .  00012233
     4.00       10 .  5559
     1.00       11 .  0
     1.00       11 .  5
     2.00 Extremes    (>=119)
 Stem width:   10
 Each leaf:        1 case(s)

Figure 4: Stem and leaf plot on the Incidents of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

 Frequency    Stem &  Leaf
      .00        6 .
     2.00        6 .  58
     9.00        7 .  000012233
    14.00        7 .  55555677788899
    23.00        8 .  00000000000111233333344
    24.00        8 .  555556667777777788999999
    11.00        9 .  00001122223
    13.00        9 .  6677788888999
     5.00       10 .  02333
     7.00       10 .  5557789
     4.00       11 .  0003
     3.00       11 .  578
     2.00       12 .  01
     1.00       12 .  5
     2.00 Extremes    (>=133)
 Stem width:   10
 Each leaf:        1 case(s)

Figure 5: Stem and leaf plot on the Incidents of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figures 4 and 5 show more detail than the histograms by stating the actual frequency to the left of the stem values, as well as flagging what SPSS considers extreme values.  In the CHD group, a diastolic blood pressure of 133 or more is treated as an extreme value; in the no-CHD group, the threshold is 119 or more.
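A basic stem-and-leaf display like Figures 4 and 5 can be sketched in a few lines. Note that SPSS additionally splits each stem into low (0-4) and high (5-9) leaf rows and sets extremes aside, which this minimal sketch does not.

```python
def stem_and_leaf(values, stem_width=10):
    """Minimal stem-and-leaf display: one row per stem,
    leaves sorted, no SPSS-style stem splitting or extremes handling."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(int(v), stem_width)
        rows.setdefault(stem, []).append(str(leaf))
    return ["%3d | %s" % (stem, "".join(leaves))
            for stem, leaves in sorted(rows.items())]

# Illustrative values only.
for row in stem_and_leaf([65, 68, 71, 75]):
    print(row)
```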

Conclusions

There is a difference between the distributions of average diastolic blood pressure for participants with a history of Coronary Heart Disease (CHD) and those without.  This is reflected in the range, skewness, and overall shape of the two groups’ distributions.  Both groups share the same median (87.00) and the same lowest value, but differ noticeably in mean, standard deviation, and highest values of diastolic blood pressure.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.
EXAMINE VARIABLES=dbp58 BY chd
  /PLOT BOXPLOT STEMLEAF HISTOGRAM
  /COMPARE GROUPS
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

References: