Quant: In-depth Analysis in SPSS

Abstract

This short analysis examines how marital happiness relates to a couple’s combined income.  Marital happiness levels were found to depend on a couple’s combined income, although the happiest couples were happy regardless of how much money they had.  The quantitative analysis of the sample data shows that when happiness levels are low, lower levels of combined income are more likely.

Introduction

Mulligan (1973) was one of the first to state that arguments about money were one of the top reasons for divorce between couples.  Financial arguments could stem from: goals and savings; record keeping; delaying tactics; apparel cost-cutting strategies; controlling expenditures; financial statements; do-it-yourself techniques; and cost-cutting techniques (Lawrence, Thomasson, Wozniak, & Prawitz, 1993). Lawrence et al. (1993) assert that financial arguments are common within families.  However, when does money stop being an issue?  Does an increase in combined family income affect marital happiness levels?  This analysis attempts to answer these questions.

Methods

Crosstabulation was conducted to get a descriptive exploration of the data.  Box plots helped show the spread and distribution of combined income per marital happiness level.  In this analysis of the data, two alternative hypotheses will be tested:

  • There is a difference between the mean values of combined income per marital happiness level.
  • There is a dependence between the combined income and marital happiness level.

To test these hypotheses, a one-way analysis of variance and a two-way chi-square test were conducted, respectively.
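As a rough illustration of what the chi-square test of dependence computes, the sketch below derives the statistic from a two-way table of counts. The counts are hypothetical and are not the GSS data; the function name is also an assumption made for this example. SPSS additionally reports a p-value, which this sketch does not compute.

```python
# Illustrative only: the counts below are hypothetical, not the GSS data.

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table: rows = happiness level, columns = income band.
hypothetical_counts = [[10, 20], [30, 40]]
chi2, dof = chi_square_independence(hypothetical_counts)
print(f"chi-square = {chi2:.3f}, df = {dof}")
```

A larger statistic for the same degrees of freedom means the observed counts sit further from what independence would predict.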

Results

Table 1: Case processing summary for analyzing happiness level versus family income.

Table 2: Crosstabulation for analyzing happiness level versus family income (<$21,250).

Table 3: Crosstabulation for analyzing happiness level versus family income (>$21,250).

Table 4: Chi-square test for analyzing happiness level versus family income.

Table 5: Analysis of Variance for analyzing happiness level versus family income.

Figure 1: Boxplot diagram per happiness level of a marriage versus the family incomes.

Figure 2: Line diagram per happiness level of a marriage versus the mean of the family incomes.

Discussions and Conclusions

Of the 1,419 participants, only 38.5% (547) responded to both the marital happiness and the family income questions (Table 1).  One likely contributor to this high nonresponse rate is that some participants were not married, which made the marital happiness question not applicable to them.  Future versions of the survey instrument should therefore include an N/A classification, to see whether the response rate improves.  Given that there are still 547 responses, there is other information to be gained from analyzing this data.

As a family unit gains more income, their happiness level increases (Tables 2-3).  As the dollar value increases, the percentage within “Family income; ranges recoded to midpoints” for the very happy category increases as well, from about the 50% level to the 75% level.  The unhappiest couples seem to be earning a combined amount of $7,500-9,000 or $27,500-45,000.  For marriages that are pretty happy, the share is fairly stable at 30-40% of respondents at $13,750 or more.

The mean values of family income by happiness level (Figure 2) show that, on average, happier couples make more money together.  A closer examination using box plots (Figure 1), however, shows that the happiest couples seem to be happy regardless of how much money they make, as the tails of the box plot extend far from the median.  One interesting feature is that the spread of combined family income shrinks as happiness decreases (Figure 1).  This could suggest that although money is not a major factor for couples that are happy, unhappiness in a couple could be driven by lower combined incomes.

The chi-square test shows a statistically significant association between combined family income and marital happiness, allowing us to reject null hypothesis #2, which stated that these two variables are independent of each other (Table 4).  The analysis of variance, in contrast, does not allow a rejection of null hypothesis #1, which states that the mean combined incomes are equal across the marital happiness levels (Table 5).

There could be many reasons for this result, so future work could include analyzing other variables that might help identify other factors in marital happiness.  A multivariate analysis may be necessary, with marital happiness as the dependent variable and combined income as one of many independent variables.

SPSS Code

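* Load the GSS data file.
GET
  FILE='C:\Users\mkher\Desktop\SAV files\gss.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.

* Crosstabulation and chi-square test of marital happiness by family income.
CROSSTABS
  /TABLES=hapmar BY incomdol
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ CORR
  /CELLS=COUNT ROW COLUMN
  /COUNT ROUND CELL.

* One-way analysis of variance across the marital happiness levels.
ONEWAY rincome BY hapmar
  /MISSING ANALYSIS.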

* Chart Builder: box plot of family income per marital happiness level.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar incomdol MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: incomdol=col(source(s), name("incomdol"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: schema(position(bin.quantile.letter(hapmar*incomdol)), label(id))
END GPL.

* Chart Builder: line chart of mean family income per marital happiness level.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar MEAN(incomdol)[name="MEAN_incomdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: MEAN_incomdol=col(source(s), name("MEAN_incomdol"))
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Mean Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(hapmar*MEAN_incomdol), missing.wings())
END GPL.

References

Quant: Linear Regression in SPSS

Introduction

The aim of this analysis is to examine the relationship between a father’s education level (dependent variable) and the mother’s education level (independent variable). The variable names are “paeduc” and “maeduc.” The goal is to determine the linear regression equation for predicting the father’s education level from the mother’s education level.

From the SPSS outputs the following questions will be addressed:

  • How much of the total variance have you accounted for with the equation?
  • Based upon your equation, what level of education would you predict for the father when the mother has 16 years of education?

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.).  The goal is to look at the relationship between the following variables: paeduc (HIGHEST YEAR SCHOOL COMPLETED, FATHER) and maeduc (HIGHEST YEAR SCHOOL COMPLETED, MOTHER). To conduct a linear regression analysis, navigate through Analyze > Regression > Linear Regression.  The variable paeduc was placed in the “Dependent” box, and maeduc was placed in the “Independent(s)” box.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.).  The resulting output is shown in the four tables below.

The relationship between paeduc and maeduc is plotted in a scatterplot by using the chart builder.  The chart builder code is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: Variables Entered/Removed

Model | Variables Entered | Variables Removed | Method
1 | HIGHEST YEAR SCHOOL COMPLETED, MOTHER [b] | . | Enter
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. All requested variables entered.

Table 1 reports that, for the linear regression analysis, the dependent variable is the highest year of school completed by the father and the independent variable is the highest year of school completed by the mother.  No variables were removed.

Table 2: Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .639 [a] | .408 | .407 | 3.162
a. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER
b. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

For a linear regression predicting the father’s highest year of school completed from his wife’s highest year of school completed, the correlation is positive with a value of 0.639, so only 0.408 of the variance is explained (Table 2) and 0.592 of the variance is unexplained.  The linear regression formula or line of best fit (Table 4) is: y = 0.76 x + (2.572 years) + e.  The line of best fit expresses in equation form the mathematical relationship between two variables, in this case the father’s and the mother’s highest education level.  Thus, if the mother has completed her bachelor’s degree (16th year), this equation yields y = 2.572 years + 0.76 (16 years) + e = 14.732 years + e.  The e is the error in this prediction formula, and it exists because the correlation r is not exactly -1.0 or +1.0.  The ANOVA table (Table 3) shows that the relationship between these two variables is statistically significant at the 0.05 level.
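The prediction above can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration that hard-codes the constant, slope, and R Square reported in Tables 2 and 4; the function and constant names are assumptions made for this example, and the error term e is left out because it is unknown for any single prediction.

```python
# Coefficients copied from the SPSS output (Tables 2 and 4).
INTERCEPT = 2.572  # (Constant), in years of education
SLOPE = 0.760      # unstandardized coefficient for maeduc
R_SQUARE = 0.408   # share of variance explained


def predict_paeduc(maeduc_years):
    """Predicted father's years of education for a given mother's years of education."""
    return INTERCEPT + SLOPE * maeduc_years


predicted = predict_paeduc(16)
unexplained_share = 1 - R_SQUARE

print(f"Predicted paeduc when maeduc = 16: {predicted:.3f} years")
print(f"Share of variance unexplained: {unexplained_share:.3f}")
```

The unexplained share is simply one minus R Square, so it follows directly from the model summary.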

Table 3: ANOVA Table

Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 6231.521 | 1 | 6231.521 | 623.457 | .000 [b]
Residual | 9045.579 | 905 | 9.995 | |
Total | 15277.100 | 906 | | |
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER
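The entries of the ANOVA table are tied to the model summary by simple ratios. As a hedged sketch (the variable names are assumptions, and the inputs are the rounded values printed in Table 3), the following recomputes R Square, the F statistic, and the standard error of the estimate from the sums of squares and degrees of freedom:

```python
import math

# Rounded values copied from Table 3.
SS_REGRESSION = 6231.521
SS_RESIDUAL = 9045.579
DF_REGRESSION = 1
DF_RESIDUAL = 905

ss_total = SS_REGRESSION + SS_RESIDUAL

# R Square is the regression share of the total sum of squares.
r_square = SS_REGRESSION / ss_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = SS_REGRESSION / DF_REGRESSION
ms_residual = SS_RESIDUAL / DF_RESIDUAL

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# The standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"R Square = {r_square:.3f}")
print(f"F = {f_statistic:.2f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

Because the inputs are rounded, the results should agree with Tables 2 and 3 only up to rounding.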

Table 4: Coefficients

Model | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig.
1 (Constant) | 2.572 | .367 | | 7.009 | .000
HIGHEST YEAR SCHOOL COMPLETED, MOTHER | .760 | .030 | .639 | 24.969 | .000
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

The image below (Figure 1) is a scatter plot of the highest year of school completed by the mother versus the father, along with the linear regression line (Table 4) and box plots of each respective distribution.  There are more outliers in the husband’s education level than in the wife’s education level, and the husband’s education level is more concentrated about its median.

Figure 1: Highest year of school completed by the mother vs the father scatter plot with regression line and box plot images of each respective distribution.

Conclusion

There is a statistically significant relationship between the husband’s and the wife’s highest year of education completed.  The line of best fit shows a moderately positive correlation and is defined as y = 0.76 x + (2.572 years) + e, which can explain only 40.8% of the variance, while 59.2% of the variance is unexplained.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

* Linear regression of father's education (paeduc) on mother's education (maeduc).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT paeduc
  /METHOD=ENTER maeduc
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).

* Scatter plot with the fitted regression line and marginal box plots.
STATS REGRESS PLOT YVARS=paeduc XVARS=maeduc
  /OPTIONS CATEGORICAL=BARS GROUP=1 BOXPLOTS INDENT=15 YSCALE=75
  /FITLINES LINEAR APPLYTO=TOTAL.

References:

Quant: ANOVA and Multiple Comparisons in SPSS

Introduction

The aim of this analysis is to look at the relationship between the dependent variable, the income level of respondents (rincdol), and the independent variable, their reported level of happiness (happy).  This independent variable has three or more levels.

From the SPSS outputs, the goals are to:

  • Use the ANOVA procedure to determine the overall conclusion.
  • Use the Bonferroni correction as a post-hoc analysis to determine the relationship of specific levels of happiness to income.

Hypotheses

  • Null: There is no difference in mean rincdol across the levels of happy.
  • Alternative: There are real differences in mean rincdol across the levels of happy.
  • Null2: There is no difference in mean rincdol between specific pairs of happy levels.
  • Alternative2: There are real differences in mean rincdol between specific pairs of happy levels.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.).  The goal is to look at the relationship between the following variables: rincdol (Respondent’s income; ranges recoded to midpoints) and happy (General Happiness). To conduct a parametric analysis, navigate to Analyze > Compare Means > One-Way ANOVA.  The variable rincdol was placed in the “Dependent List” box, and happy was placed in the “Factor” box.  Select “Post Hoc” and, under “Equal Variances Assumed”, select “Bonferroni”.  The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the two tables below.

The relationship between rincdol and happy is plotted by using the chart builder.  The chart builder code is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: ANOVA

Respondent’s income; ranges recoded to midpoints
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 11009722680.000 | 2 | 5504861341.000 | 9.889 | .000
Within Groups | 499905585000.000 | 898 | 556687733.900 | |
Total | 510915307700.000 | 900 | | |

Table 1 shows that the overall ANOVA is statistically significant, so the first null hypothesis is rejected at the 0.05 level. Thus, there is a statistically significant difference in mean rincdol across the levels of the happy variable.  However, the overall test does not show which levels of happiness differ from each other; that requires the post-hoc comparisons (Table 2).
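The F statistic in Table 1 follows from the sums of squares and degrees of freedom. The sketch below is a minimal illustration under the assumption that the printed table values are used as inputs; the function name is hypothetical, and the p-value that SPSS reports is not computed here.

```python
# Values copied from Table 1 (one-way ANOVA of rincdol by happy).
SS_BETWEEN = 11009722680.0
DF_BETWEEN = 2
SS_WITHIN = 499905585000.0
DF_WITHIN = 898


def one_way_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


ms_between, ms_within, f_statistic = one_way_f(
    SS_BETWEEN, DF_BETWEEN, SS_WITHIN, DF_WITHIN
)
print(f"Mean Square (between) = {ms_between:.1f}")
print(f"Mean Square (within)  = {ms_within:.1f}")
print(f"F = {f_statistic:.3f}")
```

A large F means the variation between the happiness groups is large relative to the variation within them.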

Table 2: Multiple Comparisons

Dependent Variable: Respondent’s income; ranges recoded to midpoints
Bonferroni
(I) GENERAL HAPPINESS | (J) GENERAL HAPPINESS | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
VERY HAPPY | PRETTY HAPPY | 4093.678 | 1744.832 | .058 | -91.26 | 8278.61
VERY HAPPY | NOT TOO HAPPY | 12808.643* | 2912.527 | .000 | 5823.02 | 19794.26
PRETTY HAPPY | VERY HAPPY | -4093.678 | 1744.832 | .058 | -8278.61 | 91.26
PRETTY HAPPY | NOT TOO HAPPY | 8714.965* | 2740.045 | .005 | 2143.04 | 15286.89
NOT TOO HAPPY | VERY HAPPY | -12808.643* | 2912.527 | .000 | -19794.26 | -5823.02
NOT TOO HAPPY | PRETTY HAPPY | -8714.965* | 2740.045 | .005 | -15286.89 | -2143.04
*. The mean difference is significant at the 0.05 level.

According to Table 2, the pairing of “Very Happy” and “Pretty Happy” does not reject Null2 at the 0.05 level. The other two pairings, “Very Happy” with “Not Too Happy” and “Pretty Happy” with “Not Too Happy”, do reject the Null2 hypothesis at the 0.05 level.  Thus, two of the three pairs differ significantly in mean income.
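The reading of Table 2 can be checked mechanically. The sketch below is a hedged illustration that copies the three unique pairings from the table and flags a pair as significant when its Bonferroni-adjusted Sig. value is below 0.05; it also checks whether the 95% confidence interval excludes zero, which should agree with the Sig. column. The data structure and function name are assumptions made for this example.

```python
ALPHA = 0.05

# (group I, group J, mean difference, Sig., CI lower, CI upper) from Table 2.
comparisons = [
    ("VERY HAPPY", "PRETTY HAPPY", 4093.678, 0.058, -91.26, 8278.61),
    ("VERY HAPPY", "NOT TOO HAPPY", 12808.643, 0.000, 5823.02, 19794.26),
    ("PRETTY HAPPY", "NOT TOO HAPPY", 8714.965, 0.005, 2143.04, 15286.89),
]


def summarize(rows, alpha=ALPHA):
    """Map each pair to (significant by Sig., confidence interval excludes zero)."""
    results = {}
    for group_i, group_j, _diff, sig, lower, upper in rows:
        significant = sig < alpha
        ci_excludes_zero = lower > 0 or upper < 0
        results[(group_i, group_j)] = (significant, ci_excludes_zero)
    return results


results = summarize(comparisons)
for (group_i, group_j), (significant, ci_excludes_zero) in results.items():
    verdict = "reject Null2" if significant else "fail to reject Null2"
    print(f"{group_i} vs {group_j}: {verdict} (CI excludes 0: {ci_excludes_zero})")
```

The two criteria agree for every pair here, which is expected because both are derived from the same adjusted test.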

Figure 1: Graphed means of General Happiness versus incomes.

General happiness and income are positively related (Figure 1).  That is, people with a low level of general happiness usually have lower recorded mean incomes, and vice versa.  No direction or causality can be inferred from this analysis.  It cannot be concluded that high income causes general happiness, nor that happy people make more money because of their positive attitude towards life.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

* One-way ANOVA of respondent income by general happiness with Bonferroni post-hoc tests.
ONEWAY rincdol BY happy
  /MISSING ANALYSIS
  /POSTHOC=BONFERRONI ALPHA(0.05).

* Chart Builder: line chart of mean respondent income per general happiness level.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=happy MEAN(rincdol)[name="MEAN_rincdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: happy=col(source(s), name("happy"), unit.category())
  DATA: MEAN_rincdol=col(source(s), name("MEAN_rincdol"))
  GUIDE: axis(dim(1), label("GENERAL HAPPINESS"))
  GUIDE: axis(dim(2), label("Mean Respondent's income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(happy*MEAN_rincdol), missing.wings())
END GPL.

References:

Quant: Parametric and Non-Parametric Stats

There are numerous times when the information collected from a real organization will not conform to the requirements of a parametric analysis. That is, a practitioner would not be able to analyze the data with a t-test or F-test (ANOVA). Presume that a young professional came to you and said he or she had read about tests—such as the Chi-Square, the Mann-Whitney U test, the Wilcoxon Signed-Rank test, and Kruskal-Wallis one-way analysis of variance—and wanted to know when you would use each and why each would be used instead of the t-tests and ANOVA.

Parametric statistics are inferential and based on random sampling from a well-defined population, with the sample data used to make strict inferences about the population’s parameters. Thus tests like t-tests, chi-square, and F-tests (ANOVA) can be used (Huck, 2011; Schumacker, 2014).  Nonparametric statistics, or “assumption-free tests”, are used for tests on ranked data, such as the Mann-Whitney U-test, Wilcoxon signed-rank test, Kruskal-Wallis H-test, and chi-square (Field, 2013; Huck, 2011).

First, there is a need to define the types of data.  Continuous data is interval/ratio data, and categorical data is nominal/ordinal data.  The table below is modified from Schumacker (2014) with data added from Huck (2011):

Statistic | Dependent Variable | Independent Variable
Analysis of Variance (ANOVA): One way | Continuous | Categorical
t-Test: Single sample | Continuous |
t-Test: Independent groups | Continuous | Categorical
t-Test: Dependent (paired groups) | Continuous | Categorical
Chi-square | Categorical | Categorical
Mann-Whitney U-test | Ordinal | Ordinal
Wilcoxon | Ordinal | Ordinal
Kruskal-Wallis H-test | Ordinal | Ordinal

ANOVAs (or F-tests) are used to analyze the differences among a group of three or more means by studying the variation between the groups, and they test the null hypothesis that the group means are equal (Huck, 2011). Student’s t-tests, or t-tests, test the null hypothesis that the mean of a population equals some specified number, and they are used when the sample size is relatively small compared to the population size (Field, 2013; Huck, 2011; Schumacker, 2014).  The test assumes a normal distribution (Huck, 2011). With large sample sizes, t-tests/values are the same as z-tests/values; the same can happen with chi-square, since the t and chi-square distributions both depend on the sample size (Schumacker, 2014).  In other words, at large sample sizes the t-distribution and the chi-square distribution begin to look like a normal curve.  Chi-square is related to the variance of a sample, and chi-square tests are used for testing the null hypothesis that the sample mean is part of a normal distribution (Schumacker, 2014).  Chi-square tests are so versatile that they can be used as both parametric and non-parametric tests (Field, 2013; Huck, 2011; Schumacker, 2014).

The Mann-Whitney U-test and the Wilcoxon rank-sum test are equivalent to each other: both are the non-parametric counterpart of the independent-groups t-test, and the two samples do not even have to be the same size (Field, 2013).

The nonparametric Mann-Whitney U-test can be substituted for a t-test when a normal distribution cannot be assumed, and it was designed for two independent samples that do not have repeated measures (Field, 2013; Huck, 2011). This makes it a good substitute for the independent-groups t-test (Field, 2013). A benefit of choosing the Mann-Whitney U-test is that it probably will not produce a Type II error, that is, a false negative (Huck, 2011). The null hypothesis is that the two independent samples come from the same population (Field, 2013; Huck, 2011).
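To make the ranking idea concrete, here is a minimal sketch of how the U statistic can be computed by hand. The sample values are hypothetical, the helper names are assumptions for this example, and the sketch stops at the statistic itself; turning U into a p-value requires tables or a statistics package.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U statistics for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical samples of unequal size.
group_a = [1, 2, 3]
group_b = [4, 5, 6, 7]
print(mann_whitney_u(group_a, group_b))
```

Only the ordering of the observations matters, which is why the test does not need the normality assumption of the t-test.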

The nonparametric Wilcoxon signed-rank test is best for distributions that are skewed, where variance homogeneity cannot be assumed, and where a normal distribution cannot be assumed (Field, 2013; Huck, 2011).  The Wilcoxon signed-rank test can help compare two related/correlated samples from the same population (Huck, 2011). Each pair of data is chosen randomly and independently, with no repetition between the pairs (Huck, 2011).  This is a good substitute for the dependent t-test (Field, 2013; Huck, 2011).  The null hypothesis is that the central tendency of the paired differences is 0 (Huck, 2011).
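A small worked sketch may help show how the paired differences are ranked. The before/after values below are hypothetical, the function names are assumptions for this example, and the sketch uses one common convention (dropping zero differences and reporting the smaller of the two signed rank sums); software packages may report a different but related statistic.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(first, second):
    """Return the smaller of the positive and negative rank sums for paired data."""
    # Pairs with no change carry no sign and are dropped (a common convention).
    diffs = [b - a for a, b in zip(first, second) if b != a]
    abs_ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(abs_ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(abs_ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical paired measurements on the same five subjects.
before = [10, 12, 9, 11, 14]
after = [12, 15, 8, 15, 14]
print(wilcoxon_signed_rank(before, after))
```

A value near zero means almost all of the rank weight lies on one side, which is evidence against a central tendency of 0 for the differences.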

The nonparametric Kruskal-Wallis H-test can be used to compare two or more independent samples from the same distribution; it is considered to be like a one-way analysis of variance (ANOVA) and focuses on central tendencies (Huck, 2011).  It is usually considered an extension of the Mann-Whitney U-test (Huck, 2011). The null hypothesis is that the medians in all groups are equal (Huck, 2011).
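As with the tests above, the H statistic is built from ranks of the pooled data. The sketch below is a minimal illustration using hypothetical groups; the function names are assumptions for this example, and no tie correction is applied, so it matches the textbook formula H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1) only when ties are absent or ignored.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        # Ranks are stored in pooled order, so each group owns a contiguous slice.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)

    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical, fully separated groups.
h_statistic = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_statistic, 3))
```

With k groups, H is usually compared against a chi-square distribution with k - 1 degrees of freedom.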

References

  • Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). UK: Sage Publications Ltd. VitalBook file.
  • Huck, S. W. (2011). Reading statistics and research (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Schumacker, R. E. (2014). Learning statistics using R. California: SAGE Publications, Inc. VitalBook file.