**Introduction**

The aim of this analysis is to examine how well a father’s education level (the dependent variable, *paeduc*) can be predicted from the mother’s education level (the independent variable, *maeduc*). The goal is to determine the linear regression equation for predicting the father’s education level from the mother’s.

From the SPSS outputs the following questions will be addressed:

- How much of the total variance have you accounted for with the equation?

- Based upon your equation, what level of education would you predict for the father when the mother has 16 years of education?

**Methodology**

For this project, the gss.sav file was loaded into SPSS (GSS, n.d.). The goal is to examine the relationship between two variables: *paeduc* (HIGHEST YEAR SCHOOL COMPLETED, FATHER) and *maeduc* (HIGHEST YEAR SCHOOL COMPLETED, MOTHER). To conduct the linear regression analysis, navigate through Analyze > Regression > Linear Regression. The variable *paeduc* was placed in the “Dependent” box, and *maeduc* was placed in the “Independent(s)” box. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the four tables of the Results section.
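As a rough illustration of what the regression procedure estimates, the sketch below fits a least-squares line in plain Python. It is not the SPSS implementation: the function name `fit_simple_regression` and the toy values are made up for illustration and are not taken from gss.sav. It also assumes every pair is complete, which is what listwise deletion of missing values leaves behind.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope, r_squared) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # spread of x
    syy = sum((yi - y_bar) ** 2 for yi in y)                         # spread of y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # co-variation
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)   # share of variance in y explained by x
    return intercept, slope, r_squared

# Hypothetical toy values, NOT drawn from gss.sav.
toy_maeduc = [8, 10, 12, 12, 14, 16]
toy_paeduc = [9, 10, 11, 13, 14, 17]

intercept, slope, r_squared = fit_simple_regression(toy_maeduc, toy_paeduc)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r_squared:.3f}")
```

The intercept and slope correspond to the two B values that SPSS reports in its coefficients table, and `r_squared` corresponds to R Square in the model summary.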

The relationship between *paeduc* and *maeduc* is plotted in a scatterplot built with the chart builder. The syntax that generates the chart is shown in the SPSS Code section, and the resulting image is shown in the Results section.

**Results**

Table 1: Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
| --- | --- | --- | --- |
| 1 | HIGHEST YEAR SCHOOL COMPLETED, MOTHER^{b} | . | Enter |

a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

b. All requested variables entered.

Table 1 reports that, for the linear regression analysis, the dependent variable is the highest year of school completed by the father and the independent variable is the highest year of school completed by the mother. No variables were removed.

Table 2: Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .639^{a} | .408 | .407 | 3.162 |

a. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER

b. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

For a linear regression predicting the father’s highest year of school completed from the mother’s highest year of school completed, the correlation is positive with a value of 0.639. The model therefore explains only 0.408 of the variance (Table 2), and the remaining 0.592 is unexplained. The linear regression formula, or line of best fit (Table 4), is: y = 0.76x + 2.572 + e, where y is the father’s education in years and x is the mother’s education in years. The line of best fit expresses in equation form the mathematical relationship between two variables, in this case the father’s and mother’s highest education level. Thus, if the mother has completed her bachelor’s degree (16th year), this equation yields y = 2.572 + 0.76(16) + e = 14.732 years + e. The e is the error in this prediction formula, and it exists because the correlation r is not exactly -1.0 or +1.0 (equivalently, R Square is less than 1.0). The ANOVA table (Table 3) shows that the relationship between these two variables is statistically significant at the 0.05 level (F(1, 905) = 623.457, Sig. = .000, i.e., p < .001).
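The prediction for a mother with 16 years of education can be checked with a few lines of arithmetic. The sketch below uses only the coefficients reported in Table 4 and the R Square from Table 2; the function name `predict_paeduc` is an illustrative choice, not something produced by SPSS.

```python
INTERCEPT = 2.572   # Table 4, (Constant), unstandardized B
SLOPE = 0.760       # Table 4, maeduc, unstandardized B
R_SQUARE = 0.408    # Table 2, R Square

def predict_paeduc(maeduc_years):
    """Point prediction of the father's education (the error term e is ignored)."""
    return INTERCEPT + SLOPE * maeduc_years

predicted = predict_paeduc(16)
unexplained = 1 - R_SQUARE   # share of variance the line does not account for

print(f"Predicted paeduc at maeduc = 16: {predicted:.3f} years")   # 14.732
print(f"Unexplained share of variance: {unexplained:.3f}")         # 0.592
```

The predicted value is a point estimate only; individual fathers will differ from it by the error term e.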

Table 3: ANOVA Table

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Regression | 6231.521 | 1 | 6231.521 | 623.457 | .000^{b} |
| | Residual | 9045.579 | 905 | 9.995 | | |
| | Total | 15277.100 | 906 | | | |

a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

b. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER
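The entries in Table 3 are linked by simple arithmetic, so the R Square, F, and standard error of the estimate reported elsewhere can be recomputed from the sums of squares as a consistency check. The sketch below does this in Python using only the values printed in the table; the variable names are illustrative.

```python
# Values copied from Table 3
ss_regression = 6231.521
ss_residual = 9045.579
df_regression = 1
df_residual = 905

ss_total = ss_regression + ss_residual          # should match the Total row
r_square = ss_regression / ss_total             # share of variance explained
ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_statistic = ms_regression / ms_residual       # F ratio
std_error_estimate = ms_residual ** 0.5         # Std. Error of the Estimate

print(f"SS total           = {ss_total:.3f}")
print(f"R Square           = {r_square:.3f}")
print(f"MS residual        = {ms_residual:.3f}")
print(f"F                  = {f_statistic:.3f}")
print(f"Std. error of est. = {std_error_estimate:.3f}")
```

The recomputed values agree with Table 2 and Table 3 to the three decimals SPSS displays.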

Table 4: Coefficients

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Constant) | 2.572 | .367 | | 7.009 | .000 |
| | HIGHEST YEAR SCHOOL COMPLETED, MOTHER | .760 | .030 | .639 | 24.969 | .000 |

a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
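Each t value in Table 4 is the unstandardized coefficient divided by its standard error. With a single predictor, the squared t for the slope should also equal the F statistic in Table 3. The sketch below checks both relationships from the printed values. Because the table rounds standard errors to three decimals, the recomputed slope t differs slightly from the reported 24.969; the helper name `t_statistic` is illustrative.

```python
def t_statistic(b, std_error):
    """t value for a coefficient: estimate divided by its standard error."""
    return b / std_error

# Values copied from Table 4
t_constant = t_statistic(2.572, 0.367)        # reported as 7.009
t_slope_rounded = t_statistic(0.760, 0.030)   # reported as 24.969; SE is rounded

# Working backwards from the reported t gives a less rounded standard error.
implied_se_slope = 0.760 / 24.969

# With one predictor, t squared for the slope should match F from Table 3.
t_squared = 24.969 ** 2

print(f"t (constant)            = {t_constant:.3f}")
print(f"t (slope, rounded SE)   = {t_slope_rounded:.3f}")
print(f"implied SE of the slope = {implied_se_slope:.4f}")
print(f"t^2 for the slope       = {t_squared:.3f}  (Table 3 F = 623.457)")
```

The small gaps between recomputed and reported values come from rounding in the printed table, not from an inconsistency in the model.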

Figure 1 below is a scatter plot of the highest year of school completed by the mother versus the father, along with the linear regression line (Table 4) and box plots of each respective distribution. There are more outliers in the father’s education level than in the mother’s, and the father’s education level is more tightly concentrated about its median.

Figure 1: Highest year of school completed by the mother vs the father scatter plot with regression line and box plot images of each respective distribution.

**Conclusion**

There is a statistically significant relationship between the father’s and mother’s highest year of education completed. The line of best fit, y = 0.76x + 2.572 + e, reflects a moderate positive correlation (R = 0.639); it explains only 40.8% of the variance, while 59.2% of the variance is unexplained.

**SPSS Code**

DATASET NAME DataSet1 WINDOW=FRONT.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT paeduc
  /METHOD=ENTER maeduc
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).

STATS REGRESS PLOT YVARS=paeduc XVARS=maeduc
  /OPTIONS CATEGORICAL=BARS GROUP=1 BOXPLOTS INDENT=15 YSCALE=75
  /FITLINES LINEAR APPLYTO=TOTAL.

**References:**

- GSS. (n.d.). SPSS data file [Data set]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
- Miller, R. (n.d.). Week 7: Regression. [Video file]. Retrieved from http://breeze.careeredonline.com/p6ioo7i8x1k/?launcher=false&fcsContent=true&pbMode=normal