Multiple Regression Procedure
Podcasts ~ 9 minutes per podcast:
Part 1
Part 2
Part 3
Part 4
Part 5
Part 6
Part 7


The Role of Theory

Causes are connected with effects; but this is because our theories connect them, not because the world is held together by cosmic glue. The world may be glued together by imponderables, but that is irrelevant for understanding causal explanation. The notions behind "the cause x" and "the effect y" are intelligible only against a pattern of theory, namely one which puts guarantees on inferences from x to y. Such guarantees distinguish truly causal sequences from mere coincidence. (Hanson, 1958, p. 64).

"Consistency of the model with the data, however, does not constitute proof of a theory; at best it only lends support to it." p. 580, Pedhazur, E.J. (1982). Multiple Regression in Behavioral Research.

Two Purposes of Multiple Regression

1. Attempt to predict events or behavior for practical decision-making purposes in applied settings.
2. Attempt to understand or explain the nature of a phenomenon for purposes of testing/developing theory.

(from Mark H. Licht)

Examples --
1. Colleges and universities would like to predict which students will be successful and should be admitted.
2. We try to understand students' academic performance based on their experiences and demographic characteristics.
3. Does exposure to parental interpersonal violence contribute to bullying behavior?

Multiple Regression

Multiple regression procedures are among the most popular statistical procedures used in social science research. The difference between multiple regression and simple regression is that multiple regression has more than one independent variable. The linear regression equation takes the following form:

Y' = A + B1(X1) + B2(X2) + ... + Bn(Xn)

where n is the number of independent variables. On the right side of the equation, we are creating a linear combination of multiple independent (or predictor) variables. If you think about it, we are creating a super variable.

The goal is to estimate a set of b values (regression coefficients) that bring the Y' values predicted from the equation as close as possible to the Y values actually obtained--that is, minimize the sum of squared deviations between predicted and obtained Y, and maximize the correlation between predicted and obtained Y.

Recall for simple regression:
Illustration of slope and intercept

Visualize MR (or world peace)

Fundamental Equation for Multiple Regression

Enter the data on page 129 [data]
IVs: MOTIV (professional motivation), QUAL (qualifications for admissions), GRADE (performance in grad courses)
DV: COMPR (comprehensive exam)

COMPR' = A + B1(MOTIV) + B2(QUAL) + B3(GRADE)

How do we estimate the weights? The least-squares solution minimizes the sum of squared errors, Σ(Y − Y')².

[Figure: Linear least squares example]

Tabachnick & Fidell (p 130) do an excellent & simplistic job of demonstrating how the sums of squares (SS) are decomposed into two parts to calculate the R2 --

SSY = SSreg + SSres

R2 = SSreg/SSY

[Link to Spreadsheet]

For this small dataset, here is the multiple regression equation:

COMPR' = -4.72 + .658(MOTIV) + .272(QUAL) + .416(GRADE)
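If you would like to check this outside of SPSS, here is a minimal sketch in Python (statsmodels) using a few made-up cases with the same variable names--the numbers below are hypothetical, not the page 129 data. It fits the three-predictor model and confirms R2 = SSreg/SSY by hand.

```python
# Minimal sketch: fit COMPR' = A + B1(MOTIV) + B2(QUAL) + B3(GRADE) and verify
# that R2 equals SSreg / SSY. The six cases are made up for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "MOTIV": [14, 11, 8, 13, 10, 9],
    "QUAL":  [19, 11, 10, 17, 8, 12],
    "GRADE": [19, 8, 14, 18, 9, 13],
    "COMPR": [18, 9, 8, 17, 10, 11],
})

model = smf.ols("COMPR ~ MOTIV + QUAL + GRADE", data=df).fit()
print(model.params)                                      # A (Intercept) and the raw B weights

ss_y = np.sum((df["COMPR"] - df["COMPR"].mean()) ** 2)   # SSY
ss_res = np.sum(model.resid ** 2)                        # SSres
ss_reg = ss_y - ss_res                                   # SSreg
print(ss_reg / ss_y, model.rsquared)                     # the two values match
```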

 

Important Terms

On pp. 131-134, there is a simple illustration of how R2 and standardized regression coefficients are calculated.


Determining Importance of IVs (Predictors)

If your Multiple R is statistically significant, then you want to interpret the importance of the individual IVs.

Guidelines (from the work of Mark H. Licht)
1. If the measurement scales of the variables are meaningful (dollars, GPA, heart rate, etc.), use the raw regression coefficients.
2. If you want to compare relative contributions of each predictor, use the standardized regression coefficient.

Differences between the standardized beta (a.k.a. standardized regression coefficient) & the unstandardized B (a.k.a. raw regression coefficient)
[Height in inches & feet, and weight]

Let's interpret the following equation (unstandardized):
COMPR' = -4.72 + .658(MOTIV) + .272(QUAL) + .416(GRADE)

Areas Representing Squared Multiple Correlation (R2), Squared Semipartial (aka, Part) Correlation (sri2), & Squared Partial Correlation (pri2)

[Correlation=linear measure; Square the correlation=area measure]

R2--the proportion of variance in the DV scores that is predictable from variability in all the IVs (predictor variables)

R2=(a+b+c) / (a+b+c+d)

Adjusted R2--adjusts R2 downward for the number of predictors, estimating the shrinkage in predictive power expected in a new sample

Squared Partial Correlation:
For IV1: pr1² = a / (a+d)
For IV2: pr2² = c / (c+d)

Partial correlation removes from both the given IV and the DV all variance accounted for by the other (control) IVs, then correlates the unique component of the IV with the unique component of the DV. [Play with partial correlation]

Squared Semipartial (Part) Correlation:
For IV1: sr1² = a / (a+b+c+d)
For IV2: sr2² = c / (a+b+c+d)

 

Semipartial correlation first removes from a predictor variable (IV) all variance which may be accounted for by the other predictor variables (IVs) in the regression model, then correlates the remaining unique component of that IV with the dependent variable (DV).

A squared semipartial correlation represents the proportion of all the variance in Y that is associated with one predictor but not with any of the other predictors. That is, in terms of the Venn diagram, it is area a (for IV1) or area c (for IV2) taken as a proportion of the total area of Y.

Note: The areas above are based on standard (simultaneous) multiple regression.
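As a hedged illustration (continuing the hypothetical Python sketch above, not the textbook data), the squared semipartial for an IV can be computed as the drop in R2 when that IV is removed from the full model, and the squared partial as that drop divided by the variance the remaining IVs leave unexplained:

```python
# Squared semipartial (sr2) and squared partial (pr2) correlations via R2
# differences; df is the hypothetical data frame from the earlier sketch.
import statsmodels.formula.api as smf

ivs = ["MOTIV", "QUAL", "GRADE"]
full = smf.ols("COMPR ~ " + " + ".join(ivs), data=df).fit()

for iv in ivs:
    others = [v for v in ivs if v != iv]
    reduced = smf.ols("COMPR ~ " + " + ".join(others), data=df).fit()
    sr2 = full.rsquared - reduced.rsquared     # area a / (a+b+c+d)
    pr2 = sr2 / (1 - reduced.rsquared)         # area a / (a+d)
    print(f"{iv}: sr2 = {sr2:.3f}  pr2 = {pr2:.3f}")
```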


 

Types of Multiple Regressions

There are several different kinds of multiple regression--simultaneous (standard), stepwise (statistical), and hierarchical (sequential). The difference between the methods is how the independent variables are entered into the equation.

1. In simultaneous (aka, standard) multiple regression, all the independent variables are considered at the same time.

2. For stepwise multiple regression, the computer determines the order in which the independent variables become part of the equation. You can think of it as selecting a baseball team. You first select the best player. Your next selection may not be the next best player, but a player who helps round out your team. For example, if you select a pitcher first, and for your next selection a pitcher is again the best player available, you might instead select the best player at a position that you need. The first independent variable entering the regression equation is the one with the largest correlation with the dependent variable. The next independent variable to enter the equation is the one with the largest shared variance with the dependent variable after the variance associated with the first independent variable has been removed. There are also forward and backward entry methods.

3. In hierarchical multiple regression, the researcher determines the order in which the independent variables are entered into the equation. The order for entering the variables should be based on theory. The focus is on the change in predictability associated with predictor variables entered later in the analysis, over and above those entered earlier in the model. Hypotheses are stated differently for hierarchical multiple regression. An example might be, "We predict that participants' skills and knowledge will account for a significant amount of variance in empathy over and above that accounted for by age and race." (See the sketch below for how the R2 change is tested.) [Activity]

On page 145, note how the credit for variance changes depending on the method.
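Here is a sketch of how the R2 change in a hierarchical analysis can be tested, again using the hypothetical variables from the earlier Python sketch (the entry order is made up for illustration):

```python
# Hierarchical (sequential) regression: enter QUAL first, then add MOTIV and
# GRADE, and test the R2 change with a nested-model F test.
import statsmodels.formula.api as smf

step1 = smf.ols("COMPR ~ QUAL", data=df).fit()
step2 = smf.ols("COMPR ~ QUAL + MOTIV + GRADE", data=df).fit()

print("R2 change:", step2.rsquared - step1.rsquared)
print(step2.compare_f_test(step1))   # (F, p value, df difference)
```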

Activities

Dataset: [SAT GPA Quality of Letter Data]

1. Regress GPA in college onto SAT.

2. Regress GPA in college onto SAT and GPA in high school.

3. Run a standard regression regressing GPA in college onto all three predictors.

4. Run a stepwise regression.

5. Using the data, discuss a plausible theory for predicting GPA in college from the predictor variables & run a hierarchical regression.


Potential Problem & Assumptions

Collinearity

A problem in estimating the b and β weights can arise when the independent variables are highly correlated. Collinearity means that an IV is largely (in the extreme, completely) predictable from the other IVs. There are several diagnostic procedures for examining collinearity. The variance inflation factor (VIF) indicates the linear dependence of one IV on all the other IVs. Large values of VIF (>10) indicate potential problems with collinearity. The tolerance index is equal to 1/VIF; values close to zero could indicate collinearity problems.
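A minimal sketch of computing VIF and tolerance outside of SPSS, using the hypothetical predictors from the earlier Python sketch:

```python
# VIF_i = 1 / (1 - R2_i), where R2_i comes from regressing IV_i on the other IVs;
# tolerance is the reciprocal. Values of VIF above about 10 are a warning sign.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["MOTIV", "QUAL", "GRADE"]])
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF =", round(vif, 2), "tolerance =", round(1 / vif, 3))
```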

Another test for collinearity problems is the Collinearity Diagnostics table produced in SPSS. The crossproducts of the IVs are factored, resulting in eigenvalues that account for the variance of the SSCP matrix. Look for condition indexes over 15 (these occur on dimensions with small eigenvalues)--this indicates an ill-conditioned matrix.

 If collinearity is a problem, you have several choices:

  1. Proceed with your analysis and caution the reader that the regression coefficients are not well estimated.
  2. Eliminate some of the IVs, especially ones with large VIFs, from the analysis.
  3. Run a factor analysis on the IVs to find combinations of IVs that could be entered into the model.
  4. Use ridge regression.

Activity

1. Calculate the correlation coefficients among all the predictor variables.
2. Calculate the VIF for the predictor variables (go under Statistics and check Collinearity diagnostics).

New Dataset:  [data]
Problem data [data]

Grade = books + attend + late

Another method of looking at collinearity--The table below is another way of assessing if there is too much multicollinearity in the model. To simplify, crossproducts of the independent variables are factored. High eigenvalues indicate dimensions (factors) which account for a lot of the variance in the crossproduct matrix. Eigenvalues close to 0 indicate dimensions which explain little variance. Multiple eigenvalues close to 0 indicate an ill-conditioned crossproduct matrix, meaning there is a problem with multicollinearity. The condition index summarizes the findings, and a common rule of thumb is that a condition index over 15 indicates a possible multicollinearity problem and a condition index over 30 suggests a serious multicollinearity problem.

 

 

Outliers

In SPSS, the Casewise Diagnostics table reports cases whose standardized residuals are more than 3 SDs from zero (i.e., cases the model predicts poorly).
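A rough analogue outside of SPSS (a sketch built on the earlier hypothetical model, not the SPSS procedure itself):

```python
# Flag cases whose standardized residuals are larger than 3 in absolute value.
import numpy as np

z_resid = model.resid / np.sqrt(model.mse_resid)   # standardized residuals
flagged = df[np.abs(z_resid) > 3]
print(flagged if len(flagged) else "No cases beyond 3 SDs")
```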

Normality, Linearity, Homoscedasticity of Residuals (p 126)

Errors are independent and follow a normal distribution--

What are the errors (residuals)?

[Figure: Linear least squares example]

Residual plots (use the Plots option)--
Focus on plotting ZRESID against ZPRED (see p. 126)

The Durbin-Watson statistic tests for serial correlation of error terms for adjacent cases. This test is mainly used in relation to time series data, where adjacent cases are sequential years. (Garson)
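A sketch of the same checks outside of SPSS (matplotlib/scipy/statsmodels, reusing the hypothetical model from the earlier sketch):

```python
# Plot standardized residuals (ZRESID) against standardized predicted values
# (ZPRED), and compute the Durbin-Watson statistic (values near 2 suggest
# little serial correlation among adjacent residuals).
import matplotlib.pyplot as plt
from scipy.stats import zscore
from statsmodels.stats.stattools import durbin_watson

plt.scatter(zscore(model.fittedvalues), zscore(model.resid))
plt.axhline(0, linestyle="--")
plt.xlabel("ZPRED (standardized predicted values)")
plt.ylabel("ZRESID (standardized residuals)")
plt.show()

print("Durbin-Watson:", durbin_watson(model.resid))
```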

What about nonlinear relationships?
[Polynomial Function Graphs]


Thanks to Dr. Karl L. Wuensch  for sharing his work.  

Coding Nominal Level Variables

Dummy Variable Coding.  X1 codes whether or not an observation is from Group 1 (0 = no, 1 = yes), X2 whether or not it is from Group 2, and X3 whether or not it is from Group 3.  Only k‑1 (4‑1) dummy variables are needed, since an observation that is not in any of the first k‑1 groups must be in the kth group.  The dummy variable coding matrix is thus:  

Group   X1   X2   X3
1        1    0    0
2        0    1    0
3        0    0    1
4        0    0    0

For each dummy variable the partial coefficients represent a contrast between its group and the reference group (the one coded with all 0's); that is, X1's partials code Group 1 vs Group 4, X2 codes Group 2 vs Group 4, and X3 codes Group 3 vs Group 4. Compare the X3 partial statistics from this program (t = -2.429, p = .0412) with the statistics from the "Contrast '3 vs 4'" analysis.
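A hedged sketch of dummy coding in Python (statsmodels/patsy), with a hypothetical four-group outcome rather than Dr. Wuensch's data; the Treatment contrast with Group 4 as the reference reproduces the 0/1 design matrix above:

```python
# Dummy coding with Group 4 as the reference group: each coefficient is that
# group's mean minus Group 4's mean. Data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

dat = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "y":     [5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9],
})

fit = smf.ols("y ~ C(group, Treatment(reference=4))", data=dat).fit()
print(fit.params)   # Intercept = mean of Group 4; others = group mean differences
```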

Activity (From the work of Jeremy Miles)

T tests: A very simple dataset designed to show the equivalence of t-tests and regression.  The results of an experiment examining the memory of 20 participants, 10 of whom were told how to use a mnemonic, 10 of whom were not.  Two variables, group and score. [data]

Score = 10.1 + 2.5(group)
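If you want to verify the equivalence without the linked dataset, here is a hedged sketch with made-up scores (only the 0/1 group coding matches the description above):

```python
# With group coded 0/1, the OLS slope t test equals the independent-samples t
# test (equal variances assumed), and the intercept equals the group-0 mean.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

mem = pd.DataFrame({
    "group": [0] * 10 + [1] * 10,            # 0 = no mnemonic, 1 = mnemonic
    "score": [9, 11, 10, 8, 12, 10, 11, 9, 10, 11,      # hypothetical scores
              13, 12, 14, 11, 13, 12, 15, 12, 13, 14],
})

fit = smf.ols("score ~ group", data=mem).fit()
t_ind, p_ind = stats.ttest_ind(mem.loc[mem.group == 1, "score"],
                               mem.loc[mem.group == 0, "score"])
print(fit.tvalues["group"], t_ind)           # identical up to rounding
```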

One-way ANOVA: This is the same data as above, except that a third experimental group (aromatherapy) has been added. It shows the equivalence of ANOVA and regression. [data]

Score = 8.9 + 1.2(group 0) + 3.7 (group 1)

(group 2 is reference group)

Two-way ANOVA: [data]


Effects Coding.  The design matrix is exactly like that in dummy variable coding except that the “reference group” is coded with “‑1” on each X.  The design matrix is:

Group   X1   X2   X3
1        1    0    0
2        0    1    0
3        0    0    1
4       -1   -1   -1

The result of this coding scheme is that each X's partial coefficient now represents one group versus the grand mean; that is, X1 represents Group 1 versus the grand mean, X2 represents Group 2 versus the grand mean, etc. The intercept is now equal to the grand mean.
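Continuing the hypothetical sketch above, effects coding corresponds to patsy's Sum (deviation) contrast; with the last group as the omitted (-1) group, the intercept is the grand mean of the group means:

```python
# Effects (deviation) coding: by default patsy's Sum contrast codes the last
# level with -1's, so each coefficient compares a group with the grand mean.
import statsmodels.formula.api as smf  # dat comes from the dummy-coding sketch

fit_eff = smf.ols("y ~ C(group, Sum)", data=dat).fit()
print(fit_eff.params)   # Intercept = grand mean (mean of the four group means)
```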

Contrast Coding.   The design matrix here codes a complete orthogonal set of comparisons:

Group   X1   X2   X3
1        1    1    0
2        1   -1    0
3       -1    0    1
4       -1    0   -1

 X1 contrasts Groups 1 & 2 with Groups 3 & 4, X2 contrasts Group 1 with Group 2, and X3 contrasts Group 3 with Group 4. 

 


Suppressor Variables

(Taken from the work of Kristin Woolley)
"
Most researchers determine the worth of a predictor variable by its correlation with the dependent variable. However, sometimes a variable can raise the total R2 even though it has a negligible correlation with the dependent variable and a strong correlation with the other predictor variables (Hinkle, Wiersma & Jurs, 1994; Pedhazur, 1982). A variable that when added as another predictor increases the total R2 is called a suppressor variable. Horst (1966, p. 363) explained:

A suppressor variable may be defined as those predictor variables which do not measure variance in the criterion measures, but which do measure some of the variance in the predictor measures which is not found in the criterion measure. They measure invalid variance in the predictor measures and serve to suppress this invalid variance. "

There may be cases when you select a predictor that does not have a relationship with the outcome variable (DV) but increases the multiple R.
Suppose that both IV1 and IV2 are positively correlated with the DV. That means that if either of those variables increases, we expect to see Y increase. But suppose that the regression equation comes out as

Y = 12.78 + 1.3X1 - 2.4X2 

[Taken from the work of David Howell]

"Cohen's classic example (Maybe it was Darlington), is of a speeded test of history. We want to predict knowledge of historical facts. We give a test which supposedly tests that. But some people will do badly just because they read very slowly, and don't get through the exam. Others read very quickly, and do all of the questions. We don't think that reading speed has anything to do with how much history you know, but it does affect your score. We want to "adjust" scores for reading speed, which is like saying "The correlation between true historical knowledge and test score, controlling for reading speed."

Suppression Examples

Practice
X1=Amount of psychotherapy
X2=Degree of depression
Y=Number of prior suicide attempts

[Dataset]

Classical Suppressor Example [Dataset]

Two Practical Examples of Suppressor Variables can be found here.


Moderator and Mediator Variables

Classic Article:

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.

http://www.public.asu.edu/~davidpm/classes/psy536/Baron.pdf

“…. a moderator variable is one that influences the strength of a relationship between two other variables, and a mediator variable is one that explains the relationship between the two other variables. As an example, let's consider the relation between social class (SES) and frequency of breast self-exams (BSE). Age might be a moderator variable, in that the relation between SES and BSE could be stronger for older women and less strong or nonexistent for younger women. Education might be a mediator variable in that it explains why there is a relation between SES and BSE. When you remove the effect of education, the relation between SES and BSE disappears.”

In general, a given variable may be said to function as a mediator to the extent that it accounts for the relation between the predictor and the criterion. Mediators explain how external physical events take on internal psychological significance. Whereas moderator variables specify when certain effects will hold, mediators speak to how or why such effects occur.

 

Figure 1: Age is a moderator

Figure 2: Education is a mediator

 


 

 Moderator
(Based on the work of David A. Kenny)
http://davidakenny.net/cm/moderation.htm

Moderation is usually captured by an interaction between the initial (causal) variable and the moderator.

A key part of moderation is the measurement of the X-to-Y causal relationship for different values of M. We refer to the effect of X on Y for a given value of M as the simple effect of X on Y.

Y = i + aX + bM + cXM + E

Moderator variables - "In general terms, a moderator is a qualitative (e.g., sex, race, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between an independent or predictor variable and a dependent or criterion variable. Specifically within a correlational analysis framework, a moderator is a third variable that affects the zero-order correlation between two other variables. ... In the more familiar analysis of variance (ANOVA) terms, a basic moderator effect can be represented as an interaction between a focal independent variable and a factor that specifies the appropriate conditions for its operation." p. 1174

Moderator variables are important, because specific factors (e.g. context information) are often assumed to reduce or enhance the influence that specific independent variables have on specific responses in question (dependent variable).

In analysis of variance (ANOVA) terms, a moderator effect can be represented as an interaction between a major independent variable and a factor that specifies the appropriate conditions for its operation; that is, the effect of the major independent variable depends upon the value of the moderator variable. Consider, for example, a research study looking at two different methods of teaching mathematics. If students with strong reading skills do better with one method and those with weak reading skills do better with the other, then reading skill is functioning as a moderator variable.

Moderators can be tested by adding an interaction term to our model (multiply the two IVs together and enter the crossproduct as a variable in the model), but before doing so you should center all of the variables--that is, subtract from each score on each variable the mean of all scores on that variable. This is necessary to reduce multicollinearity and other problems. We could instead standardize our variables to z scores, which might be preferable when dealing with variables for which the unit of measurement is not intrinsically meaningful.
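Here is a minimal sketch (simulated data with generic variable names X, M, and Y, not from the readings) of centering two continuous variables, forming their product, and testing the interaction:

```python
# Test a continuous moderator: center X and M, then test the X-by-M product term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=n)
M = rng.normal(size=n)
Y = 2 + 0.5 * X + 0.3 * M + 0.4 * X * M + rng.normal(size=n)  # built-in interaction
d = pd.DataFrame({"X": X, "M": M, "Y": Y})

d["Xc"] = d["X"] - d["X"].mean()   # centering reduces IV/product collinearity
d["Mc"] = d["M"] - d["M"].mean()

mod = smf.ols("Y ~ Xc * Mc", data=d).fit()   # Xc*Mc expands to Xc + Mc + Xc:Mc
print(mod.params)
print("moderation p value:", mod.pvalues["Xc:Mc"])
```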

 

Case 1 (simplest): 2 X 2 ANOVA, look for the interaction

IV = Diet Program (Jenny Craig v. Atkins)
DV = BMI
Moderator = gender

Women reduce BMI more on Jenny Craig and men reduce BMI more on Atkins

Case 2: Moderator is dichotomous and IV is continuous variable

IV = Measure of BMI
DV=Self Esteem
Moderator = gender

For females there is a strong negative correlation but for males there is a very small negative correlation

Case 3: Moderator is continuous and IV is dichotomous

IV = two diet programs
Moderator = pounds overweight
DV=amount of weight loss

Diet 1 is very successful for individuals with more pounds overweight, but Diet 2 is more successful for individuals with fewer pounds overweight

*you can make the moderator a dichotomous variable

Case 4: Moderator and IV are continuous

create an interaction term

Mediator Variable

 

Researchers clarify the meaning of mediation by introducing path diagrams as models for depicting a causal chain. The basic causal chain involved in mediation is diagrammed in Figures 1 & 2 above. This model assumes a three-variable system such that there are two paths feeding into the outcome variable: the direct impact of the IV on the DV and the indirect path from the IV to the DV via the MV. There is also a relation between the MV and the DV.

A variable functions as a mediator when it meets the following conditions:
(a) variations in levels of the IV significantly account for the variations in the presumed mediator (Figure 1),
(b) variations in the MV significantly account for variations in the DV, and
(c) when both the IV and the MV appear in the model, a previously significant relation between the IV and DV is no longer significant, with the strongest demonstration of mediation occurring when the direct IV-to-DV path is zero.
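A minimal sketch of the three regressions behind conditions (a) through (c), using simulated data in which the mediator fully carries the X to Y effect (variable names X, M, Y are generic, not from the readings):

```python
# Baron & Kenny style mediation check with simulated full mediation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=n)
M = 0.7 * X + rng.normal(size=n)          # X predicts the mediator (condition a)
Y = 0.6 * M + rng.normal(size=n)          # the mediator predicts Y; no direct path
d = pd.DataFrame({"X": X, "M": M, "Y": Y})

step_a = smf.ols("M ~ X", data=d).fit()       # (a) IV -> mediator
total = smf.ols("Y ~ X", data=d).fit()        # total effect of IV on DV
step_bc = smf.ols("Y ~ X + M", data=d).fit()  # (b) M predicts DV; (c) X drops out
print("X coefficient, total vs. with M:", total.params["X"], step_bc.params["X"])
```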

[More on Mediators--David Kenny]

[SPSS Macros and Code for SPSS]


Run the complete example on p. 161--[data] (I transformed some variables for you).

Variables (described on p 161)
timedrs (DV)
phyheal
menheal
stress


Multivariate multiple regression--you have two or more dependent variables that are to be predicted from two or more predictor variables.
Use the GLM option in SPSS.