Logistic Regression

[Introduction of Logistic Regression~10 minutes]
[Logistic Regression SPSS~4 minutes]

Purpose

Logistic regression is a method of modeling a binary response variable (coded 0 or 1). For example, we may want to investigate how well a school intervention predicts whether a student drops out (0) or stays in school (1).

Logistic regression has many analogies to multiple (ordinary least squares, OLS) regression: logit coefficients correspond to the b coefficients in the OLS regression equation, the standardized logit coefficients correspond to beta weights, and a pseudo R2 statistic is available to summarize the strength of the relationship. Unlike OLS regression, however, logistic regression does not assume a linear relationship between the independent variables and the dependent variable, does not require normally distributed variables, does not assume homoscedasticity, and in general has less stringent requirements. It does, however, require that observations be independent and that the independent variables be linearly related to the logit of the dependent variable. The success of the logistic regression can be assessed by looking at the classification table, which shows correct and incorrect classifications of the dichotomous, ordinal, or polytomous dependent variable. Goodness-of-fit tests such as the model chi-square are also available as indicators of model appropriateness, as is the Wald statistic for testing the significance of individual independent variables.

Logistic (binary) regression is used to fit a model to binary response (Y) data, such as whether a subject dies (event) or lives (non-event). These outcomes are often described as success vs. failure. For each possible set of values of the independent (X) variables, there is a probability p that a success occurs. [Multinomial logistic regression models also exist, but we will not cover those.]

[Data for Coronary Heart Disease (CD)]


 

The Math

The logit function is used to transform an "S"-shaped curve into an approximately straight line and to change the range of the proportion from 0 to 1 into -∞ to +∞.
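As a minimal sketch of this transformation (the function names logit and inv_logit are illustrative and do not come from the course materials), the logit maps a proportion strictly between 0 and 1 onto the whole real line, and its inverse maps back:

```python
import math

def logit(p):
    """Log-odds of a proportion p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit: maps any real z back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Proportions near 0 map to large negative values, 0.5 maps to 0,
# and proportions near 1 map to large positive values.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p = {p:.2f}  logit(p) = {logit(p):+.4f}")
```

Because the two functions are inverses, inv_logit(logit(p)) returns p up to rounding error.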

http://coedpages.uncc.edu/cpflower/rsch8140/images/logist4.gif

 

http://coedpages.uncc.edu/cpflower/rsch8140/images/logist8.gif

http://coedpages.uncc.edu/cpflower/rsch8140/images/logist5.gif

http://coedpages.uncc.edu/cpflower/rsch8140/images/logist10.gif

OR      ln(OR)
0.1     -2.30259
0.4     -0.91629
0.7     -0.35667
1        0
1.3      0.262364
1.6      0.470004
1.9      0.641854
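The ln(OR) values listed here can be reproduced with a natural logarithm. The short sketch below uses only Python's standard library; the variable names are illustrative. Odds ratios below 1 give negative logs, odds ratios above 1 give positive logs, and an odds ratio of 1 maps to 0.

```python
import math

odds_ratios = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9]

# Natural log of each odds ratio
log_ors = [math.log(v) for v in odds_ratios]

for v, lv in zip(odds_ratios, log_ors):
    print(f"{v:>4.1f}  {lv:>9.5f}")
```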

[Link to Spreadsheet Showing the Calculations]

[Link to Simple Logistic Regression]

logit(p) = a + b1x1 + b2x2 + ... + bixi

where p is the probability of dropout and x1, x2 ... xi are the explanatory variables.
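To turn the equation into a predicted probability, compute the logit and then invert it. The sketch below uses made-up values for the intercept a, the weights b, and the predictor values x; they are purely illustrative and are not estimates from any data set.

```python
import math

def predict_probability(a, b, x):
    """Return p given intercept a, coefficients b, and predictor values x."""
    # Linear predictor: logit(p) = a + b1*x1 + ... + bi*xi
    z = a + sum(bj * xj for bj, xj in zip(b, x))
    # Invert the logit to recover the probability
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one hypothetical case
a = -1.5
b = [0.8, -0.3]
x = [2.0, 1.0]

p = predict_probability(a, b, x)
print(f"predicted probability = {p:.4f}")
```

With these assumed values the logit is -1.5 + 1.6 - 0.3 = -0.2, so the predicted probability falls a little below 0.5.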

Advantages & Limitations of Logistic Regression Analysis

Practical Issues

Interpretation of Weights (B): B is the change in the log-odds for a one-unit increase in X.

You can see there is a little more work involved after you get the equation. The linear equation that we created in SPSS produces the logit, or log of the odds. That is, the value of the equation is the natural log of the probability of being in one group divided by the probability of being in the other group, so it has to be converted back to obtain a predicted probability.

A procedure called maximum likelihood (ML) estimation is used to estimate the coefficients.
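The idea behind ML estimation can be sketched in a few lines. The example below uses Newton-Raphson, one common way to maximize the log-likelihood, on a small made-up data set with a single predictor. It is an illustration under those assumptions and is not a description of the algorithm SPSS uses internally; the function names and the data are hypothetical.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Estimate intercept a and slope b by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Predicted probabilities under the current estimates
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient (score) of the log-likelihood
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix entries (negative Hessian)
        w = [p * (1.0 - p) for p in ps]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, xs))
        h_bb = sum(wi * x * x for wi, x in zip(w, xs))
        # Newton step: solve the 2x2 system H * delta = gradient
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

def log_likelihood(a, b, xs, ys):
    """Log-likelihood of the data for given a and b."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical data: x is a predictor score, y is the 0/1 outcome
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

a_hat, b_hat = fit_logistic(xs, ys)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
print(f"log-likelihood = {log_likelihood(a_hat, b_hat, xs, ys):.4f}")
```

The made-up outcomes overlap across the range of x on purpose: if the two groups were perfectly separated by x, the maximum likelihood estimates would not be finite and the iterations would not settle.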

Logistic regression also produces an odds ratio (OR) for each predictor. The odds of an event are defined as the probability of the event occurring divided by the probability of the event not occurring. The odds ratio for a predictor gives the factor by which the odds of the outcome increase (OR greater than 1.0) or decrease (OR less than 1.0) when the value of the predictor is increased by one unit.
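A short sketch of these definitions follows, using a hypothetical intercept and coefficient B (neither is estimated from real data). Exponentiating B gives the odds ratio, and the ratio of the odds at x + 1 to the odds at x is the same at every starting value of x.

```python
import math

def odds(p):
    """Odds of an event: P(event) / P(no event)."""
    return p / (1.0 - p)

def prob(a, b, x):
    """Probability from a one-predictor logistic model (a and b are hypothetical)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

a, b = -1.0, 0.5          # illustrative values, not estimates from real data

odds_ratio = math.exp(b)  # OR for a one-unit increase in the predictor

# The ratio of the odds at x + 1 to the odds at x equals exp(b) for any x
for x in (0.0, 2.0, 5.0):
    ratio = odds(prob(a, b, x + 1)) / odds(prob(a, b, x))
    print(f"x = {x:.0f}: odds ratio = {ratio:.4f}  (exp(b) = {odds_ratio:.4f})")
```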


Assessment of the Fit of the Model

After estimating the coefficients, there are several steps involved in assessing the appropriateness, adequacy and usefulness of the model. First, the importance of each of the explanatory variables is assessed by carrying out statistical tests of the significance of the coefficients. The overall goodness of fit of the model is then tested. Additionally, the ability of the model to discriminate between the two groups defined by the response variable is evaluated. Finally, if possible, the model is validated by checking the goodness of fit and discrimination on a different set of data from that which was used to develop the model.

1. Wald χ2 statistics (reliability is questionable)

2. Likelihood Ratio Test

The likelihood ratio test for a particular parameter compares the likelihood of obtaining the data when the parameter is zero (L0) with the likelihood (L1) of obtaining the data evaluated at the MLE of the parameter. The test statistic is calculated as follows:

-2 ln(likelihood ratio) = -2 ln(L0/L1) = -2 (lnL0 - lnL1)

It is compared with a χ2 distribution with 1 degree of freedom.
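The calculation can be sketched as follows, with hypothetical log-likelihood values standing in for real output. For a χ2 distribution with 1 degree of freedom, the upper-tail probability can be written with the complementary error function, so no statistics library is needed.

```python
import math

def likelihood_ratio_test(ln_l0, ln_l1):
    """Return the LR statistic and its p-value on 1 degree of freedom."""
    statistic = -2.0 * (ln_l0 - ln_l1)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods: parameter fixed at zero vs. at its MLE
ln_l0 = -50.0
ln_l1 = -45.0

stat, p = likelihood_ratio_test(ln_l0, ln_l1)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

Because lnL1 is evaluated at the maximum, it can never be smaller than lnL0, so the statistic is never negative.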

3. Goodness of fit


4. R2 for logistic regression

5. Discrimination (classification accuracy)
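Classification accuracy can be illustrated with a small classification table. The sketch below assumes a cutoff of 0.5 for assigning cases to the predicted group, and the observed outcomes and predicted probabilities are made up for the example.

```python
def classification_table(y_true, p_hat, cutoff=0.5):
    """Cross-tabulate observed outcomes against predictions at a cutoff."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1   # key is (observed, predicted)
    return table

# Hypothetical observed outcomes and model-predicted probabilities
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
p_hat = [0.10, 0.35, 0.62, 0.20, 0.80, 0.45, 0.90, 0.70, 0.55, 0.30]

table = classification_table(y_true, p_hat)
correct = table[(0, 0)] + table[(1, 1)]
accuracy = correct / len(y_true)

print("observed 0:", table[(0, 0)], "predicted 0,", table[(0, 1)], "predicted 1")
print("observed 1:", table[(1, 0)], "predicted 0,", table[(1, 1)], "predicted 1")
print(f"overall accuracy = {accuracy:.0%}")
```

Changing the cutoff shifts cases between the correct and incorrect cells, so the reported accuracy always depends on the cutoff chosen.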


From Huck Text