Categorical Data Analysis with Graphics
Michael Friendly

Logistic Regression

Logistic Regression Model

Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables. The explanatory variables may be continuous or (with dummy variables) discrete.
------------------------
(2) Some material in this section borrows from Koch & Stokes (1991).
------------------------

Consider the data below on the effects of a particular treatment for arthritis. The response is improvement; sex and treatment are the explanatory variables.

                        Improvement
  Sex   Treatment    None    Some/Marked     Total

   F     Active        6         21            27
   F     Placebo      19         13            32

   M     Active        7          7            14
   M     Placebo      10          1            11

Let π_hi be the probability that a patient of sex h who receives treatment i will show some or marked improvement. It is convenient to model this probability in terms of the log odds of improvement, called the logit,

   logit(π_hi) = log[ π_hi / (1 − π_hi) ]                                  (11)

                  Number        Observed     Odds   Observed
  SEX    Treatmt  Better Total  P{better}   Better   Logit

 Female  Active     21     27    0.7778      3.50    1.2529
 Female  Placebo    13     32    0.4062      0.68   -0.3797
 Male    Active      7     14    0.5000      1.00    0.0000
 Male    Placebo     1     11    0.0909      0.10   -2.3026
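
For instance, a short DATA step (a sketch, not part of the original program) reproduces the observed odds and logits in this table from the cell counts:

data obslogit;
   input sex $ trtment $ nbetter total;
   p     = nbetter / total;     /* observed P{better} */
   odds  = p / (1 - p);         /* observed odds      */
   logit = log(odds);           /* observed logit     */
cards;
Female Active  21 27
Female Placebo 13 32
Male   Active   7 14
Male   Placebo  1 11
;
proc print data=obslogit;
run;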

The logistic regression model fits the log odds by a linear function of the explanatory variables (as in multiple regression):

   logit(π_hi) = α + x′_hi β                                               (12)

For example, a simple model might assume additive ("main") effects for sex and treatment on the log odds of improvement,

   logit(π_hi) = α + β₁ x₁ + β₂ x₂                                         (13)

where x₁ and x₂ are dummy (0/1) variables for sex (x₁ = 1 for females) and treatment (x₂ = 1 for the active treatment). Thus, the parameters defined here are incremental effects. The intercept corresponds to a baseline group (males given the placebo); the other parameters are incremental effects for the other groups compared to the baseline group.

Predicted probabilities

For plotting and interpreting results from logistic regression, it is usually more convenient to express fitted values on the scale of probabilities. The inverse transformation of (11) and (12) is the logistic function,

   π_hi = exp(α + x′_hi β) / [ 1 + exp(α + x′_hi β) ]                      (14)

For the example, when α, β₁, and β₂ have been estimated, the predicted odds and probabilities are:
  Sex     Treatment   Odds Improved          Pr{Improved}

  Female  Active      e^(α+β₁+β₂)            e^(α+β₁+β₂) / [1 + e^(α+β₁+β₂)]
  Female  Placebo     e^(α+β₁)               e^(α+β₁) / [1 + e^(α+β₁)]
  Male    Active      e^(α+β₂)               e^(α+β₂) / [1 + e^(α+β₂)]
  Male    Placebo     e^α                    e^α / [1 + e^α]
The formulas for predicted probabilities may look formidable, but the numerical results are easy to interpret.

Fitting Logistic Regression Models

Logistic regression models can be fit using PROC LOGISTIC, PROC CATMOD, PROC GENMOD and SAS/INSIGHT. The examples below illustrate the use of PROC LOGISTIC. The input data set for PROC LOGISTIC can be in one of two forms: frequency form, with one observation per group and a variable giving the frequency (count) for that group, named on a FREQ statement; or case form, with one observation per case.

The following DATA step creates a data set in frequency form named arthrits. The dummy variables _SEX_ and _TREAT_ corresponding to x₁ and x₂ are created, as is the dichotomous response variable, better.

The first logistic regression model includes effects for sex and treatment, specified by the dummy variables on the MODEL statement. Note that by default, PROC LOGISTIC orders the response values in increasing order, and sets up the model so that it is predicting the probability of the smallest ordered value, Pr{better=0}, which means it would be modelling the probability of No improvement. The DESCENDING option (available with Version 6.08) reverses this order, so that predicted results will be for Pr{better=1}.(3)
------------------------
(3) In earlier releases, a format is used to order the values of better so that the first value corresponds to Improved.
------------------------

data arthrits;
   input sex $ trtment $ improve $ count;
   _treat_ = (trtment='Active');
   _sex_   = (sex='F');
   better  = (improve='some');
cards;
F Active  none   6
M Active  none   7
F Active  some  21
M Active  some   7
F Placebo none  19
M Placebo none  10
F Placebo some  13
M Placebo some   1
;
proc logistic data=arthrits descending;
   freq count;
   model better = _sex_ _treat_ / scale=none aggregate;
The options scale=none aggregate provide goodness of fit tests for the model. The goodness of fit tests are based on the difference between the actual model fitted and the saturated model (containing an interaction of sex and treatment in this example), which would fit perfectly. The following results are produced:
+-------------------------------------------------------------------+
|                                                                   |
|             Deviance and Pearson Goodness-of-Fit Statistics       |
|                                                          Pr >     |
|        Criterion        DF       Value    Value/DF    Chi-Square  |
|                                                                   |
|        Deviance          1      0.2776      0.2776        0.5983  |
|        Pearson           1      0.2637      0.2637        0.6076  |
|                                                                   |
|                      Number of unique profiles: 4                 |
|                                                                   |
|                  Testing Global Null Hypothesis: BETA=0           |
|                                                                   |
|                           Intercept                               |
|              Intercept       and                                  |
|  Criterion     Only      Covariates   Chi-Square for Covariates   |
|                                                                   |
|  AIC           118.449      104.222      .                        |
|  SC            118.528      104.460      .                        |
|  -2 LOG L      116.449       98.222    18.227 with 2 DF (p=0.0001)|
|  Score            .            .       16.797 with 2 DF (p=0.0002)|
|                                                                   |
+-------------------------------------------------------------------+

The chi-square tests for BETA=0 above test the joint effect of sex and treatment. Individual effects in the model are tested by Wald chi² statistics, the squared ratio of each parameter estimate to its standard error. These tests, shown below, indicate that both sex and treatment effects are highly significant.
+--------------------------------------------------------------------------+
|                                                                          |
|                   Analysis of Maximum Likelihood Estimates               |
|                                                                          |
|              Parameter Standard    Wald       Pr >    Standardized    Odds
|  Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate        Ratio
|                                                                          |
|  INTERCPT 1    -1.9037   0.5982    10.1286     0.0015            .    .  |
|  _SEX_    1     1.4687   0.5756     6.5092     0.0107     0.372433   4.343
|  _TREAT_  1     1.7817   0.5188    11.7961     0.0006     0.493956   5.940
|                                                                          |
+--------------------------------------------------------------------------+
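
As a quick check (a sketch, not part of the original program), the Wald chi² for _SEX_ can be reproduced by hand from its estimate and standard error:

data _null_;
   wald = (1.4687 / 0.5756)**2;   /* = 6.51, matching the table up to rounding */
   put wald=;
run;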

Model interpretation

The fitted model,

   logit(π_hi) = −1.90 + 1.47 _sex_ + 1.78 _treat_                         (15)

is most easily interpreted by considering the odds ratios corresponding to the parameters.
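
Each odds ratio is just the exponential of the corresponding coefficient (compare the Odds Ratio column in the output above); a minimal sketch, not part of the original program:

data _null_;
   or_sex   = exp(1.4687);   /* = 4.343: odds of improvement, females vs. males  */
   or_treat = exp(1.7817);   /* = 5.940: odds of improvement, active vs. placebo */
   put or_sex= or_treat=;
run;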

Plotting results from PROC LOGISTIC

PROC LOGISTIC also calculates predicted probabilities and logits, and these results may be obtained in an output data set, from which plots can be made.
proc logistic data=arthrits descending;
   freq count;
   model  better = _sex_  _treat_ ;

   output out=results p=predict l=lower u=upper xbeta=logit;

The OUTPUT statement produces a data set containing estimated logit values for each group, and corresponding predicted probabilities of improvement and confidence limits ( UPPER, LOWER) for these probabilities.

The output data set RESULTS contains the following variables. There are two observations for each group (for none and some improvement). The PREDICT variable gives the predicted probability of an improved outcome according to model (15), using the inverse transformation (14) of logit to probability.

+-------------------------------------------------------------------+
|                                                                   |
|  SEX  TRTMENT  IMPROVE  COUNT  PREDICT    LOWER    UPPER    LOGIT |
|                                                                   |
|   F   Active    none       6     0.794    0.620    0.900    1.347 |
|   M   Active    none       7     0.470    0.257    0.694   -0.122 |
|   F   Active    some      21     0.794    0.620    0.900    1.347 |
|   M   Active    some       7     0.470    0.257    0.694   -0.122 |
|   F   Placebo   none      19     0.393    0.248    0.560   -0.435 |
|   M   Placebo   none      10     0.130    0.044    0.325   -1.904 |
|   F   Placebo   some      13     0.393    0.248    0.560   -0.435 |
|   M   Placebo   some       1     0.130    0.044    0.325   -1.904 |
|                                                                   |
+-------------------------------------------------------------------+

To plot the predicted probabilities of improvement and confidence limits from the RESULTS data set, we select the observations for improve='some'. A plot can be done as a bar chart with PROC GCHART or as a line graph with PROC GPLOT. Confidence limits can be added to either with the SAS/GRAPH Annotate facility. The statements below show how a grouped horizontal bar chart is constructed.

data results;
   set results;
   if improve='some';
   label predict='Prob. Improved';
data limits;
   set results;
   xsys='2'; ysys='2';
   midpoint=trtment;
   group=sex;
   x = lower;  function='MOVE   '; output;
   text='|';   function='LABEL  '; output;
   x = upper;  function='DRAW   '; output;
   text='|';   function='LABEL  '; output;
proc gchart data=results;
   hbar trtment / sumvar=predict group=sex gspace=3
                  anno=limits
                  raxis=axis1
                  maxis=axis2
                  gaxis=axis3;
   axis1 order=(0 to 1 by .2)
         label=(h=1.5 f=duplex)
         value=(h=1.3 f=duplex);
   axis2 label=(h=1.4 f=duplex 'Treat')
         value=(h=1.2 f=duplex);
   axis3 label=(h=1.4 f=duplex)
         value=(h=1.2 f=duplex);
   pattern1 v=empty;
Figure 25: Predicted probabilities of improvement

Quantitative predictors

Logistic regression can be generalized to include continuous explanatory variables. The main differences are:

The DATA step below creates a SAS data set named arthrit, in case form. It is equivalent to the earlier example, but contains the age of each person in the sample. Programming statements are used to create the dummy variables _sex_ and _treat_ as before. Variables for testing for interactions among sex, treatment and age are also created. A preliminary analysis (described later) tests whether any of these variables interact; it shows that all interactions can safely be ignored in this example, and it also serves as a goodness-of-fit test for the main-effects model.

data arthrit;
   length treat $7. sex $6. ;
   input id treat $ sex $ age improve @@ ;
   better  = (improve > 0);            /* Dichotomous response    */
   _treat_ = (treat ='Treated') ;      /* Dummy var for treatment */
   _sex_   = (sex = 'Female');         /*           and sex       */
   agesex  = age*_sex_ ;               /* Dummy var for testing   */
   agetrt  = age*_treat_;              /*   interactions          */
   sextrt  = _sex_*_treat_;
   age2    = age*age ;
  cards ;
57 Treated Male   27 1   9 Placebo Male   37 0
46 Treated Male   29 0  14 Placebo Male   44 0
77 Treated Male   30 0  73 Placebo Male   50 0
  ... 
(observations omitted)
56 Treated Female 69 1  42 Placebo Female 66 0
43 Treated Female 70 1  15 Placebo Female 66 1
                        71 Placebo Female 68 1
                         1 Placebo Female 74 2

The next logistic model includes (main) effects for age, sex, and treatment. The lackfit option requests a lack-of-fit test proposed by Hosmer and Lemeshow (1989). This test divides subjects into deciles based on predicted probabilities, then computes a chi² from observed and expected frequencies. In this example, we plot both a confidence interval for Pr{Improved} and logit ± s.e. To make these intervals roughly comparable, we choose alpha = .33 to give a 67% confidence interval.

title2 h=1.3 f=duplex 'Estimated Effects of Age, Treatment and Sex';
proc logistic data=arthrit;
   format better outcome.;
   model  better = _sex_  _treat_  age / lackfit;
   output out=results p=predict l=lower u=upper
                      xbeta=logit stdxbeta=selogit / alpha=.33;
The results include:
+-------------------------------------------------------------------+
|                                                                   |
|                Testing Global Null Hypothesis: BETA=0             |
|                           Intercept                               |
|              Intercept       and                                  |
|  Criterion     Only      Covariates   Chi-Square for Covariates   |
|                                                                   |
|  AIC           118.449      100.063       .                       |
|  SC            120.880      109.786       .                       |
|  -2 LOG L      116.449       92.063     24.386 with 3 DF (p=0.0001)
|  Score            .            .        22.005 with 3 DF (p=0.0001)
|                                                                   |
+-------------------------------------------------------------------+
+-------------------------------------------------------------------+
|                                                                   |
|                   Analysis of Maximum Likelihood Estimates        |
|              Parameter Standard    Wald       Pr >    Standardized    Odds
|  Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate  |  Ratio
|  INTERCPT 1    -4.5033   1.3074    11.8649     0.0006            .|   .
|  _SEX_    1     1.4878   0.5948     6.2576     0.0124     0.377296|  4.427
|  _TREAT_  1     1.7598   0.5365    10.7596     0.0010     0.487891|  5.811
|  AGE      1     0.0487   0.0207     5.5655     0.0183     0.343176|  1.050
|                                                                   |
+-------------------------------------------------------------------+

+-------------------------------------------------------------------+
|                                                                   |
|              Hosmer and Lemeshow Goodness-of-Fit Test             |
|                                                   BETTER = not    |
|                         BETTER = improved           improved      |
|                       --------------------    --------------------|
|  Group       Total    Observed    Expected    Observed    Expected|
|      1           9           0        1.09           9        7.91|
|      2           8           3        1.48           5        6.52|
|      3           8           2        2.00           6        6.00|
|      4           8           2        3.04           6        4.96|
|      5           8           5        3.64           3        4.36|
|      6           9           5        4.82           4        4.18|
|      7           8           4        4.78           4        3.22|
|      8           8           5        5.77           3        2.23|
|      9           8           7        6.62           1        1.38|
|     10          10           9        8.77           1        1.23|
|      Goodness-of-fit Statistic = 5.5549 with 8 DF (p=0.6970)      |
|                                                                   |
+-------------------------------------------------------------------+
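
As a check (a sketch, not in the original text), the Hosmer and Lemeshow statistic can be reproduced from the observed and expected counts in the table above:

data hltest;
   input obs1 exp1 obs0 exp0;
   chisq + (obs1 - exp1)**2 / exp1 + (obs0 - exp0)**2 / exp0;
cards;
0 1.09  9 7.91
3 1.48  5 6.52
2 2.00  6 6.00
2 3.04  6 4.96
5 3.64  3 4.36
5 4.82  4 4.18
4 4.78  4 3.22
5 5.77  3 2.23
7 6.62  1 1.38
9 8.77  1 1.23
;
proc print data=hltest;   /* final CHISQ value = 5.55 on 8 df */
run;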

Plots are constructed from the data set RESULTS. The first few observations are shown below.

+---------------------------------------------------------------------------+
|                                                                           |
|  ID   TREAT   AGE   SEX  IMPROVE  PREDICT   LOWER   UPPER   LOGIT SELOGIT |
|                                                                           |
|  57  Treated   27  Male     1       0.194   0.103   0.334  -1.427   0.758 |
|   9  Placebo   37  Male     0       0.063   0.032   0.120  -2.700   0.725 |
|  46  Treated   29  Male     0       0.209   0.115   0.350  -1.330   0.728 |
|  14  Placebo   44  Male     0       0.086   0.047   0.152  -2.358   0.658 |
|    ...                                                                    |
|                                                                           |
+---------------------------------------------------------------------------+

Predicted probabilities and confidence limits are contained in the variables PREDICT, UPPER, and LOWER. To show the effects of sex, treatment, and age on Pr{Improvement}, separate plots are drawn for each sex, using the statement BY SEX; in the PROC GPLOT step. Most of the work consists of drawing the confidence limits with the Annotate facility. Plots of the predicted logits can be made in a similar way using the variables LOGIT and SELOGIT.

proc sort data=results;
   by sex treat age;

data bars;
   set results;
   by sex treat;
   length text $8;
   xsys='2'; ysys='2';
   if treat='Placebo' then color='BLACK';
                      else color='RED';
   x  = age; line=33;
   y  = upper;  function='MOVE   '; output;
   text='-';    function='LABEL  '; output;
   y  = lower;  function='DRAW   '; output;
   text='-';    function='LABEL  '; output;
   if last.treat then do;
      y = predict;
      x = age+1; position='6'; size=1.4;
      text = treat; function='LABEL'; output;
      end;
   if first.sex then do;
      ysys ='1'; y=90;
      xsys ='1'; x=10; size=1.4;
      text = sex; function='LABEL'; output;
      end;
goptions hby=0;
proc gplot;
   plot predict * age = treat / vaxis=axis1 haxis=axis2
                                nolegend anno=bars;
   by sex;
   axis1 label=(h=1.3 a=90 f=duplex 'Prob. Improvement (67% CI)')
         value=(h=1.2) order=(0 to 1 by .2);
   axis2 label=(h=1.3 f=duplex)
         value=(h=1.2) order=(20 to 80 by 10)
         offset=(2,5);
   symbol1 v=+ h=1.4 i=join l=3 c=black;
   symbol2 v=$ h=1.4 i=join l=1 c=red;

Scales

Note that the response variable is easy to understand on the probability scale, but the relationship of Pr{Improve} to age, sex, and treatment is complex. On the other hand, the response is somewhat more difficult to understand on the logit scale (log odds), but the relationship of the logit to age, sex, and treatment is simple.

One compromise is to plot the results on the logit scale, and add a second scale showing corresponding probability values. In SAS, this can be done with an Annotate data set. The example below is designed for a plot with logit values ranging from -5 to +2. For each of a set of simple probability values (prob), the probability is labeled on an axis at the right of the figure, corresponding to the logit value at the left.

data pscale;
   xsys = '1';                              * percent values for x;
   ysys = '2';                              * data values for y;
   do prob = .01,.05,.1,.25,.5,.75,.9;
      logit = log( prob / (1-prob) );       * convert to logit;
      y = logit;
      x = 99 ;  function='MOVE   '; output;      * tick marks;
      x = 100;  function='DRAW   '; output;
      text = put(prob,4.3); position='6';        * values;
      style='DUPLEX';
      function='LABEL  '; output;
      end;
   ysys='1'; x=100.1; y=99.5;                    * axis label;
   text='Prob';           output;
In a PROC GPLOT step, the Annotate data set pscale is used as shown below:
proc gplot data=results;
   plot logit * x / annotate=pscale ... ;
Figure 26: Quantitative predictor: Linear and Logit regression on age. The curves show predicted probabilities of improvement and 95% confidence bands (ignoring sex and treatment). The points show the observations. Except in the extremes, the linear and logistic models give similar predicted values.

Figure 27: Logit regression on age and treatment

Figure 28: Estimated effects of age, treatment and sex. The effects are simpler (linear and additive) on the logit scale, but more easily interpreted in terms of probabilities. One solution is to plot on the logit scale, and provide a separate (nonlinear) scale of probabilities.

Models with interaction

The examples for the arthritis data have involved only main effects of sex, age, and treatment. (Tests for interactions among these factors show them all to be nonsignificant.) However, plotting estimated logits or predicted probabilities from PROC LOGISTIC is no more complicated in models with interactions.

In fact, since the predicted probabilities and logits are calculated by the procedure and output to the data set RESULTS, the results plotted depend purely on the MODEL statement. The plotting steps remain the same, assuming you want to make separate plots for males and females of the age by treatment effects.
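
For example, a model that adds the interaction dummies created in the earlier DATA step could be specified as follows (a sketch, not part of the original text); the OUTPUT data set and the plotting steps are unchanged:

proc logistic data=arthrit;
   format better outcome.;
   model  better = _sex_ _treat_ age  agesex agetrt sextrt;
   output out=results p=predict l=lower u=upper
                      xbeta=logit stdxbeta=selogit / alpha=.33;
run;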

Coefficient method

A different way to construct these same plots is based on calculating the fitted logit values from the coefficients for the variables in the model. For the arthritis main effects model, the fitted relationship is

   logit(p) = −4.5033 + 1.4878 _sex_ + 1.7598 _treat_ + 0.0487 age

With this method, each independent variable is manipulated over its range. The response can be graphed on the probability scale by transforming the logit: p = e^logit / (1 + e^logit). For example, the fitted logits and corresponding probabilities can be calculated in this data step:
data fitted;
  do _sex_ = 0 to 1;
     do _treat_ = 0 to 1;
        do age = 25 to 75 by 5;
           logit= -4.5033 + 1.4878*_sex_ + 1.7598*_treat_
                          + 0.0487*age;
           prob = exp(logit) / (1 + exp(logit));
           output;
           end;
        end;
     end;
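
The fitted data set can then be plotted directly; a minimal sketch (axis and symbol details are illustrative, not from the original):

proc gplot data=fitted;
   plot prob * age = _treat_ / vaxis=axis1 haxis=axis2;
   by _sex_;
   axis1 order=(0 to 1 by .2) label=(a=90 'Prob. Improvement');
   axis2 order=(20 to 80 by 10);
   symbol1 i=join l=3 c=black;   /* _treat_ = 0 (placebo) */
   symbol2 i=join l=1 c=red;     /* _treat_ = 1 (active)  */
run;
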
Fox (1987) explains how this method may be used to construct adjusted effects plots for particular interactions, adjusting for other variables not represented in the plot.

Testing for interactions

One way to test for interactions with PROC LOGISTIC is to start with a main-effects model, and use a forward-selection method to find interaction terms which significantly reduce the residual chi² .

The interaction effects were defined in the data step arthrit as the dummy variables, agesex, agetrt, and sextrt. The variable age2 = age**2 can be used to test whether the relationship between age and logit(better) is quadratic rather than linear.

The PROC LOGISTIC step below requests a forward-selection procedure. Setting START=3 requests that the model building begin with the first three variables (the main effects) listed in the model statement. SLENTRY=1 (significance level to enter) forces all variables to enter the model eventually.

proc logistic data=arthrit;
   format better outcome.;
   model  better = _sex_ _treat_ age        /* main effects  */
                   agesex agetrt sextrt     /* interactions  */
                   age2                     /* quadratic age */
          / selection=forward
            start=3
            slentry=1;
The variables included in each model for the selection procedure are listed in a note at the beginning of each set of results:
+-------------------------------------------------------------------+
|                                                                   |
|  Step  0. The following variables were entered:                   |
|           INTERCPT  _SEX_     _TREAT_   AGE                       |
|                                                                   |
+-------------------------------------------------------------------+
Results for this step are identical to those of the main effects model given earlier. Near the end of this step, the residual chi² is printed, which corresponds to a joint test for the other four variables. This test is an appropriate test of goodness of fit of the main effects model.
+-------------------------------------------------------------------+
|                                                                   |
|  Residual Chi-Square = 4.0268 with 4 DF (p=0.4024)                |
|                                                                   |
+-------------------------------------------------------------------+
Other tests printed show none of the interaction terms is significant individually.

Ordinal Response: Proportional Odds Model

The proportional odds model extends logistic regression to handle an ordinal response variable. The response variable improve in the arthritis data actually has 3 levels, corresponding to None, Some, or Marked improvement.
                          Improvement
  Sex   Treatment    None    Some   Marked    Total

   F     Active        6       5      16        27
   F     Placebo      19       7       6        32

   M     Active        7       2       5        14
   M     Placebo      10       0       1        11
One way to model these data is to consider two logits for the dichotomies between adjacent categories:
   logit(θ_hi1) = log[ π_hi1 / (π_hi2 + π_hi3) ]  =  logit( None vs. [Some or Marked] )

   logit(θ_hi2) = log[ (π_hi1 + π_hi2) / π_hi3 ]  =  logit( [None or Some] vs. Marked )

Consider a logistic regression model for each logit:

   logit(π_hi1) = α₁ + x′_hi β₁                                            (16)

   logit(π_hi2) = α₂ + x′_hi β₂                                            (17)

The proportional odds assumption is that the regression functions are parallel on the logit scale, i.e., that β₁ = β₂.

For the arthritis example, with additive effects for sex and treatment on both log odds,

   logit(π_hi1) = α₁ + β₁ x₁ + β₂ x₂                                       (18)

   logit(π_hi2) = α₂ + β₁ x₁ + β₂ x₂                                       (19)

where x₁ and x₂ are the dummy variables for sex and treatment defined earlier. The corresponding models including effects of age, as well as treatment and sex, are similar to (18) and (19), with the addition of a term β₃ age.
Figure 29: Proportional odds model. The model assumes that the regression functions for different response categories are parallel on the logit scale.

Plotting results from PROC LOGISTIC

Plotting results for the proportional odds model is similar to the earlier examples for a binary response. The main differences are:

Example

This example fits the effects of treatment, sex and age for the proportional odds model with the arthritis data. Note that the dependent variable is improve, with values 0, 1, and 2.
   *-- Proportional Odds Model: Effects of treat, sex and age;
proc logistic data=arthrit nosimple;
   model  improve = _sex_  _treat_ age ;
   output out=results p=predict l=lower u=upper
                      xbeta=logit stdxbeta=selogit / alpha=.33;
The response profile displays the ordering of the outcome variable. Note that logits are formed from top to bottom, i.e., none vs. some or marked, none or some vs. marked. The output also shows the proportional odds assumption is reasonable.
+-------------------------------------------------------------------+
|                         Response Profile                          |
|                    Ordered                                        |
|                      Value  IMPROVE     Count                     |
|                                                                   |
|                          1        0        42                     |
|                          2        1        14                     |
|                          3        2        28                     |
+-------------------------------------------------------------------+
+-------------------------------------------------------------------+
|         Score Test for the Proportional Odds Assumption           |
|             Chi-Square = 2.4917 with 3 DF (p=0.4768)              |
+-------------------------------------------------------------------+
The parameter estimates relate to the odds of a poorer response (they are all negative):
+---------------------------------------------------------------------+
|                                                                     |
|             Analysis of Maximum Likelihood Estimates                |
|                                                                     |
|            Parameter  Standard     Wald        Pr >    Standardized
|  Variable   Estimate    Error   Chi-Square  Chi-Square   Estimate   |
|                                                                     |
|  INTERCP1     3.7837    1.1530     10.7683      0.0010            . |
|  INTERCP2     4.6827    1.1949     15.3569      0.0001            . |
|  _SEX_       -1.2517    0.5321      5.5343      0.0186    -0.317412 |
|  _TREAT_     -1.7453    0.4772     13.3770      0.0003    -0.483871 |
|  AGE         -0.0382    0.0185      4.2358      0.0396    -0.268666 |
|                                                                     |
+---------------------------------------------------------------------+
The output data set RESULTS contains, for each observation, the predicted probability, Pr{Not Improved} and estimated logit for both types of odds. These are distinguished by the variable _LEVEL_. To plot probabilities for both types of improvement in a single graph, the values of TREAT and _LEVEL_ are combined in a single variable. To plot Pr{Improve}, we must reverse the direction of the variables in a data step:
data results;
   set results;
   treatl = treat||put(_level_,1.0);
   if _level_=0 then better = (improve > 0);
                else better = (improve > 1);
   *-- Change direction of probabilities & logit;
   predict = 1 - predict;
   lower = 1 - lower;
   upper = 1 - upper;
   logit = -logit;
The first few observations in the data set RESULTS after these changes are shown below:
+-------------------------------------------------------------------+
|                                                                   |
|  ID  TREAT   SEX IMPROVE _LEVEL_ PREDICT   LOWER   UPPER   LOGIT  |
|                                                                   |
|  57 Treated Male    1       0      0.267   0.417   0.157  -1.008  |
|  57 Treated Male    1       1      0.129   0.229   0.069  -1.907  |
|   9 Placebo Male    0       0      0.085   0.149   0.048  -2.372  |
|   9 Placebo Male    0       1      0.037   0.069   0.019  -3.271  |
|  46 Treated Male    0       0      0.283   0.429   0.171  -0.932  |
|  46 Treated Male    0       1      0.138   0.238   0.076  -1.831  |
|    ...                                                            |
|                                                                   |
+-------------------------------------------------------------------+
As in the earlier example, an Annotate data set is used to add more descriptive labels and confidence intervals to the plots. (This adds somewhat more work, but I prefer the plots labeled this way, rather than with legends at the bottom.)
proc sort data=results;
   by sex treatl age;
data bars;
   set results;
   by sex treatl;
   length text $8;
   xsys='2'; ysys='2';
   if treat='Placebo' then color='BLACK';
                      else color='RED';
   x  = age; line=33;
   *-- plot confidence limits  ;
   y  = upper;  function='MOVE   '; output;
   text='-';    function='LABEL  '; output;
   y  = lower;  function='DRAW   '; output;
   text='-';    function='LABEL  '; output;
   if last.treatl then do;
      y = predict;
      x = age+1; position='C'; size=1.4;
      text = treat; function='LABEL'; output;
      position='F';
      if _level_ = 0
         then text='> None';
         else text='> Some';
      output;
      end;
   if first.sex then do;
      ysys ='1'; y=90;
      xsys ='1'; x=10; size=1.5;
      text = sex; function='LABEL'; output;
      end;
goptions hby=0;
proc gplot;
   plot predict * age = treatl / vaxis=axis1 haxis=axis2
                                nolegend anno=bars    ;
   by sex;
   axis1 label=(h=1.4 a=90 f=duplex 'Prob. Improvement (67% CI)')
         value=(h=1.2) order=(0 to 1 by .2);
   axis2 label=(h=1.4 f=duplex)
         value=(h=1.2) order=(20 to 80 by 10)
         offset=(2,5);
   symbol1 v=+ h=1.4 i=join l=3 c=black;
   symbol2 v=+ h=1.4 i=join l=3 c=black;
   symbol3 v=$ h=1.4 i=join l=1 c=red;
   symbol4 v=$ h=1.4 i=join l=1 c=red;

Figure 30: Predicted probabilities for the proportional odds model

Polytomous Response: Nested Dichotomies

When the response, y, takes on m > 2 discrete values, there are several ways to model the response probabilities. Let π_ij ≡ π_j(x_i) be the probability of response j for case or group i. Because Σ_j π_ij = 1, only m − 1 of these probabilities are independent.

The proportional odds model is attractive if the response is ordinal, and the proportional odds assumption holds. However, if the response is purely nominal (e.g., vote Tory, Liberal, Reform, NDP), or if the proportional odds assumption is untenable, one particularly simple strategy is to fit separate models to a set of m - 1 dichotomies derived from the polytomous response.(4)

Nested dichotomies are successive binary partitions of the response categories into nested sets. For example, the response categories {1,2,3,4} could be divided first as {1,2} vs. {3,4}. Then these two dichotomies could be divided as {1} vs. {2}, and {3} vs. {4}. Alternatively, these response categories could be divided as {1} vs. {2,3,4}, then {2} vs {3,4}, and finally {3} vs. {4}. (For levels of a factor in an ANOVA design, nested dichotomies correspond to orthogonal contrasts.)

          +--------++--------+          +---++-------------+
          |  1  2  ||  3  4  |          | 1 ||  2   3   4  |
          +--------++--------+          +---++-------------+
             |             |                       |
             v             v                       v
       +----++----+  +----++----+            +----++-------+
       | 1  || 2  |  | 3  || 4  |            | 2  || 3   4 |
       +----++----+  +----++----+            +----++-------+
                                                       |
                                                       v
                                                  +---++---+
                                                  | 3 || 4 |
                                                  +---++---+

Example: Women's Labour-Force Participation

To illustrate, Fox (1984; 1995) presented data on women's labour-force participation in Canada, where women were classified as not working outside the home (n=155), working part-time (n=42) or working full-time (n=66). Predictor variables were presence/absence of children, and husband's income; a third variable, region of Canada, is not considered here. For these data, it makes sense to model the log odds for two nested dichotomies: working vs. not working, and fulltime vs. parttime work among working women.
------------------------
(4) An alternative strategy is to choose one response category (the last, say) as the "base category", and model the generalized logits for each of categories j = 1, 2, ..., (m − 1) compared to category m. For a 3-category response, e.g., there are 2 generalized logits, logit_i1 = log(π_i1 / π_i3) and logit_i2 = log(π_i2 / π_i3). These models can be fit using PROC CATMOD.
------------------------
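
For completeness, a minimal sketch of the generalized-logit approach mentioned in footnote (4), using PROC CATMOD with the wlfpart data set created below (assumed syntax, not part of the original analysis):

proc catmod data=wlfpart;
   direct husinc children;           /* treat as quantitative regressors    */
   model  labour = husinc children;  /* generalized logits are the default  */
run;
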
The data are read in as shown below. The 3-level variable labour is used to define two dichotomous variables, working and fulltime. Note that fulltime is defined (has non-missing values) only for working women.
proc format;
   value labour    /* labour-force participation */
      1 ='working full-time'  2 ='working part-time'
      3 ='not working';
   value kids     /* presence of children in the household */
      0 ='Children absent'  1 ='Children present';
data wlfpart;
   input case labour husinc children region;
   working = labour < 3;
   if working then
      fulltime = (labour = 1);
cards;
  1  3  15  1  3
  2  3  13  1  3
  3  3  45  1  3
  4  3  23  1  3
  5  3  19  1  3
  6  3   7  1  3
  7  3  15  1  3
  8  1   7  1  3
  9  3  15  1  3
  ... more data lines ...
An initial analysis attempts to fit the proportional odds model, with the 3-level labour variable as the response:
proc logistic data=wlfpart nosimple;

   model labour  = husinc children ;

   title2 'Proportional Odds Model for Fulltime/Parttime/NotWorking';
However, the proportional odds assumption is rejected by the score test:
+-------------------------------------------------------------------+
|                                                                   |
|         Score Test for the Proportional Odds Assumption           |
|                                                                   |
|            Chi-Square = 18.5641 with 2 DF (p=0.0001)              |
|                                                                   |
+-------------------------------------------------------------------+
Hence, we fit models for each of the working and fulltime dichotomies. The descending option is used so that in each case the probability of a 1 response (working, or fulltime) will be the event modelled.
proc logistic data=wlfpart nosimple descending;

   model working = husinc children ;

   output out=resultw p=predict xbeta=logit;
   title2 'Nested Dichotomies';
run;
proc logistic data=wlfpart nosimple descending;

   model fulltime = husinc children ;

   output out=resultf p=predict xbeta=logit;
The output statements create the datasets resultw and resultf for plotting the predicted probabilities and logits. The printed output for the working dichotomy includes:
+-------------------------------------------------------------------+
|                                                                   |
|                              Response Profile                     |
|                         Ordered                                   |
|                           Value  WORKING     Count                |
|                                                                   |
|                               1        1       108                |
|                               2        0       155                |
|                                                                   |
|                 Testing Global Null Hypothesis: BETA=0            |
|                           Intercept                               |
|              Intercept       and                                  |
|  Criterion     Only      Covariates    Chi-Square for Covariates  |
|                                                                   |
|  AIC           358.151      325.733      .                        |
|  SC            361.723      336.449      .                        |
|  -2 LOG L      356.151      319.733    36.418 with 2 DF (p=0.0001)|
|  Score            .            .       35.713 with 2 DF (p=0.0001)|
|                                                                   |
+-------------------------------------------------------------------+
+-------------------------------------------------------------------+
|                                                                   |
|                   Analysis of Maximum Likelihood Estimates        |
|                                                                   |
|              Parameter Standard    Wald       Pr >    Standardized    Odds
|  Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate  |  Ratio
|                                                                   |
|  INTERCPT 1     1.3358   0.3838    12.1165     0.0005            .|   .
|  HUSINC   1    -0.0423   0.0198     4.5751     0.0324    -0.168541|  0.959
|  CHILDREN 1    -1.5756   0.2923    29.0651     0.0001    -0.398992|  0.207
|                                                                   |
+-------------------------------------------------------------------+
The output for the fulltime vs. parttime dichotomy is shown below. Note that nonworking women are excluded in this analysis.
+-------------------------------------------------------------------+
|                                                                   |
|                               Response Profile                    |
|                         Ordered                                   |
|                           Value  FULLTIME     Count               |
|                                                                   |
|                               1         1        66               |
|                               2         0        42               |
|                                                                   |
|  WARNING: 155 observation(s) were deleted due to missing values for
|          the response or explanatory variables.                   |
|                                                                   |
|                  Testing Global Null Hypothesis: BETA=0           |
|                            Intercept                              |
|               Intercept       and                                 |
|  Criterion      Only      Covariates   Chi-Square for Covariates  |
|                                                                   |
|  AIC            146.342      110.495      .                       |
|  SC             149.024      118.541      .                       |
|  -2 LOG L       144.342      104.495    39.847 with 2 DF (p=0.0001)
|  Score             .            .       35.150 with 2 DF (p=0.0001)
|                                                                   |
+-------------------------------------------------------------------+
+-------------------------------------------------------------------+
|                                                                   |
|                   Analysis of Maximum Likelihood Estimates        |
|                                                                   |
|              Parameter Standard    Wald       Pr >    Standardized    Odds
|  Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate  |  Ratio
|                                                                   |
|  INTERCPT 1     3.4778   0.7671    20.5537     0.0001            .|   .
|  HUSINC   1    -0.1073   0.0392     7.5063     0.0061    -0.424867|  0.898
|  CHILDREN 1    -2.6515   0.5411    24.0135     0.0001    -0.734194|  0.071
|                                                                   |
+-------------------------------------------------------------------+
Thus, the full 3-category response has been fitted by two models,
   log[ Pr(working) / Pr(not working) ]  =  1.336 − 0.042 H$ − 1.576 kids

   log[ Pr(fulltime) / Pr(parttime) ]    =  3.478 − 0.107 H$ − 2.652 kids

where H$ is husband's income and kids indicates the presence of children.

Note that the second equation gives the predicted log odds for fulltime vs. parttime work conditional on working.
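
Because the second dichotomy is conditional on working, the two equations can be combined to give probabilities for all three categories; a sketch (a hypothetical follow-up step, not in the original):

data fit3;
   do children = 0 to 1;
      do husinc = 1 to 45;
         logit_w = 1.336 - 0.042*husinc - 1.576*children;   /* working vs. not working */
         logit_f = 3.478 - 0.107*husinc - 2.652*children;   /* fulltime vs. parttime   */
         p_work  = exp(logit_w) / (1 + exp(logit_w));
         p_full  = p_work * exp(logit_f) / (1 + exp(logit_f));  /* unconditional Pr{fulltime} */
         p_part  = p_work - p_full;
         p_not   = 1 - p_work;
         output;
         end;
      end;
run;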

Moreover, we can add the likelihood-ratio or Wald tests across the two models, so the overall test of the hypothesis that neither husband's income nor presence of children predicts working status (the 3-level response) has G² = 36.42 + 39.85 = 76.27 on 2 + 2 = 4 df (p < .0001). Similarly, the hypothesis that husband's income does not predict working status has a Wald test chi² = 4.58 + 7.51 = 12.09 on 2 df (p < .01).
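
A small DATA step (a sketch, not in the original) carries out this addition and finds the p-value:

data _null_;
   g2 = 36.418 + 39.847;        /* = 76.27           */
   df = 2 + 2;
   p  = 1 - probchi(g2, df);    /* p < .0001         */
   put g2= df= p=;
run;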

Comparison of the regression coefficients in the two sub-models (in relation to the size of their standard errors) indicates why the proportional odds model was not tenable. The proportional odds model requires that the coefficients for husband's income and children in the two models are equal. We can see that both variables have a greater effect on the odds of fulltime vs. parttime work than on the odds of working vs. not working.

As usual, these effects can be seen and interpreted more easily in a graph. The odds of working outside the home decrease as husband's income increases and when there are children present. However, among working women, the odds of fulltime vs. parttime work decrease at a faster rate with husband's income; women with children are less likely to work fulltime.

To construct this graph, first join the separate results datasets into one.

*-- Merge the results datasets to create one plot;
data both;
   set resultw(in=inw)
       resultf(in=inf);
   if inw then do;
      if children=1 then event='Working, with Children ';
      else event='Working, no Children ';
   end;
   else do;
      if children=1 then event='Fulltime, with Children ';
      else event='Fulltime, no Children ';
  end;
Then, we can plot the log odds (or predicted probability) against husband's income, using event to determine the curves to be joined and labelled (the Annotate data step for the labels is not shown here).
proc gplot data=both;
   plot logit * husinc = event /
        anno=lbl nolegend frame vaxis=axis1;
   axis1 label=(a=90 'Log Odds') order=(-5 to 4);
   title2 'Working vs Not Working and Fulltime vs. Parttime';
   symbol1 v=dot    h=1.5 i=join l=3 c=red;
   symbol2 v=dot    h=1.5 i=join l=1 c=black;
   symbol3 v=circle h=1.5 i=join l=3 c=red;
   symbol4 v=circle h=1.5 i=join l=1 c=black;
Figure 31: Log odds of working vs. not working, and of fulltime work vs. parttime work

Influence statistics and diagnostic plots

In ordinary least squares regression, measures of influence (leverage, Cook's D, DFBETAs, etc.) help you to determine whether individual cases have undue impact on the fitted regression model and the coefficients of individual predictors. Analogs of most of these measures have been suggested for logistic regression. (Some additional problems occur in logistic regression because the response is discrete, and because the leave-one-out diagnostics are more difficult to compute.)

Influence measures & diagnostics

leverage
Measures the potential impact of an individual case on the results, which is directly proportional to how far an individual case is from the centroid in the space of the predictors. Leverage is computed as the diagonal elements, h_ii, of the "Hat" matrix H,

   H = X* ( X*′ X* )⁻¹ X*′

where X* = V^{1/2} X, and V = diag{ p̂ (1 − p̂) }. As in OLS, leverage values are between 0 and 1, and a leverage value h_ii > 2k/n is considered "large", where k is the number of estimated parameters (including the intercept) and n is the number of cases.
residuals
Pearson and deviance residuals are useful in identifying observations that are not explained well by the model. They are the (signed) square roots of the contribution that each case makes to the overall Pearson and deviance goodness-of-fit statistics.
influence measures
Measure the effect that deleting an observation has on the regression parameters or the goodness-of-fit statistics. (The terminology for these measures is not well-established, so I use the terms from the SAS/STAT User's Guide.)
C
CBAR
Analogs of the Cook's D statistic in OLS, these are two standardized measures of the approximate change in all regression coefficients when the i -th case is deleted.
DIFCHISQ
DIFDEV
Approximate measures of the amount, Δ chi², by which the Pearson chi² (DIFCHISQ) or the likelihood-ratio deviance chi² (DIFDEV) would decrease if the i-th case were deleted. Values > 4 for either indicate "significant" change (since these are 1 df chi² or squared Normal values).
DFBETAij
Measure the change in the logistic regression coefficient b_j for predictor x_j when observation i is deleted.
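
Depending on the SAS release, most of these statistics can also be captured in an output data set for plotting; a minimal sketch (keywords assumed to be supported by the OUTPUT statement of PROC LOGISTIC):

proc logistic data=arthrit descending;
   model better = _sex_ _treat_ age;
   output out=diag p=predict h=hat reschi=reschi resdev=resdev
          c=c cbar=cbar difchisq=difchisq difdev=difdev;
run;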

Printed output

All the influence statistics are printed when the INFLUENCE option is used on the MODEL statement. For example,
proc logistic data=arthrit ;
   model better = _sex_ _treat_ age / influence;
This produces a great deal of output, including the following:
                     The LOGISTIC Procedure
                     Regression Diagnostics
          Deviance Residual               Hat Matrix Diagonal

Case             (1 unit = 0.26)               (1 unit = 0.01)
Number  Value   -8  -4  0 2 4 6 8     Value   0 2 4 6 8  12  16

   1    1.812  | *      |        |    0.089  |          *      |
   2    0.360  |        |*       |    0.031  |    *            |
   3    0.685  |        |  *     |    0.087  |          *      |
   4    0.425  |        | *      |    0.034  |    *            |
   5    0.700  |        |  *     |    0.086  |          *      |
   6    0.488  |        | *      |    0.038  |    *            |
   7    1.703  |  *     |        |    0.084  |         *       |
   8    0.499  |        | *      |    0.039  |    *            |
   9    1.396  |   *    |        |    0.066  |        *        |
  10    0.511  |        | *      |    0.040  |     *           |
  11    1.142  |    *   |        |    0.064  |       *         |
  12    0.523  |        | *      |    0.041  |     *           |
  13    1.234  |        |    *   |    0.065  |       *         |
  14    0.599  |        | *      |    0.051  |      *          |
  15    1.121  |    *   |        |    0.065  |       *         |
  16    0.599  |        | *      |    0.051  |      *          |
  17    1.319  |        |    *   |    0.069  |        *        |
  18    0.640  |        | *      |    0.058  |       *         |
  19    1.319  |        |    *   |    0.069  |        *        |
  20    0.640  |        | *      |    0.058  |       *         |
  21    1.340  |        |    *   |    0.070  |        *        |
  22    1.814  | *      |        |    0.061  |       *         |
  23    1.022  |    *   |        |    0.070  |        *        |
  24    0.529  |        | *      |    0.060  |       *         |
  25    1.449  |        |    *   |    0.078  |         *       |
  26    0.619  |        | *      |    0.053  |      *          |
  27    0.909  |     *  |        |    0.080  |         *       |
  28    0.619  |        | *      |    0.053  |      *          |
  29    1.120  |        |   *    |    0.141  |                *|
  30    1.846  | *      |        |    0.052  |      *          |
  31    1.309  |        |    *   |    0.092  |          *      |
  32    0.647  |        | *      |    0.050  |      *          |
  33    0.955  |    *   |        |    0.070  |        *        |
  34    1.803  | *      |        |    0.049  |      *          |

Diagnostic plots of influence measures

Plots of the change in chi² (DIFCHISQ or DIFDEV) against either leverage or predicted probability are useful for detecting unduly influential cases. The actual influence of each case on the estimated coefficients (C or CBAR) can be shown in a bubble plot where the plotting symbols are circles proportional to C or CBAR.
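
A minimal sketch of such a plot with the BUBBLE statement of PROC GPLOT, using the diag data set from the OUTPUT sketch above (the INFLOGIS program, described next, automates this):

proc gplot data=diag;
   bubble difchisq * hat = c / bscale=area bcolor=red vref=4;  /* bubble area proportional to C */
run;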

Such plots are produced by the INFLOGIS program. For example, these lines produce plots of DIFCHISQ against both the leverage (HAT) and predicted probability (PRED) using bubbles whose area is proportional to C:

title 'Arthritis treatment data';
title2 'Bubble size: Influence on Coefficients (C)';
goptions htext=1.6 ftext=duplex;
%include inflogis;
%include arthrit;
%inflogis(data=arthrit,
     y=better,               /* response   */
     x=_sex_ _treat_ age,    /* predictors */
     id=id,
     gy=DIFCHISQ,            /* graph ordinate  */
     gx=PRED HAT,            /* graph abscissas */
     lcolor=RED, bsize=14
     );
run;
The printed output from the INFLOGIS program includes a table identifying any observation of high leverage or influence. For example, case 29 is of high leverage, because she is unusual in terms of the predictors: a young woman given the active treatment; however, she is not influential in the fitted model. Case 77 is not of high leverage, but is poorly predicted by the model and has large contributions to chi².
CASE BETTER  _SEX_  _TREAT_  AGE   HAT  DIFCHISQ   DIFDEV     C

  1     1      0       1      27   .09    4.5781   3.6953   0.4510
 22     1      0       0      63   .06    4.4603   3.5649   0.2898
 29     0      1       1      23   .14    1.0183   1.4005   0.1679
 30     1      1       0      31   .05    4.7485   3.6573   0.2611
 34     1      1       0      33   .05    4.2955   3.4644   0.2236
 55     0      1       1      58   .03    4.9697   3.6759   0.1602
 77     0      1       1      69   .03    8.4977   4.7122   0.2758
Figure 32: Influence plot for arthritis data. Cases with DIFCHISQ > 4 or leverage > 2k/n = 0.095 are influential, as indicated by the size of the bubble symbol.
Figure 33: Changes in chi-square vs. predicted probability. The plot shows that most of the influential observations are those with very high or low predicted probabilities. The systematic pattern shown is inherent in the discrete nature of logistic regression.
© 1995 Michael Friendly
Email: <friendly@yorku.ca>