Confounding 

     We have learned how to explore bivariate relationships, that
is, relationships between two categorical variables.  Often,
however, a third variable can influence our estimate of the
relationship between the original two.  We will explore two
examples of this.

Example One

     Consider a hypothetical example concerning receipt of proper
health care and mortality.  We want to determine whether those who
received appropriate care have a greater chance of survival than
those who did not.

             Approp Care      Dead    Alive  Total   Death Rate
                    No          22     378    400       .055
                    Yes          7     293    300       .023
                    Total       29     671    700       .041

             Odds ratio = 2.44
             Relative Risk = 2.36

The expected values for the chi square test of independence are
shown below.

          Expected values for chi square test of independence

         Approp Care      Dead    Alive
                  No      16.6    383.4
                 Yes      12.4    287.6
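
     Each expected value is the product of the cell's row total
and column total divided by the overall total.  A short Python
sketch of that calculation follows; the helper name
expected_counts is hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: appropriate care No, Yes.  Columns: Dead, Alive.
observed = [[22, 378], [7, 293]]
expected = expected_counts(observed)

for row in expected:
    print([round(value, 1) for value in row])
# [16.6, 383.4]
# [12.4, 287.6]
```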

     The value of the test statistic is 4.33.  There is one
degree of freedom, and the p value is less than .05.  We reject the
null hypothesis and conclude that there is a relationship between
receiving appropriate care and mortality.  Note that the expected
value for the no care/dead cell is 16.6 while the observed value
is 22.  There are more observations in that cell than the model
of independence predicts.  Thus, inappropriate care seems to
increase the risk of death.  The odds ratio reflects this since it
is greater than one.
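
     The test statistic and its p value can be checked with a few
lines of Python.  This is a sketch only: it computes the ordinary
Pearson statistic without a continuity correction, which is the
version that matches the value of 4.33 quoted above, and it uses
the fact that a chi square variable with one degree of freedom is
the square of a standard normal variable.  The function names are
illustrative.

```python
import math


def chi_square(table):
    """Pearson chi square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_one_df(statistic):
    """Upper tail probability of a chi square variable with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Rows: appropriate care No, Yes.  Columns: Dead, Alive.
statistic = chi_square([[22, 378], [7, 293]])
print(f"chi square = {statistic:.2f}")  # 4.33
print(f"p value    = {p_value_one_df(statistic):.3f}")
```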
     The data was pooled from two clinics and is shown below for
each clinic separately.

                Clinic One               Clinic Two
App                                App
Care  Dead   Alive   Total Rate    Dead   Alive   Total  Rate
  No     6    194     200  .030      16     184     200  .080
 Yes     4    246     250  .016       3      47      50  .060
Total   10    440     450  .022      19     231     250  .076

Chi sq                     1.00                          0.23    

Odds Ratio                 1.90                          1.36
Relative Risk              1.88                          1.33 
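
     The comparison between the pooled and the clinic-specific
estimates can be sketched in Python as follows.  The dictionary
layout and function names are assumptions made for illustration.
Adding the two clinics cell by cell recovers the pooled table, yet
the pooled odds ratio is larger than the odds ratio in either
clinic.

```python
def odds_ratio(no_care, care):
    """Odds of death without appropriate care divided by odds of death with it."""
    (a, b), (c, d) = no_care, care
    return (a * d) / (b * c)


def relative_risk(no_care, care):
    """Death rate without appropriate care divided by death rate with it."""
    (a, b), (c, d) = no_care, care
    return (a * (c + d)) / (c * (a + b))


# Each entry: ((dead, alive) without appropriate care, (dead, alive) with it)
strata = {
    "Clinic One": ((6, 194), (4, 246)),
    "Clinic Two": ((16, 184), (3, 47)),
}

# Pooling adds the clinics cell by cell, which reproduces the first table.
pooled = tuple(
    tuple(sum(cells) for cells in zip(*rows))
    for rows in zip(*strata.values())
)

for name, (no_care, care) in {**strata, "Pooled": pooled}.items():
    print(f"{name}: OR = {odds_ratio(no_care, care):.2f}, "
          f"RR = {relative_risk(no_care, care):.2f}")
```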


     The mortality rate at the second clinic (19 deaths among 250)
is higher than at the first (10 deaths among 450).  Also,
relatively few mothers received appropriate care at the second
clinic (50 of 250, compared with 250 of 450 at the first).  These
clinic differences make it seem that care is effective when we
analyze the data without taking clinic into account.  When the
two clinics are analyzed separately, neither shows a statistically
significant relationship between care and mortality.  Clinic is
called a confounder.
     In order to be a confounder a variable must be related to
the predictor and to the outcome.  The following two tables show
that this is true for clinic.  It is related to mortality and to
appropriateness of care.

             Death                     Appropriate Care
    Clinic  Yes   No  Total              No    Yes   Total  
     One     10  440   450              200    250    450
     Two     19  231   250              200     50    250
    Total    29  671   700              400    300    700
           Chi sq = 11.70                 Chi sq = 82.96
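
     Both chi square values can be verified from the cell counts.
The sketch below uses the shortcut formula for a 2x2 table,
n(ad - bc)^2 divided by the product of the four marginal totals,
which is algebraically the same as the usual Pearson statistic.
The function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square for a 2x2 table via n(ad - bc)^2 / (product of margins)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / margins


# Rows: Clinic One, Clinic Two.
clinic_by_death = chi_square_2x2(10, 440, 19, 231)  # columns: death Yes, No
clinic_by_care = chi_square_2x2(200, 250, 200, 50)  # columns: care No, Yes

print(f"clinic vs death: chi square = {clinic_by_death:.2f}")  # 11.70
print(f"clinic vs care:  chi square = {clinic_by_care:.2f}")   # 82.96
```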

Example Two

     Confounders do not always show themselves in this way.  In
the second example there is a relationship between the predictor
and the outcome both with and without controlling for the
confounder.
     The second example relates aspirin use to incidence of
Reye's syndrome.  This study is discussed by Halpin et al.
(1982).

                 Case     Control   Total
          Asp     94        70       164
          No asp   3        27        30
          Total   97        97       194

          Odds ratio                12.09

     The expected values for the chi square test of independence
are shown below.

    Expected Values for Chi Square Test of Independence
              Case    Control
       Asp     82       82
       No asp  15       15       

     The test statistic equals 22.71.  There is one degree of
freedom, and the p value is less than .001.  There were 94
children in the case/aspirin cell, but the expected value was
only 82.  The odds ratio equals 12.09.  The relationship is very
strong, and aspirin use appears to increase the risk of Reye's
syndrome.
     An alternative explanation for this relationship is that
there are two disease processes, one more severe than the other.
The more severely ill children were both more likely to have
developed Reye's syndrome and to have been given aspirin.
     We explore this explanation by including a possible
indicator of severity of illness, presence or absence of fever.

              Fever                  No Fever
          Case  Control  Total   Case    Control   Total
   Asp     73      41      114     21       29       50
   No asp   1      14       15      2       13       15
   Total   74      55      129     23       42       65

   Chi sq                 17.84                     4.15
   p value                < .001                     .04
   Odds ratio             24.93                     4.71

     In both those with fever and those without, there is a
positive relationship between aspirin use and development of
Reye's syndrome.
     It appears that the relationship between case status and
aspirin use is stronger for those children with a fever than for
those without.  Further analysis did not confirm that the
relationship differs between the two groups, although this may be
due to the small numbers in some cells.  In any case, when the
nature of the relationship between two variables differs among
values of a third variable, the third variable is called an effect
modifier.  We will discuss this in a later session.
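
     One simple way to compare the two odds ratios is a large
sample z test on the difference of their logarithms, using the sum
of the reciprocals of the four cell counts as the variance of each
log odds ratio.  The sketch below is an assumption about how such
a check might be done; it is not necessarily the further analysis
referred to above, and the approximation is rough when a cell
count is as small as 1.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio of a 2x2 table and its approximate large-sample variance."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


# Rows: aspirin, no aspirin.  Columns: case, control.
fever_log_or, fever_var = log_odds_ratio(73, 41, 1, 14)
no_fever_log_or, no_fever_var = log_odds_ratio(21, 29, 2, 13)

# z statistic for the difference between the two log odds ratios
z = (fever_log_or - no_fever_log_or) / math.sqrt(fever_var + no_fever_var)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value

print(f"z = {z:.2f}, p value = {p_value:.2f}")
```

     Under these assumptions the difference between the strata is
well within what chance could produce, largely because the cells
with 1 and 2 children make both log odds ratios very imprecise.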
     The following two tables show that fever status is related to
both the predictor and the outcome.

              Case  Control Total     Asp   No Asp Total
    Fever      74     55    129       114      15   129
    No Fever   23     42     65        50      15    65
    Total      97     97    194       164      30   194
            Chi sq = 8.35 p<.01       Chi sq = 4.33 p<.05

     In both examples we have conducted stratified analyses.  We
have analyzed data from various levels of a potential confounder
or effect modifier separately.  For many years this was the only
option available for the analysis of categorical variables.  The
current approach combines the data into a single sample and
includes the potential confounder as a variable in the analysis. 
The data is analyzed using loglinear or logistic regression
modelling.
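
     To illustrate the idea with the clinic data from Example One,
the sketch below fits a logistic regression of death on care and
clinic by Newton-Raphson, using only the Python standard library.
It is a hand-rolled illustration under stated assumptions (main
effects only, no care by clinic interaction, hypothetical function
names), not the software or the model used in this course.  In
practice a statistical package would be used.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_logistic(groups, iterations=25):
    """Newton-Raphson fit of logit P(death) = b0 + b1*no_care + b2*clinic_two."""
    deaths_total = sum(g[2] for g in groups)
    n_total = sum(g[3] for g in groups)
    # Start the intercept at the overall log odds of death.
    beta = [math.log(deaths_total / (n_total - deaths_total)), 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        information = [[0.0] * 3 for _ in range(3)]
        for no_care, clinic_two, deaths, n in groups:
            x = [1.0, no_care, clinic_two]
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j in range(3):
                score[j] += x[j] * (deaths - n * p)
                for k in range(3):
                    information[j][k] += n * p * (1 - p) * x[j] * x[k]
        step = solve(information, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Each group: (no appropriate care, clinic two, deaths, group size)
groups = [
    (1, 0, 6, 200),
    (0, 0, 4, 250),
    (1, 1, 16, 200),
    (0, 1, 3, 50),
]

beta = fit_logistic(groups)
adjusted_or = math.exp(beta[1])
print(f"Odds ratio for no care, adjusted for clinic: {adjusted_or:.2f}")
```

     The adjusted odds ratio falls between the two clinic-specific
odds ratios and below the pooled value of 2.44, which is the same
conclusion the stratified analysis reached.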

References

Halpin, T., Holtzhauer, F., Campbell, R., Hall, L.,
Correa-Villasenor, A., Lanese, R., Rice, J. and Hurwitz, E. (1982)
Reye's syndrome and drug use.  The Journal of the American
Medical Association, 248(6): 687-691.


© J.Rice