Confounding
We have learned how to explore bivariate relationships, that is,
relationships between two categorical variables.
Often, however, a third variable can influence our estimate of
the relationship between the original two. We will explore
two examples of this.
Example One
Consider a hypothetical example concerning receipt of proper
health care and mortality. We want to determine if those who
received appropriate care have a greater chance of survival than
those who did not.
Approp Care    Dead   Alive   Total   Death Rate
No               22     378     400      .055
Yes               7     293     300      .023
Total            29     671     700      .041

Odds ratio    = 2.44
Relative risk = 2.36
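
The two summary measures can be recomputed directly from the four
cell counts. The following is a minimal Python sketch, not part of
the original notes; the function and variable names are
illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the first row divided by risk in the second row."""
    return (a / (a + b)) / (c / (c + d))


# Row 1: no appropriate care (dead, alive)
# Row 2: appropriate care (dead, alive)
no_dead, no_alive = 22, 378
yes_dead, yes_alive = 7, 293

or_pooled = odds_ratio(no_dead, no_alive, yes_dead, yes_alive)
rr_pooled = relative_risk(no_dead, no_alive, yes_dead, yes_alive)

print(f"Odds ratio    = {or_pooled:.2f}")
print(f"Relative risk = {rr_pooled:.2f}")
```

Both measures compare the "No" row to the "Yes" row, so values
above one mean higher mortality among those without appropriate
care.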
The expected values for the chi square test of independence are
shown below.

Approp Care    Dead    Alive
No             16.6    383.4
Yes            12.4    287.6
The value of the test statistic is 4.33. There is one
degree of freedom, and the p value is less than .05. We reject the
null hypothesis and conclude that there is a relationship between
receiving appropriate care and mortality. Note that the expected
value for the no care/dead cell is 16.6 while the observed value
is 22. There are more observations in that cell than the model
of independence predicts. Thus, inappropriate care seems to
increase the risk of death. The odds ratio reflects this since it
is greater than one.
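
The expected values and the test statistic can be reproduced with
a few lines of Python. This is a sketch under stated assumptions:
it uses the Pearson chi square statistic without a continuity
correction (which matches the 4.33 reported here), and the
function name chi_square_2x2 is illustrative, not taken from any
library.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns the expected
    counts, the test statistic, and the p value for 1 degree of
    freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, the upper tail probability of the
    # chi-square distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected, stat, p_value


# Rows: no care, care. Columns: dead, alive.
care_table = [[22, 378], [7, 293]]
expected, stat, p_value = chi_square_2x2(care_table)

for label, row in zip(["No", "Yes"], expected):
    print(f"{label:<4}{row[0]:8.1f}{row[1]:8.1f}")
print(f"Chi sq = {stat:.2f}, p < .05: {p_value < 0.05}")
```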
The data was pooled from two clinics. The data is shown
below for each clinic separately.
                     Clinic One                     Clinic Two
Approp Care    Dead  Alive  Total   Rate      Dead  Alive  Total   Rate
No                6    194    200   .030        16    184    200   .080
Yes               4    246    250   .016         3     47     50   .060
Total            10    440    450   .022        19    231    250   .076

Chi sq                1.00                           0.23
Odds ratio            1.90                           1.36
Relative risk         1.88                           1.33
The mortality rate at the second clinic is higher than at
the first (19 deaths in 250 versus 10 in 450). Also, very few
mothers received appropriate care at the second clinic (50 of
250, compared with 250 of 450 at the first). These clinic
differences make it seem that care is effective when we analyze
the data without taking clinic into account. When the two clinics
are analyzed separately, there is no statistically significant
relationship between care and mortality. Clinic is called a
confounder.
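
The contrast between the pooled and the clinic-specific results
can be seen by computing the odds ratio within each clinic and
again after adding the two tables together. The following Python
sketch is illustrative only; the dictionary layout and names are
assumptions made for this example.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Rows: no care, care. Columns: dead, alive.
clinics = {
    "Clinic One": [[6, 194], [4, 246]],
    "Clinic Two": [[16, 184], [3, 47]],
}

# Pooling means adding the clinic tables cell by cell.
pooled = [
    [sum(t[i][j] for t in clinics.values()) for j in range(2)]
    for i in range(2)
]

stratum_or = {name: odds_ratio(t) for name, t in clinics.items()}
pooled_or = odds_ratio(pooled)

for name, value in stratum_or.items():
    print(f"{name}: odds ratio = {value:.2f}")
print(f"Pooled:     odds ratio = {pooled_or:.2f}")
```

The pooled odds ratio is larger than the odds ratio in either
clinic, which is the signature of confounding by clinic in this
example.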
In order to be a confounder a variable must be related to
the predictor and to the outcome. The following two tables show
that this is true for clinic. It is related to mortality and to
appropriateness of care.
               Death                       Appropriate Care
Clinic     Yes     No   Total            No    Yes   Total
One         10    440     450           200    250     450
Two         19    231     250           200     50     250
Total       29    671     700           400    300     700

Chi sq = 11.70                          Chi sq = 82.96
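
Both conditions can be checked numerically. The sketch below uses
the standard shortcut formula for the Pearson chi square of a 2x2
table, N(ad - bc)^2 divided by the product of the four marginal
totals, and compares each statistic with 3.841, the .05 critical
value for one degree of freedom. The names are illustrative
assumptions.

```python
def chi_square_2x2(table):
    """Pearson chi-square for [[a, b], [c, d]] via the shortcut formula."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )


CRITICAL_05_1DF = 3.841  # chi-square critical value, alpha = .05, 1 df

# Rows: clinic one, clinic two.
clinic_by_death = [[10, 440], [19, 231]]   # columns: dead, alive
clinic_by_care = [[200, 250], [200, 50]]   # columns: no care, care

chi_death = chi_square_2x2(clinic_by_death)
chi_care = chi_square_2x2(clinic_by_care)

print(f"Clinic vs death: chi sq = {chi_death:.2f}")
print(f"Clinic vs care:  chi sq = {chi_care:.2f}")
print("Related to both:",
      chi_death > CRITICAL_05_1DF and chi_care > CRITICAL_05_1DF)
```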
Example Two
Confounders do not always show themselves in this way. In
the second example there is a relationship between the predictor
and the outcome both with and without controlling for the
confounder.
The second example relates aspirin use to incidence of
Reye's syndrome. This study is discussed by Halpin et al.
(1982).
          Case   Control   Total
Asp         94        70     164
No asp       3        27      30
Total       97        97     194

Odds ratio = 12.09
The expected values for the chi square test of independence
are shown below.

          Case   Control
Asp         82        82
No asp      15        15
The test statistic equals 22.71. There is one degree of
freedom, and the p value is less than .001. There were 94
children in the case/aspirin cell, but the expected value was
only 82; likewise, only 3 children fell in the case/no aspirin
cell, where 15 were expected. The odds ratio equals 12.09. The
relationship is very strong, and aspirin use appears to increase
the risk of Reye's syndrome.
An alternative explanation for this relationship is that
there are two disease processes with one being more severe than
the other. The more severely ill children were both more likely
to have developed Reye's syndrome and to have been given aspirin.
We explore this explanation by including a possible
indicator of severity of illness, presence or absence of fever.
                    Fever                      No Fever
            Case  Control  Total        Case  Control  Total
Asp           73       41    114          21       29     50
No asp         1       14     15           2       13     15
Total         74       55    129          23       42     65

Chi sq           17.84                        4.15
p value         < .001                         .04
Odds ratio       24.93                        4.71
In both those with fever and those without there is a
positive relationship between aspirin use and development of
Reye's syndrome.
It appears that the relationship between case status and
aspirin use is stronger for those children with a fever than for
those without. Further analysis did not show this difference to
be statistically significant, although that may be due to small
numbers in some cells. In any case, when the nature of the
relationship between two variables differs among values of a
third variable, the third variable is called an effect modifier.
We will discuss this in a later session.
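
The notes do not say which further analysis was used to compare
the two odds ratios. One simple possibility, shown here purely as
an assumption, is a z test on the difference between the two log
odds ratios, using the usual large-sample standard error
sqrt(1/a + 1/b + 1/c + 1/d) for each. With a cell count of 1 in
the fever table this approximation is rough, so the result is
only indicative.

```python
import math


def log_odds_ratio(table):
    """Log odds ratio and its large-sample standard error."""
    (a, b), (c, d) = table
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Rows: aspirin, no aspirin. Columns: case, control.
fever = [[73, 41], [1, 14]]
no_fever = [[21, 29], [2, 13]]

log_or_f, se_f = log_odds_ratio(fever)
log_or_n, se_n = log_odds_ratio(no_fever)

# z statistic for the difference between the two log odds ratios
z = (log_or_f - log_or_n) / math.sqrt(se_f ** 2 + se_n ** 2)
# Two-sided p value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"Fever:    odds ratio = {math.exp(log_or_f):.2f}")
print(f"No fever: odds ratio = {math.exp(log_or_n):.2f}")
print(f"z = {z:.2f}, significant at .05: {p_value < 0.05}")
```

Under these assumptions the difference is not significant at the
.05 level, which agrees with the statement that the apparent
difference in strength was not confirmed.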
The following two tables show that fever status is related to
both the predictor and the outcome.
            Case  Control  Total         Asp  No Asp  Total
Fever         74       55    129         114      15    129
No Fever      23       42     65          50      15     65
Total         97       97    194         164      30    194

Chi sq = 8.35   p < .01                  Chi sq = 4.33   p < .05
In both examples we have conducted stratified analyses. We
have analyzed data from various levels of a potential confounder
or effect modifier separately. For many years this was the only
option available for the analysis of categorical variables. The
current approach combines the data into a single sample and
includes the potential confounder as a variable in the analysis.
The data is analyzed using loglinear or logistic regression
modelling.
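
To give a flavor of the modelling approach, the sketch below fits
a loglinear model to the Example One data with all three two-way
associations (care by death, care by clinic, death by clinic) and
no three-way term. It uses iterative proportional fitting, which
is one standard way to fit such a model and is chosen here only
because it needs no statistical library; the notes do not specify
a fitting method. This model gives the same care/death odds ratio
as a logistic regression of mortality on care and clinic. All
names are illustrative assumptions.

```python
from collections import defaultdict

# Observed counts keyed by (care, outcome, clinic), from Example One.
observed = {
    ("no", "dead", "one"): 6,   ("no", "alive", "one"): 194,
    ("yes", "dead", "one"): 4,  ("yes", "alive", "one"): 246,
    ("no", "dead", "two"): 16,  ("no", "alive", "two"): 184,
    ("yes", "dead", "two"): 3,  ("yes", "alive", "two"): 47,
}


def fit_no_three_way(observed, cycles=500):
    """Iterative proportional fitting of the model with all
    two-way associations and no three-way interaction."""
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(cycles):
        # Match each two-way margin of the observed table in turn.
        for axes in [(0, 1), (0, 2), (1, 2)]:
            obs_margin = defaultdict(float)
            fit_margin = defaultdict(float)
            for cell, n in observed.items():
                key = tuple(cell[i] for i in axes)
                obs_margin[key] += n
                fit_margin[key] += fitted[cell]
            for cell in fitted:
                key = tuple(cell[i] for i in axes)
                fitted[cell] *= obs_margin[key] / fit_margin[key]
    return fitted


def care_death_or(cells, clinic):
    """Care/death odds ratio within one clinic."""
    return (
        cells[("no", "dead", clinic)] * cells[("yes", "alive", clinic)]
    ) / (
        cells[("no", "alive", clinic)] * cells[("yes", "dead", clinic)]
    )


fitted = fit_no_three_way(observed)
adjusted_or = care_death_or(fitted, "one")
print(f"Clinic-adjusted odds ratio = {adjusted_or:.2f}")
```

Because the model has no three-way term, the fitted odds ratio is
the same in both clinics. It falls between the two clinic-specific
odds ratios and well below the pooled value of 2.44.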
References
Halpin, T., Holtzhauer, F., Campbell, R., Hall, L., Correa-
Villasenor, A., Lanese, R., Rice, J. and Hurwitz, E. (1982)
Reye's syndrome and drug use. The Journal of the American
Medical Association, 248(6): 687-691.
© J.Rice