
Confounding

‘dealing with alternative explanations’

 

Mathematically, possible confounding is said to exist when two or more independent variables (which might confound each other) are associated both with each other and with the dependent variable of interest. The determinants of nutritional status are usually linked: better educated people tend to have better housing and better access to health services, water and sanitation. So if we are interested in how one of these determinants relates to nutritional status, we have to control for the others. We are often interested in isolating the effect of one determinant, usually because we are looking for guidance on which specific interventions may improve nutrition.

We looked earlier at the relative contributions to weight-for-age of education and housing quality (roofing, as a proxy for socio-economic status, SES), and interpreted this table:

Mean WAZ Score by Education and Roofing Quality

(Numbers in parentheses are the number of individuals in each category.)

Educational attainment (respondent) | Roof quality bad (grass/thatch) | Roof quality good (corrugated iron) | TOTAL
Bad (less than primary) | -1.57 (170) | -1.40 (241) | -1.47 (411)
Good (primary +) | -1.28 (59) | -0.99 (225) | -1.02 (284)
TOTAL | -1.46 (229) | -1.20 (466) | -1.29 (695)
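
The two conditions in the definition of confounding can be checked against this table. The sketch below uses only the published counts and roofing-column means. It asks two questions. First, is roofing associated with education (does the share with less than primary education differ by roof type)? Second, is roofing associated with WAZ (do the roofing column means differ)?

```python
# Cell counts and roofing-column mean WAZ from the education-by-roofing table
COUNTS = {
    "bad_roof": {"less_than_primary": 170, "primary_plus": 59},
    "good_roof": {"less_than_primary": 241, "primary_plus": 225},
}
MEAN_WAZ = {"bad_roof": -1.46, "good_roof": -1.20}  # TOTAL row of the table


def share_less_than_primary(roof):
    """Proportion of respondents with less than primary education in a roofing group."""
    counts = COUNTS[roof]
    return counts["less_than_primary"] / sum(counts.values())


# Condition 1: is the potential confounder (roofing) associated with education?
for roof in COUNTS:
    print(f"{roof}: {share_less_than_primary(roof):.0%} with less than primary education")

# Condition 2: is roofing associated with the outcome (WAZ)?
print(f"mean WAZ, bad minus good roof: {MEAN_WAZ['bad_roof'] - MEAN_WAZ['good_roof']:+.2f}")
```

Bad-roof households fall more often in the low-education group (about 74% versus 52%) and have a lower mean WAZ (by 0.26). Roofing therefore meets both conditions for a potential confounder of the education-WAZ relationship.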

 

The question was whether education is associated with weight-for-age once SES is taken into account. The table can be read by examining the differences in weight-for-age z-score (WAZ) between education levels within each category of housing quality. Thus we would compare -1.57 (poor education) with -1.28 (good education) within the poor roofing category, and -1.40 (poor education) with -0.99 (good education) within the good roofing category. We could test the significance of these differences in various ways. One would be to select one housing category and run an ANOVA on mean WAZ by education group. Even simpler would be to aggregate education into two groups and do a t-test within each category of roofing.
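
The following is a minimal sketch of the t-test option. It assumes individual-level records of (roofing, education, WAZ) are available; the records below are invented purely for illustration and are not the survey data. For each roofing stratum separately, it computes a pooled-variance (equal-variance) two-sample t statistic comparing good with poor education.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


def t_within_strata(records):
    """records: (roof, education, waz) tuples.
    Returns {roof: (t, df)} for good vs poor education within each roofing stratum."""
    strata = {}
    for roof, education, waz in records:
        strata.setdefault(roof, {"good": [], "poor": []})[education].append(waz)
    return {roof: pooled_t(g["good"], g["poor"]) for roof, g in strata.items()}


# Hypothetical WAZ values, invented for illustration only
records = [
    ("bad", "good", -1.0), ("bad", "good", -1.2), ("bad", "good", -1.4),
    ("bad", "poor", -1.5), ("bad", "poor", -1.7), ("bad", "poor", -1.9),
    ("good", "good", -0.6), ("good", "good", -0.9), ("good", "good", -1.2),
    ("good", "poor", -1.1), ("good", "poor", -1.4), ("good", "poor", -1.7),
]

results = t_within_strata(records)
for roof, (t, df) in results.items():
    print(f"roof={roof}: t = {t:.2f} on {df} df")
```

A two-sided p-value can then be read from a t distribution with df degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.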

We would conclude that there is an association between education level and WAZ within housing categories (strata), and therefore this association was probably not simply due to better educated people being better off in SES terms, but might well be a direct effect of education itself. In other words, we have now examined possible confounding of the relationship between WAZ and education by SES; we have controlled for housing in examining this association.
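
The stratum-specific comparison can be made explicit with a little arithmetic on the table's means. This is a sketch: only the published cell means are used, so no significance test is possible here. It contrasts the education difference within each roofing category with the crude difference taken from the TOTAL row.

```python
# Mean WAZ from the education-by-roofing table; "total" is the crude (TOTAL row) comparison
MEANS = {
    "bad_roof": {"less_than_primary": -1.57, "primary_plus": -1.28},
    "good_roof": {"less_than_primary": -1.40, "primary_plus": -0.99},
    "total": {"less_than_primary": -1.47, "primary_plus": -1.02},
}


def education_differences(means):
    """Mean WAZ of primary+ minus less-than-primary, for each roofing group and overall."""
    return {
        group: values["primary_plus"] - values["less_than_primary"]
        for group, values in means.items()
    }


diffs = education_differences(MEANS)
for group, diff in diffs.items():
    print(f"{group:10s} education difference in mean WAZ: {diff:+.2f}")
```

Both within-stratum differences (+0.29 and +0.41) are smaller than the crude difference (+0.45), but both remain clearly positive.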

There is another type of confounding that may catch us out: an unexpected association that turns out to be the real explanation. This can be more difficult to deal with, because the alternative explanation may be one we have not thought of. The example below illustrates it. Measles immunization turns out to be strongly associated with the child's age. This is expected on reflection, since measles immunization is given at around 9 months of age. Meanwhile WAZ decreases with age, as it usually does after about 6 months. Therefore, if all children aged 0-5 years are included, measles immunization appears to be associated with poorer nutritional status. We know by now not to draw conclusions about causality from an association, but the finding would be surprising anyway, because better-off children are more likely to be immunized. Removing the effect of age, for example by selecting only children aged one year or more, gets rid of this potential confounding, as will be seen below.
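
To see how a confounder such as age can reverse an association, here is a small numerical sketch with invented cell counts and means (not the survey data). Within each age group, immunized children have the higher mean WAZ. Yet because most immunized children are older, the pooled comparison points the other way.

```python
# Hypothetical (n, mean WAZ) by age group and measles status; invented numbers
CELLS = {
    ("under_12m", "immunized"): (10, -0.5),
    ("under_12m", "not_immunized"): (90, -0.6),
    ("12m_plus", "immunized"): (90, -1.4),
    ("12m_plus", "not_immunized"): (10, -1.6),
}
AGE_GROUPS = ("under_12m", "12m_plus")


def pooled_mean(status):
    """Mean WAZ for one measles status, ignoring age (weighted by cell size)."""
    cells = [CELLS[(age, status)] for age in AGE_GROUPS]
    return sum(n * m for n, m in cells) / sum(n for n, _ in cells)


crude_diff = pooled_mean("immunized") - pooled_mean("not_immunized")
stratum_diffs = {
    age: CELLS[(age, "immunized")][1] - CELLS[(age, "not_immunized")][1]
    for age in AGE_GROUPS
}

print(f"crude difference (immunized - not): {crude_diff:+.2f}")
for age, diff in stratum_diffs.items():
    print(f"within {age}: {diff:+.2f}")
```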

The principle to keep in mind is that unexpected confounding, almost by definition, takes some imagination to identify. One way is to put yourself in the shoes of a picky critic, someone who is always trying to catch you out, and imagine what arguments might be used to undermine your key findings. Then examine the relevant relationships, check whether the alternative explanation has some validity, and proceed from there. For the age-measles problem, excluding children under 12 months is one approach. (Another, which we will look at later, is to distinguish this group with a dummy variable for age under 12 months.)
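
Both remedies can be sketched with ordinary least squares on invented data. The cell counts and means are hypothetical, and every child in a cell is given the cell mean, so there is no within-cell noise. The `ols` helper is a plain normal-equations solver written for this illustration, not a library routine. Restriction fits WAZ on measles status among children aged 12 months or more only. The dummy-variable approach keeps all children and adds an indicator for age under 12 months.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical cells: (measles immunized, under 12 months, n, mean WAZ)
CELLS = [(1, 1, 10, -0.5), (0, 1, 90, -0.6), (1, 0, 90, -1.4), (0, 0, 10, -1.6)]
children = [(m, u, waz) for m, u, n, waz in CELLS for _ in range(n)]

# Crude model: WAZ = b0 + b1 * measles
crude = ols([[1.0, m] for m, _, _ in children], [w for _, _, w in children])

# Restriction: same model, children aged 12 months or more only
older = [c for c in children if c[1] == 0]
restricted = ols([[1.0, m] for m, _, _ in older], [w for _, _, w in older])

# Dummy variable: WAZ = b0 + b1 * measles + b2 * under_12m, all children
adjusted = ols([[1.0, m, u] for m, u, _ in children], [w for _, _, w in children])

print(f"measles coefficient, crude:      {crude[1]:+.2f}")
print(f"measles coefficient, restricted: {restricted[1]:+.2f}")
print(f"measles coefficient, adjusted:   {adjusted[1]:+.2f}")
```

The crude coefficient is negative (-0.61). Both the restricted (+0.20) and the dummy-adjusted (+0.15) estimates are positive, which mirrors the reversal described above.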


WAYS TO COPE WITH CONFOUNDING FACTORS

If confounding is anticipated in time, at the design stage, it can often be avoided by selecting the study sample appropriately. Depending on the study design, this can mean group or individual matching, restricting the study to a homogeneous group, or randomly assigning individuals to the experimental groups.

Often it is not possible to alter the study design, since the data have already been collected by the time the analyst works with them. Here are some options that might still work:

1. Stratification: control for confounding by stratifying the data (as we have just done) and using the stratum-specific results for reporting or for supporting the conclusions. This is the approach used below for measles and age, where the analysis is repeated within narrower age groups. It works!
2. Standardization: standardized (adjusted) rates are constructed statistically to account for differences between the groups in the distribution of the confounding variable(s).
3. Adjusted outcome variables: use a dependent variable that neutralizes the effect of the confounders, such as the RESIDUALS from a model of the outcome on the confounder(s).
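
The following is a minimal sketch of direct standardization. It takes the cell prevalences from the three-way prevalence table later in this section and collapses them over mother's age. The roofing distribution of the whole sample (466 good, 229 bad) serves as the standard population. Because the published percentages are rounded, the results are approximate.

```python
# (prevalence of WAZ < -2 SD, n) for each mother's-age cell, by education and roofing,
# taken from the three-way prevalence table (percentages are rounded in the source)
TABLE = {
    "less_than_primary": {"good": [(0.36, 146), (0.34, 95)], "bad": [(0.44, 87), (0.40, 83)]},
    "primary_plus": {"good": [(0.19, 162), (0.08, 63)], "bad": [(0.23, 44), (0.33, 15)]},
}
STANDARD = {"good": 466, "bad": 229}  # roofing distribution of the whole sample


def rate(cells):
    """Prevalence pooled over cells: total cases / total children."""
    return sum(p * n for p, n in cells) / sum(n for _, n in cells)


def crude_rate(education):
    return rate(TABLE[education]["good"] + TABLE[education]["bad"])


def standardized_rate(education):
    """Roofing-specific rates weighted by the standard roofing distribution."""
    total = sum(STANDARD.values())
    return sum(rate(TABLE[education][roof]) * w for roof, w in STANDARD.items()) / total


for edu in TABLE:
    print(f"{edu:18s} crude {crude_rate(edu):.1%}  roof-standardized {standardized_rate(edu):.1%}")
```

After standardization, the gap between the education groups narrows slightly, from about 20 to about 18 percentage points.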

Abramson JH. Making Sense of Data. New York: Oxford University Press; 1994.


Does Education have an effect on Nutritional status independent of SES?

An important way of controlling for roofing when tackling this research question is to look at the association within categories, or strata, of roofing. This is not just a simple method to start with; it is a fundamental approach that should usually be applied when two independent variables are involved. However, it becomes clumsy and uneconomical when there are more than two independent variables, as is often the case. It is possible to stratify by three variables, especially with a large sample. For example, we can distinguish mother's age, education of the respondent, and roofing, like this:

Prevalence of Malnutrition (WAZ < -2 SD) by Educational Attainment, Mother's Age, and Roofing Quality

(Numbers in parentheses are the number of individuals in each category.)

Roofing | Education low (< primary), mother young (<=30 years) | Education low (< primary), mother not young (>30 years) | Education not low (primary +), mother young (<=30 years) | Education not low (primary +), mother not young (>30 years) | TOTAL
Roofing good | 36% (146) | 34% (95) | 19% (162) | 8% (63) | 26% (466)
Roofing bad | 44% (87) | 40% (83) | 23% (44) | 33% (15) | 38% (229)
TOTAL | 39% (233) | 37% (178) | 20% (206) | 13% (78) | 30% (695)

 

Here you can begin to see patterns of association: those with bad roofing are worse off in general, and so are those with poor education. But as you sort through all the sub-groups, it becomes more difficult to discern what the relationships are. It is especially hard to tell which ones are significant enough to deserve your attention, and some cells are small (only 15 children in one of them).
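
A table like this can be produced from individual records by grouping on all three stratifying variables at once. The sketch assumes records of (education, mother's age group, roofing, WAZ), with values invented for illustration. It also flags small cells; the threshold of 30 is an arbitrary illustrative choice, not a rule from this course.

```python
from collections import defaultdict


def three_way_prevalence(records, cutoff=-2.0):
    """Return {(education, mother_age, roof): (prevalence of WAZ < cutoff, n)}."""
    counts = defaultdict(lambda: [0, 0])  # [children below cutoff, all children]
    for education, mother_age, roof, waz in records:
        cell = counts[(education, mother_age, roof)]
        cell[0] += waz < cutoff  # True counts as 1
        cell[1] += 1
    return {key: (below / n, n) for key, (below, n) in counts.items()}


def small_cells(table, min_n=30):
    """Cells too small to interpret with much confidence (threshold is illustrative)."""
    return sorted(key for key, (_, n) in table.items() if n < min_n)


# Hypothetical records, invented for illustration
records = [
    ("low", "young", "good", -2.5),
    ("low", "young", "good", -1.0),
    ("low", "young", "good", -2.1),
    ("primary_plus", "older", "bad", -0.5),
]
table = three_way_prevalence(records)
for key, (prevalence, n) in sorted(table.items()):
    print(key, f"{prevalence:.0%} ({n})")
print("small cells:", small_cells(table))
```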

More often, regression is used to control for potential confounding by more than one independent variable, and in fact it is commonly used even when only one potential confounder is examined. (We could use ANOVA, but regression is easier, more common, and certainly the method of choice when there is more than one potential confounder to examine.) We will therefore look at controlling for potential confounding with regression in two-way analysis. It is a common method in this setting, and it also introduces the use of regression in this way for multi-way analysis.

Essentially, we are examining whether a significant regression coefficient for the variable of interest remains significant, and does not change greatly in size, when potential confounders are also entered into the model. Thus, again, it is important to stick to the research question, which helps define which variables we are examining and why. Recall that our research question is: Does Education have an effect on WAZ, regardless of SES?

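Dependent variable = WAZ
Entries are coefficient (t, p); a blank cell means the variable was not in that model.
(DLOWEDN = low education, less than primary; DBADROOF = bad roof; EDN_ROOF = education-by-roofing interaction term)

Independent variable | Model 1 | Model 2 | Model 3 | Model 4
DLOWEDN | -0.448 (-4.887, 0.000) | | -0.411 (-3.714, 0.000) | -0.420 (-4.456, 0.000)
DBADROOF | | -0.253 (-2.594, 0.010) | -0.136 (-0.780, 0.436) | -0.158 (-1.608, 0.108)
EDN_ROOF | | | -0.033 (-0.155, 0.877) |
N | 698 | 694 | 694 | 694
Adj R sq | 0.032 | 0.008 | 0.033 | 0.034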

 

The programming details of the regression analysis for education and roofing in Kenya were given when we ran this example in the previous section on Regression. Now we look at it in the light of confounding. When roofing is added to the model with education, the education coefficient decreases slightly in magnitude (from -0.448 to -0.420). It is not drastically reduced, and it remains significant. There is also no interaction between the two variables (EDN_ROOF, p = 0.877), so this term was dropped from the final model. Perhaps the most useful point is that education is significant in every model in which it appears, and its coefficient is never drastically altered. This shows that education is a strong player in a model of nutritional status. The tabulation at the beginning of the section showed the same pattern: the outcome differed by education level in a similar way within each roofing category. Although the education coefficient changes little when roofing is added, you can still see that those with poor roofs are worse off than those with better roofs, which is why we controlled for roofing as a potential confounder. Education survived the test, and we can now learn to build larger, more robust models in multi-way analysis. But first, let's look at an example of confounding that is not what we might expect.
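
One common way to quantify "not drastically altered" is the percentage change in the coefficient when the potential confounder is added. The sketch below applies it to the education coefficients reported above: -0.448 with education alone, and -0.420 with education plus roofing and no interaction term. The 10% cut-off is a widely used epidemiological rule of thumb, not a criterion taken from this course.

```python
def percent_change_in_magnitude(crude_coef, adjusted_coef):
    """Reduction in |coefficient| after adjustment, as % of the crude |coefficient|."""
    return 100 * (abs(crude_coef) - abs(adjusted_coef)) / abs(crude_coef)


EDUCATION_ONLY = -0.448       # DLOWEDN, model with education alone
EDUCATION_PLUS_ROOF = -0.420  # DLOWEDN, model adding roofing, no interaction term
change = percent_change_in_magnitude(EDUCATION_ONLY, EDUCATION_PLUS_ROOF)
print(f"change in education coefficient: {change:.2f}%")
print("material confounding by roofing (>= 10% rule of thumb)?", change >= 10)
```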


UNEXPECTED CONFOUNDING

Research Question: 

Does measles immunization appear to significantly improve the nutritional status of a child?

When underweight is tabulated by whether or not the child is immunized for measles, the following results are obtained:

Measles immunized? | Mean WAZ | Prevalence of WAZ < -2 SD | N
Yes | -1.43 | 33% | 503
No | -0.92 | 22% | 177
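
The next sketch shows how summaries like these can be computed from individual records. It assumes (immunization status, WAZ) pairs; the values are invented and are not the survey data.

```python
from statistics import mean


def summarize_by_group(records, cutoff=-2.0):
    """Mean WAZ, prevalence of WAZ < cutoff, and N for each group."""
    groups = {}
    for group, waz in records:
        groups.setdefault(group, []).append(waz)
    return {
        group: {
            "mean_waz": mean(values),
            "prevalence": sum(w < cutoff for w in values) / len(values),
            "n": len(values),
        }
        for group, values in groups.items()
    }


# Hypothetical records: (measles immunized?, WAZ)
records = [("yes", -2.5), ("yes", -1.5), ("no", -0.5), ("no", -1.0)]
summary = summarize_by_group(records)
for group, s in summary.items():
    print(f"{group}: mean WAZ {s['mean_waz']:.2f}, <-2 SD {s['prevalence']:.0%}, N {s['n']}")
```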

 


Thus the prevalence of underweight is higher among children who are immunized. Does this mean that immunization has a negative effect on nutritional status? Is the immunization programme well targeted towards the worse-off children? Or could something else be confounding the apparent relationship? Give a few moments' thought to what explanations, unexpected or otherwise, there may be for this finding. Sketch some tables or other analyses that could be used to investigate these hypotheses; they can even be run using the dataset in the usual way.

The attempt to determine what might be causing this unusual result led to regression analyses stratified by age. The results look like this:

Dependent variable: WAZ score
Regression models in age strata: coefficient (t*, p)

Variables in the model   Model 1 (age 6-11 mo.)   Model 2 (age 12-24 mo.)   Model 3 (age 12-36 mo.)   Model 4 (age 0-60 mo.)

Measles 0.244 (0.684, 0.496) 0.993 (2.901, 0.004) 1.656 (2.312, 0.022) -0.875 (-4.800, 0.000)
Age -0.366 (-3.915, 0.000) 0.020 (0.738, 0.462) 0.047 (1.798, 0.073) -0.027 (-5.111, 0.000)
MEAS_AGE 0.049 (-1.789, 0.075) 0.024 (3.822, 0.000)
Sample N 76 132 273 679
Adj. R squared 0.173 0.050 0.017 0.067

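*t-statistic: the regression coefficient divided by its standard error. It tests whether the coefficient differs from zero, and its p-value comes from a t distribution, which allows for the extra uncertainty that arises with small samples. For a single coefficient, t squared equals the corresponding F-statistic.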
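
Because t is the coefficient divided by its standard error, the standard errors can be recovered from the table as coefficient / t. The sketch below does this for the Measles row and forms approximate 95% confidence intervals with the normal multiplier 1.96. This is an approximation: the exact t multiplier would be slightly larger, especially for the 76 children aged 6-11 months.

```python
MEASLES = {  # model: (coefficient, t), from the Measles row of the table
    "6-11 mo.": (0.244, 0.684),
    "12-24 mo.": (0.993, 2.901),
    "12-36 mo.": (1.656, 2.312),
    "0-60 mo.": (-0.875, -4.800),
}
Z_95 = 1.96  # normal approximation to the t multiplier


def implied_se(coef, t):
    """Standard error implied by a reported coefficient and its t-statistic."""
    return coef / t


for model, (coef, t) in MEASLES.items():
    se = implied_se(coef, t)
    low, high = coef - Z_95 * se, coef + Z_95 * se
    print(f"{model:10s} b = {coef:+.3f}  SE = {se:.3f}  95% CI ({low:+.2f}, {high:+.2f})")
```

Only the interval for 6-11 months spans zero, which matches its non-significant p-value.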

This was a case of unexpected confounding by age. The result did not fit our construct: we expect measles immunization to be associated with better nutritional status, not worse. So we broke the analysis down into age categories. The aim was to see whether age was so strongly associated with both measles immunization and nutritional status that it hid the relationship when the entire group was analysed together. Stratifying into age groups of 6-11 months (although few children in that group are immunized), 12-24 months and 12-36 months was successful. It is now much clearer that there is a positive, apparently protective, effect of immunization, especially for children over 1 year of age. The measles coefficient is positive and significant in both the 12-24 and the 12-36 month strata (p = 0.004 and p = 0.022).

In this case, the best means of controlling for the confounding effect of age was to stratify into age categories and analyze the data within each age group.


HOW THE SPIDER LOOKS

The spider diagram below shows measles immunization and age as potential causal factors for nutritional status, with AGE as the confounding factor.

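[Figure: spider diagram of measles immunization and age as potential causes of nutritional status, with age as the confounder]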

               


The next step in analyzing data is to look at more than two factors in the model. A more complex model will often be needed to answer the research question of interest. For example, in this scenario it might also be interesting to control for differences in access to health care and in SES when examining measles immunization, in addition to controlling for age. It could also be insightful to look at the influence of a program activity beyond the influence of SES and sanitation. In such cases, more than two variables need to be tested. The next section, MULTIPLE VARIABLE ANALYSIS, moves on to this more advanced analysis.
