
Causal Models to Address Interactions
The concept of interaction was introduced earlier: one factor modifies the relation between another possible causal factor and the outcome variable. Examples were explored, such as water supply interacting with sanitation, so that the effect of water on wt/age differs depending on latrines. Earlier in this chapter, multiple associations between education level, use of piped water, access to toilet facilities, and wt/age were investigated. One finding was that the relation between wt/age and piped water was quite different, in that dataset, depending on educational level. It was suggested that you could explore the interaction that this represents; we will leave that possibility open and look at a similar one as an illustration. Summary findings are shown here, and details of Sanitation's Effect on Nutrition can be seen by CLICKING HERE.
The research question is:
Does the effect of access to toilet facilities on wt/age depend on education level, and if so does this persist controlling for socioeconomic status?
Access to toilets by itself (in keast4j) is associated with wt/age, as seen in model 1 of the previous regression table. To investigate the interaction, the interaction term is created as usual as a new variable, by multiplying DLOWEDN * NOTOILET, and entered into the model along with the education and toilet access variables. The results are summarized as model 1 in the table below. This shows that the interaction is significant, so that the effect of toilet access on wt/age does depend on education level. It is useful to plot out the shape of this effect (it is often sketched by hand). The points are calculated from the equation given by the model (model 1 in the table; note that the intercept is not shown in the table, but can be seen in the detailed description in the subfile linked a few lines above):
WAZ = -0.958 - 0.791*(NOTOILET) - 0.524*(DLOWEDN) + 0.834*(NOTOILET*DLOWEDN)
Substituting NOTOILET = 0 or 1 and DLOWEDN = 0 or 1 into the equation gives the four points needed to sketch a graph of wt/age (y-axis) against access to toilets (yes/no) on the x-axis, for two categories of education. If you haven’t done this before, try Graphing Regression with Interaction now, and check results by CLICKING HERE.
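The substitution can also be done in a few lines of code. The sketch below is only an illustration, not part of the original exercise. It assumes that the NOTOILET and DLOWEDN coefficients and the intercept are negative and the interaction coefficient positive, which is the reading consistent with the cell means tabulated later in this section. The function and variable names are made up for the example.

```python
# Coefficients of model 1 (assumed signs: intercept and main effects negative).
INTERCEPT = -0.958
B_NOTOILET = -0.791
B_DLOWEDN = -0.524
B_INTERACTION = 0.834


def predicted_waz(notoilet: int, dlowedn: int) -> float:
    """Predicted WAZ for one combination of the two 0/1 dummy variables."""
    return (INTERCEPT
            + B_NOTOILET * notoilet
            + B_DLOWEDN * dlowedn
            + B_INTERACTION * notoilet * dlowedn)


# Four points: one line per education category, toilet access on the x-axis.
points = {
    (notoilet, dlowedn): round(predicted_waz(notoilet, dlowedn), 3)
    for dlowedn in (0, 1)
    for notoilet in (0, 1)
}

for (notoilet, dlowedn), waz in points.items():
    education = "lower" if dlowedn else "higher"
    toilet = "no toilet" if notoilet else "toilet"
    print(f"{education:>6} education, {toilet:<9}: WAZ = {waz:.3f}")
```

Joining the two points for each education category gives two lines. A clear difference in their slopes is what the significant interaction term represents.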
Dependent variable = WAZ

| Independent variable | Model 1: coefficient (t, p) | Model 2: coefficient (t, p) |
|---|---|---|
| Cases included | All | All |
| NOTOILET | -0.791 (-3.009, 0.003) | -0.766 (-2.562, 0.011) |
| DLOWEDN | -0.524 (-5.340, 0.000) | -0.421 (-4.181, 0.000) |
| NOTOILET * DLOWEDN | 0.834 (2.781, 0.006) | 0.912 (2.712, 0.007) |
| VINCOME | | 0.409 (4.894, 0.000) |
| Adj R sq | 0.043 | 0.070 |
| N | 697 | 636 |
We could examine this pattern by tabulation, as will be done a bit later, and control for socioeconomic status (SES) by subdividing the table by a categorical socioeconomic variable such as housing (e.g. DBADROOF). If the sample size is large enough and the cells are not too unbalanced, this will work well. However, if we are lucky enough to have a continuous variable measuring SES, like income, then we would have to split it into two or more categories and subdivide the table by those. This wastes information, and is not the best way. It is better to enter the variable into the model and observe what happens to the coefficients. Remember that the research question concerns the interaction controlling for SES, so we need to go this route. The results are in model 2 of the table above, using the derived ‘virtual income’ variable for demonstration purposes, as discussed earlier.
The coefficients in model 2 hardly change with VINCOME in the model; in other words, the modifying effect of education does not appear to be due to some association with ‘income’, and the effect of toilet access remains. Note that we are now considering three independent variables and one interaction, and it is becoming more difficult to envisage what the associations might mean in reality. But if you go back to the relations as sketched in the ‘spider’, for example, they do make sense.
Usually when drawing conclusions from multivariate analyses, and especially when dealing with interactions, it is important to go back and forth between regression, tabulation, and graphical methods. Next, let’s look at the table showing the relative effects of education and toilet access; we won’t worry about significance, because we’ve already seen that in the regression results. We will, however, include prevalence as well as mean WAZ, as this will give a better sense of the size of the effects we are finding. By running ‘compare means’ by layers in SPSS (as an example) and extracting the results, we can construct a table like the one below. For the Mean Outcome Scores for Sanitation and Education CLICK HERE.
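| Education level | Access to toilet | Mean WAZ | Prevalence < -2 SDs (%) | N |
|---|---|---|---|---|
| Higher | Yes | -0.958 | 15.2% | 264 |
| Higher | No | -1.749 | 50.0% | 22 |
| Lower | Yes | -1.482 | 36.8% | 326 |
| Lower | No | -1.440 | 40.7% | 86 |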
The first important thing this tells us is that the interaction we are seeing in the regression is due to a strong difference in the higher education group, but this includes a small cell of only n=22. Thus most of the differential effect is due to the very low wt/age in this small group. The fact that the group is small is not itself surprising, as this would be a somewhat unusual combination of better education with no toilets, and the numbers in the cells are likely to be unbalanced because of the collinearity between education and toilet access. Therefore, although the interaction is significant, it is not very important in terms of policy or program design (or indeed analytically), because it is driven by a rather small group (22, only 3% of the sample).
The effect of education remains important among those with toilet access (the cells with n’s of 264 and 326), i.e. it remains when controlling for toilet access, but that was not the question we were researching.
The other side of the coin is important too: among the larger and worse-off group with lower education, little or no effect of toilet access appears to occur. We would therefore have little basis for focusing on toilet access, in this population, as a means of improving nutrition.
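The link between this table and the regression can be checked with simple arithmetic: with two 0/1 variables and their product in the model, the interaction coefficient is the difference between the toilet effects in the two education groups. The sketch below assumes the cell means of WAZ are negative (z-scores below the reference median), which matches the prevalences shown. The names used are illustrative only.

```python
# Cell means of WAZ from the education-by-toilet table (assumed negative).
mean_waz = {
    ("higher", "toilet"): -0.958,
    ("higher", "no toilet"): -1.749,
    ("lower", "toilet"): -1.482,
    ("lower", "no toilet"): -1.440,
}


def toilet_effect(education: str) -> float:
    """Change in mean WAZ from having no toilet, within one education group."""
    return mean_waz[(education, "no toilet")] - mean_waz[(education, "toilet")]


effect_higher = toilet_effect("higher")      # about -0.791
effect_lower = toilet_effect("lower")        # about +0.042
interaction = effect_lower - effect_higher   # about 0.833

print(f"Toilet effect, higher education: {effect_higher:+.3f}")
print(f"Toilet effect, lower education:  {effect_lower:+.3f}")
print(f"Difference between the effects:  {interaction:+.3f}")
```

The difference of about 0.833 agrees, up to rounding of the cell means, with the interaction coefficient of 0.834 in model 1. It also shows that almost all of it comes from the higher education group.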
While we are at it, let’s look at the equivalent tabulation result for piped water (which is a suggested exercise for regression analysis):
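| Education level | Piped water | Mean WAZ | Prevalence < -2 SDs (%) | N |
|---|---|---|---|---|
| Higher | Yes | -0.736 | 8.8% | 57 |
| Higher | No | -1.109 | 20.1% | 229 |
| Lower | Yes | -1.385 | 32.3% | 31 |
| Lower | No | -1.480 | 38.1% | 381 |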
The interaction is again apparent, and this time for a larger group, in the higher education category. But we will still have to look further for an intervention likely to benefit the lower educated population (apart from education itself). But back to the research question here ...
Does the effect of access to toilet facilities on wt/age depend on education level, and if so does this persist controlling for socioeconomic status?
To confirm that the effects observed are not due to confounding by SES, as a further step we should probably examine the slope of the relationships within categories of education (and toilet access for the education effect), while controlling for SES by having VINCOME in the equation. This gives the following results, added to those in the earlier table:
| Independent variable | Model 1: coefficient (t, p) | Model 2: coefficient (t, p) | Model 3: coefficient (t, p) | Model 4: coefficient (t, p) |
|---|---|---|---|---|
| Cases included | All | All | Better educated | With toilet access |
| NOTOILET | -0.791 (-3.009, 0.003) | -0.785 (-2.606, 0.009) | -0.780 (-2.763, 0.006) | NA |
| DLOWEDN | -0.524 (-5.340, 0.000) | -0.452 (-4.466, 0.000) | NA | -0.459 (-4.673, 0.000) |
| NOTOILET * DLOWEDN | 0.834 (2.781, 0.006) | 0.926 (2.738, 0.006) | NA | NA |
| VINCOME | | 0.365 (4.392, 0.000) | 0.304 (2.449, 0.015) | 0.326 (3.746, 0.000) |
| Adj R sq | 0.043 | 0.066 | 0.040 | 0.068 |
| N | 697 | 638 | 264 | 549 |
This confirms the significance of the factors within the better-off categories.
It should be obvious that a great many other associations can be examined in this way, which re-emphasizes the crucial importance of specifying what you need to know and sticking with that question until you get an answer. As illustrated here, several different techniques (multi-way and simple, regression, tabulation, and graphical methods) will be applied to any one question. Don’t rush to judgement: get a feel for the answers, and work towards a balanced set of conclusions.
ANOTHER MULTI-WAY ANALYSIS TO STUDY INTERACTIONS
It might be useful now to go through one more example of a multi-way analysis to look at causality. As a continuation of the previous research topic, the same question concerning immunization and nutrition status will be addressed, but with a more extensive level of analysis. With the help of multiple variable analysis, we will be able to explore deeply enough to provide strong support for causality. There are certain constraints when working with a cross-sectional dataset, but multiple variable regression makes it possible to explore more extensively, so that the link between the independent variable of interest and the outcome is more clearly established. Here is the research question we will address:
Does immunizing for measles improve children’s nutrition status independent of influencing factors such as the education of mother, mother’s height, age and sex of the child?
One of the first steps in testing for causality is drawing out the relationships between variables of interest and factors that might influence the relationship. For example, we will continue the example with measles immunization and nutrition status, but now we will consider other factors that might influence the model, such as the education level of the respondent, the nutrition status of the respondent (mother's height), and the age of the child (which was considered in depth in the two-way analysis).
To visualize the relationships under question, build a spider. This time more variables are involved in the model, so the spider will be more complex. CLICK HERE to take a look at the spider for education, respondent’s height, measles immunization and age of child as the determining variables of nutrition status.
Before beginning some detailed regression analysis or running ANOVA tables for each independent variable and the outcome, it is somewhat insightful to run a correlation, which can be seen by CLICKING HERE.
All correlations show both the size and direction of the association. The correlations (see the first column, Pearson’s correlation coefficient) are each in the direction expected, with the exception of WAZ and measles immunization. The reason for this was looked at in detail in the two-way analysis, which showed age to be a confounding factor for measles and WAZ. The solution to uncovering the true relationship is to break the analysis group into smaller age groups. We can also begin to see where we will need to watch for collinearity between certain independent variables, such as mother's height and low education.
Now we will use linear regression analysis, which is designed to test a model of associations between potential causal variables and an outcome. The strength of a regression analysis is its capability of testing multiple variables at once. Therefore, regression is very apropos for this section of the learning package.
It is important to think through the model clearly before entering variables; it is not useful to start tossing in any variable. The purpose of testing by regression analysis is to determine the relationships and sizes of predicted relationships, not to build random models that have not been framed by a logical question.
The method used for running a regression model is called ENTER, which automatically places all of the chosen variables in the model at once, accounting for the influence of each variable on the outcome. One important step to remember is to test the interactions between the determining variables during this process. For example, it was clear previously that age and measles immunization interact strongly and that age also has a confounding effect, which makes it necessary to analyze stratified age groups to control for the confounding.
To test whether interaction is present, it is best to start by including interaction variables for the pairs of variables that are suspected to have an effect on one another. This is especially true for variables that have been reported in the literature as influencing one another, or for which there is physiologic evidence or other indication of interaction. The goal is to place the interaction variables in the model and test whether they change the size of the effect of the other variables in the model, particularly the variable of interest. Also look to see whether each interaction variable is significant in the model; if it is not, and there is no strong suspicion of influence, then it should be dropped from the model. The relationships should be logical. A test can be run using GLM to see whether a variable pair has a significant interaction for the outcome of interest, but don’t go overboard. Just test interactions that are sensible and likely to occur.
TAKE A LOOK at how to run a GLM (general linear model) to test interaction.
When running a regression analysis, it is necessary to actually create the interaction variables before they can be entered (they are not an automatic option as they are in GLM). Creating interaction variables is quite simple in SPSS by using the Transform, Compute option and multiplying the two variables of interest together. Before we begin further analysis, create an interaction term for measles and age as well as for low education and mother's height.
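Outside SPSS, the same step is just a row-by-row multiplication. The sketch below is a hypothetical illustration: the variable names (MEASLES, AGEMOS, DLOWEDN, MHEIGHT) and the three records are invented for the example and are not taken from the dataset. Leaving the product missing when either input is missing is a choice made here, not a description of how any particular package behaves.

```python
# Hypothetical records; names and values are invented to show the mechanics.
records = [
    {"MEASLES": 1, "AGEMOS": 14, "DLOWEDN": 1, "MHEIGHT": 152.0},
    {"MEASLES": 0, "AGEMOS": 20, "DLOWEDN": 0, "MHEIGHT": 160.5},
    {"MEASLES": 1, "AGEMOS": None, "DLOWEDN": 1, "MHEIGHT": 155.0},
]


def add_interaction(rows, var_a, var_b, new_name):
    """Return copies of rows with new_name = var_a * var_b.

    If either input is missing (None), the product is left missing.
    """
    out = []
    for row in rows:
        a, b = row.get(var_a), row.get(var_b)
        product = None if a is None or b is None else a * b
        out.append({**row, new_name: product})
    return out


with_terms = add_interaction(records, "MEASLES", "AGEMOS", "MEASLES_X_AGE")
with_terms = add_interaction(with_terms, "DLOWEDN", "MHEIGHT", "LOWEDN_X_HEIGHT")

for row in with_terms:
    print(row["MEASLES_X_AGE"], row["LOWEDN_X_HEIGHT"])
```

Each new column would then be entered in the regression alongside the two variables it was built from.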
Here are the steps taken for the analysis.
1. Begin by running a FULL REGRESSION MODEL with all of the variables, including the interaction variables.
The result is confusing due to age confounding. Since the problem is similar to the one met earlier, try to resolve the confounding through stratification again.
2. Limit the sample to only 12-36 MONTH old children and run the same variables in the model.
3. Now limit the sample to 0-10 MONTH old children and run the same variables in the model. This will show that the problem is largely limited to the younger age category, for reasons mentioned previously. First, children of 11 months and less do not often get immunized; in addition, children at this age still have immunologic protection from the mother. Significant differences are expected between children under one year of age and those over one year, because the main influencing factors, feeding and immunology, differ strongly between the two groups. It is well documented that analyzing children under one year and over one year separately will show the influence of immunization.
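The stratification in steps 2 and 3 amounts to selecting cases by age before running the model. The sketch below only illustrates that selection and a simple comparison of means within a stratum. The records and field names (age_months, measles, waz) are hypothetical, invented for the example, and the comparison of means stands in for the regression itself.

```python
# Hypothetical toy records, invented only to show the mechanics of the split.
children = [
    {"age_months": 4, "measles": 0, "waz": -0.5},
    {"age_months": 9, "measles": 1, "waz": -0.3},
    {"age_months": 11, "measles": 0, "waz": -1.0},
    {"age_months": 14, "measles": 1, "waz": -1.2},
    {"age_months": 20, "measles": 0, "waz": -2.0},
    {"age_months": 30, "measles": 1, "waz": -1.4},
    {"age_months": 36, "measles": 0, "waz": -1.8},
]


def stratum(rows, min_months, max_months):
    """Keep only the children whose age falls in the inclusive range."""
    return [r for r in rows if min_months <= r["age_months"] <= max_months]


def mean_waz(rows, measles):
    """Mean WAZ for one immunization group, or None if the group is empty."""
    values = [r["waz"] for r in rows if r["measles"] == measles]
    return sum(values) / len(values) if values else None


younger = stratum(children, 0, 10)
older = stratum(children, 12, 36)

for label, rows in (("0-10 months", younger), ("12-36 months", older)):
    print(label, "n =", len(rows),
          "immunized:", mean_waz(rows, 1),
          "not immunized:", mean_waz(rows, 0))
```

Note that with these two ranges a child aged exactly 11 months falls into neither stratum.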
Results:
These two age-stratified regression analyses show the results for the groups 12-36 months and 0-10 months. In these strata, the results are very different from those for the whole sample together: the confounding by age has now been unmasked. In the 12-36 month stratum, measles immunization does have a profound effect on the nutrition status of the child, even above the influences of all other variables in the model. For those aged 0-10 months, the effect of immunization is very strong even though the group of children immunized is very small (approximately 15%). It is likely that in this group there is an association with access to health clinic care or better caretaking overall. The influence of measles immunization at this young age might reflect other advantages, since it is known that young children maintain immunologic protection from the mother through conferred immunity. Based on knowledge of physiology, it is more interesting and useful to focus on the influence of measles immunization in children over one year of age. It is encouraging to see that the model gives reasonable results for the direction and size of the effects of education, height of the respondent, and measles immunization.
| Independent variable | 0-10 months: B-value | t | Sig. | 12-36 months: B-value | t | Sig. |
|---|---|---|---|---|---|---|
| Education | 0.318 | 1.448 | NS | 0.361 | 2.627 | 0.009 |
| Height of respondent | 0.0006 | 0.335 | NS | 0.005 | 4.036 | 0.000 |
| Measles immunization | 2.288 | 2.022 | 0.045 | 1.430 | 2.052 | 0.041 |
Now that there is a reasonable model for the effect of measles immunization, taking into account specific influences of socioeconomic status, the information must be presented in a way that is easily understood by the people who can provide programs and policies to change the situation. A tabular analysis can be produced from the data for presentation purposes. Tabular analysis is also a way to check the consistency between the categorical outcome (underweight, < -2 SD WAZ) and the continuous outcome (mean WAZ score). It is not always possible to get consistent results when the outcome is looked at categorically versus continuously, but most often they will show similar results. Categorical outcomes are most often used for presentation because they are easier to understand. When the results are consistent, it is much easier to promote change when a number can be given for the malnourished group. This is not the best way to analyze data, but it does help in promoting change among policy makers. Here is how to make a tabular analysis for the multiple-variable analysis.
Take a look at the exercise on creating the TABULAR ANALYSIS in SPSS. The results in the tabular analysis are generally consistent with those in the regression analysis using the continuous WAZ outcome. This will make it easy to present the data for use in promoting measles immunization. Here is how to put this into a more attractive presentation format. Probably the best way to present the data is within categories, as follows:
Prevalence of low WAZ for categories of measles immunization for children 12 to 36 months of age, with education and height of the mother
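| Independent variable | Category | Measles immunization | Mean prevalence of low WAZ (< -2 SD) | N | Std. deviation |
|---|---|---|---|---|---|
| EDUCATION | Low (Primary or less) | Yes | 47% | 136 | .5010 |
| EDUCATION | Low (Primary or less) | No | 59% | 29 | .5012 |
| EDUCATION | Not low (Primary +) | Yes | 24% | 104 | .4294 |
| EDUCATION | Not low (Primary +) | No | 40% | 5 | .5477 |
| RESPONDENT'S HEIGHT | Low (<=157 cm) | Yes | 42% | 127 | .4951 |
| RESPONDENT'S HEIGHT | Low (<=157 cm) | No | 67% | 21 | .4830 |
| RESPONDENT'S HEIGHT | Not low (>157 cm) | Yes | 32% | 113 | .4680 |
| RESPONDENT'S HEIGHT | Not low (>157 cm) | No | 38% | 13 | .5064 |
| TOTAL | | Yes | 37% | 240 | .4840 |
| TOTAL | | No | 56% | 34 | .5040 |
| TOTAL | | All | 39% | 274 | .4896 |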
Here, the effect of measles immunization is clearly stronger where the mother's height is low (42% underweight with immunization against 67% without, compared with 32% against 38% where height is not low). By education the gap is of similar size in both groups (47% against 59%, and 24% against 40%), although the last figure rests on only 5 non-immunized children. The more important statement to make is that immunization appears to play a significant role in decreasing malnutrition for children after the age of one, which is seen regardless of these SES measures. These data are supported by the regression analysis and by previous studies that provide evidence that measles vaccination gives significant benefits to a child's health. This fits the model of a cycle of malnutrition and illness in the child. If measles immunization prevents illness, then there is less chance that a child will become ill and lose weight, which would make the child more susceptible to illness again (and so the cycle continues).
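The size of the immunization gap in each category, and the way the totals follow from the cells, can be checked directly from the table. The sketch below is an illustration only; the dictionary layout and function names are made up for the example, and the prevalences are the rounded percentages as printed, so the results are approximate.

```python
# (prevalence of low WAZ in %, N) by category and measles immunization,
# children 12-36 months, as printed in the table (rounded percentages).
cells = {
    ("low education", "yes"): (47, 136),
    ("low education", "no"): (59, 29),
    ("not low education", "yes"): (24, 104),
    ("not low education", "no"): (40, 5),
    ("low height", "yes"): (42, 127),
    ("low height", "no"): (67, 21),
    ("not low height", "yes"): (32, 113),
    ("not low height", "no"): (38, 13),
}


def gap(group: str) -> int:
    """Percentage-point excess of low WAZ among non-immunized children."""
    return cells[(group, "no")][0] - cells[(group, "yes")][0]


def pooled(immunized: str) -> float:
    """N-weighted prevalence across the two education categories."""
    parts = [cells[(g, immunized)]
             for g in ("low education", "not low education")]
    total_n = sum(n for _, n in parts)
    return sum(p * n for p, n in parts) / total_n


for group in ("low education", "not low education",
              "low height", "not low height"):
    n_not_immunized = cells[(group, "no")][1]
    print(f"{group:<18} gap = {gap(group):>2} points "
          f"(non-immunized n = {n_not_immunized})")

print(f"Pooled, immunized:     {pooled('yes'):.1f}%")
print(f"Pooled, not immunized: {pooled('no'):.1f}%")
```

The pooled values reproduce the TOTAL rows of the table (37% and 56%). The small number of non-immunized children in some cells is a reminder to read the individual gaps cautiously.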
This exercise is only one of endless multivariate analyses that are possible. The same techniques and thought should be applied to any analysis that is run. A well-planned and well-researched question should be asked, and present knowledge in nutrition should be used to step through the data analysis. Just remember to always keep the question in mind and enjoy the process of finding the answers in the data.