Multi-way Regression: A Closer Look at Water and Sanitation

To run the complete analysis modeling the effect of water and sanitation on nutrition status, work in a logical progression: start simple with one variable (e.g., sanitation: notoilet; water source: dpiped, dwell) and then add variables to the model one by one.  Use the following exercises to run the regression models.

Model 1:  Look at only toilet with the outcome WAZ score.

1.  Open keast4j.sav

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) notoilet from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.
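For readers who want to see the arithmetic behind this step, here is a minimal sketch in Python of a regression with a single 0/1 independent variable. The data are invented for illustration and are not taken from keast4j.sav, and the helper name simple_ols is hypothetical. The point it shows is that, with one 0/1 variable, B is simply the difference between the two group means.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Invented toy data, NOT taken from keast4j.sav.
# notoilet: 1 = no safe toilet, 0 = safe toilet.
notoilet = [0, 0, 0, 0, 1, 1, 1, 1]
waz = [-1.0, -1.2, -0.8, -1.4, -1.5, -1.3, -1.7, -1.1]

intercept, b = simple_ols(notoilet, waz)

# Mean WAZ in each group, computed directly for comparison.
mean_without = mean(w for n, w in zip(notoilet, waz) if n == 1)
mean_with = mean(w for n, w in zip(notoilet, waz) if n == 0)

print(f"B for notoilet: {b:.3f}")
print(f"difference in group means: {mean_without - mean_with:.3f}")
```

The intercept is the mean WAZ of the group coded 0, so a negative B means the group without a safe toilet has the lower average WAZ.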

INTERPRETATION:

The effect of not having access to a safe toilet is large enough to merit attention (B=-0.255), and it is statistically significant (p=0.044).  But this step is just a first sighting and must not be taken as truth until we explore further using the other independent variables that we have discussed (e.g., water source, education, roofing quality).  Try the next step and see how the coefficient for notoilet changes.

Model 2:  Look at only piped water then only well water with the outcome WAZ score.

1.  Open keast4j.sav

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) dpiped from the variable list using the arrow key (then run the model again with only dwell).

5.  Select the model type as Enter and click on OK.

Piped water...

Well water...

INTERPRETATION:

Only the model for piped water shows a significant association with WAZ score (B=0.369, p=0.007), whereas well water does not appear to have a sizeable or significant association with the outcome variable.  We will now continue by running a model with both no toilet and piped water to see how the coefficients are affected by the combination.

Model 3:  Look at no toilet and piped water with the outcome WAZ score.

1.  Open keast4j.sav

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) notoilet and dpiped from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.

INTERPRETATION:

Now you can see the change in the coefficients from the previous models.  The coefficient for notoilet has shrunk slightly in magnitude, from -0.255 to -0.209, and is no longer significant at the 0.05 level (p=0.101).  The coefficient for piped water is still large, although it has also decreased slightly, from 0.369 to 0.335, and piped water still seems to be significant in this model of possible causal factors for malnutrition.

So, what are we not looking at yet that might still be influencing the model?  The question we asked was: what is the effect of water and sanitation above and beyond the influence of education?  Therefore we should try adding education to the model and see how the coefficients behave.  We have previously looked at education alone and found it has a strong association with waz score, so we will skip on to a full model with toilet, water, and education together.
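To see why a coefficient can shrink when a second, related variable enters the model, here is a minimal sketch in Python. The data are invented (not from keast4j.sav) and are built so that WAZ depends on both variables while households without a toilet also tend to lack piped water. The function name ols is hypothetical; it solves the ordinary least-squares normal equations by simple elimination.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Invented toy data, NOT from keast4j.sav.
# WAZ is constructed as -1.0 - 0.2*notoilet + 0.3*dpiped (no noise).
notoilet = [1, 1, 1, 0, 0, 0, 1, 0]
dpiped = [0, 0, 0, 1, 1, 1, 1, 0]
waz = [-1.0 - 0.2 * n + 0.3 * d for n, d in zip(notoilet, dpiped)]

alone = ols([notoilet], waz)          # notoilet only
both = ols([notoilet, dpiped], waz)   # notoilet and dpiped together

print(f"notoilet alone:       B = {alone[1]:.3f}")
print(f"notoilet with dpiped: B = {both[1]:.3f}")
print(f"dpiped with notoilet: B = {both[2]:.3f}")
```

When notoilet is entered alone it also picks up part of the piped-water effect, because the two variables overlap; once dpiped is in the model, the notoilet coefficient falls back to its own share.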

Model 4:  Look at no toilet, piped water, and low education with the outcome WAZ score.

1.  Open keast4j.sav

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) notoilet, dpiped, and dlowedn from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.

INTERPRETATION:

Now you can see the change in the coefficients for water and sanitation.  It is no longer clear whether either of them has an influence on nutrition status.  Our worry is that they actually do, but that we are not detecting it because of their strong collinearity with education.  We would be more certain about the results if the effect were independent of education, but since we have previously detected collinearity, it is best to look deeper by looking within education groups to see how toilet and water source behave.  You have done this in previous exercises using the routine called Select if..., which is quite simple.  Here is how to select for the low education group and then how to run the model again.
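One quick way to get a feel for collinearity between two 0/1 variables is to compute their correlation. The sketch below does this in Python on invented data (not from keast4j.sav); the helper name pearson_r is hypothetical. A correlation far from zero means the two variables carry overlapping information, which is what makes their separate effects hard to detect in one model.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented toy data, NOT from keast4j.sav.
# In this made-up sample, low-education households mostly lack piped water.
dlowedn = [1, 1, 1, 1, 0, 0, 0, 0]
dpiped = [0, 0, 0, 1, 1, 1, 1, 0]

r = pearson_r(dlowedn, dpiped)
print(f"correlation between dlowedn and dpiped: {r:.2f}")
```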

Model 5:  First select for dlowedn=1 (the low education group) and then run a model for piped water.

1.  Open keast4j.sav

2.  Click on Data, Select Cases... and then click on the dot for If condition is satisfied and click the box for If...

3.  Move the variable dlowedn into the white box using the arrow key and type in =1, so that the box reads dlowedn=1.

4.  Click on Continue and then make sure the dot under "Unselected Cases Are" is on Filtered (very important!!!!).

5. Click on OK.

Now that your cases are selected for the low education group (<primary), run the model:

1.  Keep keast4j.sav open (reopening the file would discard the selection you just made).

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) dpiped from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.
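The Select Cases step followed by the regression amounts to keeping only the rows that meet the condition and then fitting the model on what is left. A minimal sketch of that idea in Python follows. The data are invented (not from keast4j.sav), and the helper names slope and slope_within are hypothetical.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on a single variable x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def slope_within(rows, dlowedn_value):
    """Keep only rows with the given dlowedn value, then regress waz on dpiped."""
    kept = [r for r in rows if r["dlowedn"] == dlowedn_value]
    return slope([r["dpiped"] for r in kept], [r["waz"] for r in kept])

# Invented toy data, NOT from keast4j.sav.
rows = [
    {"dlowedn": 1, "dpiped": 0, "waz": -1.5},
    {"dlowedn": 1, "dpiped": 0, "waz": -1.3},
    {"dlowedn": 1, "dpiped": 1, "waz": -1.4},
    {"dlowedn": 1, "dpiped": 1, "waz": -1.4},
    {"dlowedn": 0, "dpiped": 0, "waz": -1.0},
    {"dlowedn": 0, "dpiped": 0, "waz": -1.2},
    {"dlowedn": 0, "dpiped": 1, "waz": -0.6},
    {"dlowedn": 0, "dpiped": 1, "waz": -0.8},
]

b_low = slope_within(rows, 1)    # same idea as selecting dlowedn=1
b_other = slope_within(rows, 0)  # same idea as selecting dlowedn=0

print(f"B for dpiped, dlowedn=1: {b_low:.3f}")
print(f"B for dpiped, dlowedn=0: {b_other:.3f}")
```

Filtering leaves the original rows untouched, which mirrors choosing Filtered rather than Deleted for unselected cases.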

INTERPRETATION:

Now the model is run for piped water alone, within the low education group only.  If you were to check with no toilet in the model as well, you would find the same problem of interaction between the variables and no effect from either.  Looking at piped water and waz score for only the low education group, there is still not a significant coefficient, and it is quite small anyhow.  There might be no effect of water source for those with low education.  A result like this could make sense, since those who do not have a basic level of education might not know how to properly use a better water source.  Let's look further to see if there is a different result for the group that does not have low education.

Model 6:  First select for dlowedn=0 (not low education group) and then run a model for piped water.

1.  Open keast4j.sav

2.  Click on Data, Select Cases... and then click on the dot for If condition is satisfied and click the box for If...

3.  Move the variable dlowedn into the white box using the arrow key and type in =0, so that the box reads dlowedn=0.

4.  Click on Continue and then make sure the dot under "Unselected Cases Are" is on Filtered (very important!!!!).

5. Click on OK.

Now that your cases are selected for the better education group (>=primary), run the model:

1.  Keep keast4j.sav open (reopening the file would discard the selection you just made).

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) dpiped from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.

INTERPRETATION:

Here is the result for those that do not have low education (dlowedn=0), which is clearly different from the same model run for those that do have low education.  The coefficient for piped water is now far larger than in the previous model (0.353 compared with 0.096).  This fits with the idea that those with better education would be more affected by improved water because they might be more inclined to use it properly.  But what if this is just another red herring, and it is actually a difference in socio-economic status within this higher education group that is causing this result to appear?  It might be a good time to test the model by adding a variable such as roofing quality (dbadro) to control for socio-economic status.  Try it now.

Model 7:  First select for dlowedn=0 (not low education group) and then run a model for piped water and roofing quality.

1.  Keep keast4j.sav open with the dlowedn=0 selection from Model 6 still active.

2.  Click on Statistics, Regression, Linear...

3.  Enter the Dependent variable waz from the variable list using the arrow key.

4.  Enter the Independent(s) dpiped and dbadro from the variable list using the arrow key.

5.  Select the model type as Enter and click on OK.

INTERPRETATION:

Well, this model did change the coefficient for piped water, but not drastically: the coefficient is still large, with a significant p-value (B=0.344, p=0.042).  I would take this as support for causality when looking at the effect of water source on nutrition status.  This would likely be an effect seen only in those with a basic level of education (>=primary).

BE SURE TO GO BACK THROUGH THE SELECT IF ROUTINE (Data, Select Cases...) AND SELECT ALL CASES!!!!