
General Approach for Program Content

Considerable progress can be made by analyzing simple two-by-two tables. (Note: these are not cross-tabulations; they have the outcome variable in the cells.) Tabular analysis and presentation of data can also be used to convey the meaning of more complex associations. Clear presentation of results is essential for good communication.

This approach will appear again soon in a lesson using actual data. In the tabular presentations in this learning package, the categorical form of the outcome (prevalence) can be used to show the percent that are in a particular group (such as malnourished or stunted, etc.).

An example of tabular presentation using water supply and latrines (sanitation) in relation to a malnutrition outcome (here, underweight) shows the simplicity of this style.

 

Prevalence of Underweight Children (<-2 SD WAZ) by Latrine and Water Source Quality

(Figures in brackets are numbers underweight / total number in cell)

                        Latrines - Poor    Latrines - Better    Total
Water supply - Poor     12% (96/800)       7% (49/700)          10% (145/1500)
Water supply - Better   12% (72/600)       4% (16/400)          9% (88/1000)
Total                   12% (168/1400)     6% (65/1100)         9% (233/2500)

In this hypothetical case, improving latrines appears to be associated with a 6-percentage-point decrease in underweight prevalence (12% to 6% in the Total row). Water supply improvement is associated with only a one-percentage-point improvement overall (10% to 9% in the Total column), and with no improvement among those with poor latrines (12% in both cells); an improvement appears only when it is combined with latrine improvement (7% to 4%). This implies a synergistic effect of water supply and latrine improvement.
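The comparisons read off the table above can be reproduced from the cell counts. The following is a minimal sketch in Python (not part of the original package, which uses SPSS); the variable and function names are illustrative assumptions. Because it works from unrounded counts, the overall water supply difference comes out slightly under the one point obtained from the rounded percentages.

```python
# Cell counts from the hypothetical table: (number underweight, total children).
cells = {
    ("poor water", "poor latrines"): (96, 800),
    ("poor water", "better latrines"): (49, 700),
    ("better water", "poor latrines"): (72, 600),
    ("better water", "better latrines"): (16, 400),
}


def prevalence(cases, total):
    """Percent of children underweight."""
    return 100.0 * cases / total


def pooled_prevalence(level, position):
    """Pool all cells sharing one level of a factor (position 0 = water, 1 = latrines)."""
    matching = [counts for key, counts in cells.items() if key[position] == level]
    return prevalence(sum(c for c, _ in matching), sum(n for _, n in matching))


# Marginal (pooled) differences, in percentage points.
latrine_effect = pooled_prevalence("poor latrines", 1) - pooled_prevalence("better latrines", 1)
water_effect = pooled_prevalence("poor water", 0) - pooled_prevalence("better water", 0)

# Water supply difference within each latrine group (the internal cells).
water_effect_poor_latrines = (
    prevalence(*cells[("poor water", "poor latrines")])
    - prevalence(*cells[("better water", "poor latrines")])
)
water_effect_better_latrines = (
    prevalence(*cells[("poor water", "better latrines")])
    - prevalence(*cells[("better water", "better latrines")])
)

print(f"Latrines, overall:            {latrine_effect:.1f} points")
print(f"Water, overall:               {water_effect:.1f} points")
print(f"Water, within poor latrines:  {water_effect_poor_latrines:.1f} points")
print(f"Water, within better latrines: {water_effect_better_latrines:.1f} points")
```

Comparing the two within-group differences, rather than only the overall one, is what reveals that the water supply association depends on the latrine group.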

The mean of a continuous outcome variable expressed in standard deviations, or z-scores (such as WAZ), can also be used in the cells of the table for analysis, though this might not be as effective in communication as displaying the actual numbers. For instance, it is easier to talk to someone about the numbers of children that are malnourished in areas without latrines than about the difference between the mean z-scores of those with and without latrines. A common approach is to use the continuous variable for analysis, which preserves information and is more powerful, and to present results with prevalence data, which are more easily understood.

Earlier, in one-way analysis it was shown how one determining factor could be related to the outcome – for example, differences in underweight by education or housing, etc. An important question is whether education really affects nutritional status, or whether those with better education are of higher SES and could be proxied by housing quality. To examine this, it is crucial to make the research question explicit:

 

Is there a difference in nutritional status between those with higher and those with lower educational attainment, independent of SES (estimated by roofing)?

 

Analysis of variance (ANOVA) is used to examine the association of a continuous outcome -- like weight-for-age expressed as a SD- or z-score (often abbreviated to WAZ) -- with two determining variables. First, these associations should be looked at individually, as described under one-way, although this will often be repeated to refresh one’s view at this point (as shown here). The ANOVA routines in statistical software (like SPSS) are also convenient at this stage to provide an early look at possible interactions and to provide output tables that can easily be interpreted and presented.

We examine two determining (independent) variables to probe whether likely causal factors are confounded -- whether, in our example here, education is likely to cause better nutrition, or whether this may be a spurious relationship because better educated people have better housing and socio-economic status in general. Note that these analyses are conservative: it can easily happen that there is some genuine association which does not appear to persist when taking account of possible confounding, because of correlations between the two (or more) independent variables (see ‘confounding’, later). Thus the approach is geared more towards NOT finding associations if there is doubt (avoiding type I errors, in statistical terms). Sometimes one has to conclude that one cannot tell whether or not there is a genuine relationship because of these correlations between independent variables (‘multicollinearity’). But confounding is so common that normally conclusions cannot be reached without investigating whether important confounding exists -- these steps in analyses are hardly optional.

In formal terms, ANOVA is a statistical technique that shows the contribution of categorical independent variables (e.g. roofing type) to the variation in the mean of a continuous dependent variable (e.g. Nutritional status WAZ scores). Observations of the independent categories, such as different roof types, are classified separately and mean values are calculated for each category. These mean scores for each category are tested to determine if there is a statistical difference between the mean outcome (e.g. z-scores) for each category (good roof versus bad roof).
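To make the calculation behind a one-way ANOVA concrete, here is a minimal sketch in Python that computes the F statistic by hand. The WAZ scores are invented for illustration and are not the Kenya data; the function name is also an assumption. Obtaining a p-value from F requires the F distribution, which a statistical package such as SPSS provides.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the category means around the overall mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual observations around their own category mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical WAZ scores for two roofing categories (illustrative only).
grass_thatch = [-2.1, -1.6, -1.9, -1.2, -0.9]
corrugated_iron = [-1.4, -0.8, -1.1, -1.7, -0.5]

f_stat, df_between, df_within = one_way_anova_f([grass_thatch, corrugated_iron])
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

A larger F means the category means differ by more than would be expected from the spread of observations within each category.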

Tabulation (compare means function in SPSS) together with the ANOVA option is used to generate tables and test differences.   In the example, the mean values of weight for age by roofing type, and then by education level, are first estimated separately and then together (further instruction on the programming is available after the output is displayed).  This produces tables like these:


Individually: Mean outcome scores by a single independent variable

For roofing:

[Image: wpe2.jpg (output table for roofing)]

The mean z-score for the grass/thatch (lower income indicator) is much lower than that of the corrugated iron roof group (higher income indicator). The mean z-score for low SES is -1.46 and high SES is -1.20, which corresponds in the right hand column to a prevalence (<-2 SD WAZ) of 38% and 26%, respectively.  The ANOVA results show that this difference is significant, as the p= 0.01. 

 

For Educational attainment of the respondent:

[Images: wpe2.jpg, wpe1.jpg (output tables for educational attainment)]

Running the analysis table by educational attainment gives the results shown here, which are in the expected direction. There is a decreasing prevalence of malnutrition in each higher educational category, and an increasing mean weight-for-age z-score. The largest difference in the results is seen between the group that did and the group that did not complete primary level education (-1.1 versus -1.5 SD WAZ, and 20% versus 37% prevalence of underweight). The ANOVA results indicate that malnutrition levels differ significantly between educational groups (p=0.000 as displayed, i.e. p<0.001). Recall that the F statistic is the value resulting from the ANOVA (the ratio of the variance between the group means to the variance within the groups) that is used to determine whether the differences between the group means are statistically significant.

Important note: Significance vs. Size of the Difference

Remember though, the essence of the analysis is to find the effect of the variable on the outcome. So, keep in mind that the significance (p-value) is not as important as the size of the difference. The significance can confirm the association, but it is not necessary to see significance if the difference is large. Significance is very much driven by sample size.
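The dependence of significance on sample size can be illustrated with a short sketch in Python. It applies a standard two-proportion z-test (a swapped-in technique chosen for simplicity, not the ANOVA used in this lesson) to the same 12-point difference in prevalence, 38% versus 26%, at two hypothetical sample sizes. The group sizes of 50 and 500 are assumptions for illustration only.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same difference in prevalence (38% vs 26%), different hypothetical group sizes.
z_small, p_small = two_proportion_z(0.38, 50, 0.26, 50)
z_large, p_large = two_proportion_z(0.38, 500, 0.26, 500)

print(f"n = 50 per group:  z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 500 per group: z = {z_large:.2f}, p = {p_large:.5f}")
```

The size of the difference is identical in both runs; only the sample size, and therefore the p-value, changes.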

Together: Mean outcome scores for combined categories

[Images: wpe1.jpg, wpe6.jpg (output tables for education and roofing combined)]

Within each educational group, there is a pattern in the expected direction of mean z-scores by roofing type, where individuals with grass/thatch roofs have lower mean z-scores than those with iron roofs. Similarly, mean z-score increases (malnutrition decreases) as education level increases. Both education and roofing quality appear to be associated with malnutrition in the expected direction, but it appears that education has the stronger association.

The ANOVA table shows that for education, after consideration of roofing quality, there is a significant difference between the nutritional status of the education categories (F=7.226, p=0.000), where those with more education are better off nutritionally. In contrast, roofing quality does not remain significant after consideration of educational attainment (F=2.266, p=0.133), although this in no way indicates that roofing quality (as an indicator of SES) is not an important consideration in a model for nutritional status. These data do show some worthwhile results, so the next step is to present the information.

CLICK HERE to see programming details for the Kenya example of Mean Outcome by Independent Variables.


PRESENTING THE DATA

A step-by-step exercise is provided to put the data in a more usable form. Running the same exercise with WAZ prevalence (<-2 standard deviations) instead of the WAZ score in standard deviations produces a 2-way table. For the purpose of ANALYSIS it is important to use the MEAN z-scores when working with nutritional data (using the continuous outcome variable, rather than categorizing the z-scores, preserves nutritional information). It is, however, far more effective for PRESENTATION to use the PREVALENCE of malnourished (% below -2 SD on a z-score).

ANALYSIS of Nutritional Status ----------> Mean Outcome (e.g. Z-score)

PRESENTATION of Data for Policy makers ----------> Percent below a Cut-off (e.g. % <-2 S.D.)
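The two uses of the same outcome can be sketched in Python as follows. The records are invented for illustration (they are not the Kenya data), and the field order and names are assumptions. For each combination of education and roofing, the sketch computes both the mean WAZ (for analysis) and the percent below -2 SD (for presentation).

```python
from collections import defaultdict

# Hypothetical child records: (education of respondent, roof material, WAZ).
records = [
    ("no education", "grass/thatch", -2.4),
    ("no education", "grass/thatch", -1.8),
    ("no education", "corrugated iron", -2.1),
    ("no education", "corrugated iron", -1.3),
    ("secondary", "grass/thatch", -2.2),
    ("secondary", "grass/thatch", -0.9),
    ("secondary", "corrugated iron", -0.7),
    ("secondary", "corrugated iron", -1.1),
]
CUTOFF = -2.0  # children below this z-score are counted as underweight

# Group the z-scores by cell of the two-way table.
cells = defaultdict(list)
for education, roof, waz in records:
    cells[(education, roof)].append(waz)

summary = {}
for key, scores in cells.items():
    below = sum(1 for score in scores if score < CUTOFF)
    summary[key] = {
        "n": len(scores),
        "mean_waz": sum(scores) / len(scores),        # for ANALYSIS
        "prevalence_pct": 100.0 * below / len(scores),  # for PRESENTATION
    }

for (education, roof), row in sorted(summary.items()):
    print(
        f"{education:<14}{roof:<17}n={row['n']}  "
        f"mean WAZ={row['mean_waz']:+.2f}  <-2 SD={row['prevalence_pct']:.0f}%"
    )
```

Note how two cells with the same prevalence can still differ in mean WAZ, which is the information lost when the z-scores are categorized.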

When the output from the tabular exercise is presented, care must be taken to extract the relevant information and put it in a table that is both easy to read and provides a clear message. The tabular output from comparing means in SPSS is shown below, followed by the presentation table with the same information, but  put into a more friendly and attractive format. 

The presentation format

 

                     Educational Attainment of the Respondent
Main Roof Material   No Education    Incomplete Primary   Complete Primary   Secondary       TOTAL
Grass Thatch         45% (25/58)     40% (45/112)         22% (8/36)         30% (7/23)      38% (86/229)
Corrugated Iron      35% (20/57)     35% (63/184)         19% (22/116)       13% (14/109)    26% (120/466)
TOTAL                40% (45/115)    37% (109/296)        20% (30/152)       16% (21/132)    30% (206/695)

The table rearranges the data into a 2-way format where the number in each cell represents the percentage of malnourished children (<-2 standard deviations WAZ). By showing the % of malnourished instead of the difference between the mean z-scores, one can more effectively promote action planning and policy level decisions in groups that are not as familiar with nutritional terminology. It is important to look for both the gross patterns in the marginals (for example, the overall difference in education groups and the overall difference in roofing types) and the patterns within the subgroups (for example, to detect the change in education within each roofing type). The table supports an effect of education independent of roofing type (or SES).

CLICK HERE to see programming details on running outcome tables for Presentation.


Another Example of Two Independent Variables with the Prevalence of Low Arm Circumference at the District Level in Bangladesh

 

Using the Bangladesh district level data set, the following question is posed for two-way analysis:

Is there a difference in nutritional status (measured as the prevalence of low arm circumference) between districts with different latrine types and water quality?

The goal is to examine the association of the prevalence of low arm circumference for two determining variables, and use the two-way ANOVA to test the significance of the association with these two variables, water and latrines. Questions will often arise where there is more than one possible intervention that is proposed and the effect of both needs to be measured together.

A tabulation for arm circumference by latrine and water source quality should be calculated individually then together.  Take a look at the differences in the prevalence of low arm circumference when the two variables are shown individually, and then look further down when they are shown together.  Do you start to see some change in the pattern of the results, where the prevalence pattern changes depending on the level of the other independent variable? 

[Image: wpeC.jpg (output tables for low arm circumference by water and latrine quality)]

INTERPRETATION:

The first output table of the exercise gives the summaries of the variables used, including the number (N) for each good / bad group of the independent variables. The second table provides the descriptive statistics for water and latrine, showing the mean prevalence of low arm circumference for each category and sub-category of water and latrine. These data are aggregated (at the district level), which means that where we used to see (in Kenya) the mean outcome scores of a continuous variable, we now see the prevalence of the outcome (in this example, prevalence of low arm circumference as an estimation of malnutrition). Keep this in mind and take a look at the results below.

The final table shows the results of the ANOVA for the two independent variables. Both the water source and the latrine type have a significant association with the prevalence of low arm circumference among children (Sig= 0.013 and 0.007 respectively). The variable labeled WATHCAT*LATCAT is an interaction term, which tests whether the association of one variable (for example, water) with low arm circumference depends on the level of the other (latrines). In this case the interaction DOES appear to be significant! The significance level of 0.10 indicates that the effect of water and latrine together is not just by chance but of a level that is worth giving some attention. This is just the tip of the iceberg for interaction; more will be covered in the following section on regression analysis.


PRESENTING THE DATA:

 

[Image: c4p1g2.jpg (presentation table)]

 

INTERPRETATION:

The table shows the marginal prevalences (e.g. total column and total row) calculated from numbers in cells, or by weighting the cell prevalences by their n values.
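The two ways of obtaining a marginal prevalence give the same answer, as this minimal Python sketch shows. Since the Bangladesh cell values appear only in the image, the sketch borrows the poor water supply row of the hypothetical underweight table from the start of this page; the variable names are assumptions.

```python
# Poor water supply row of the hypothetical table: (cases, n) per latrine group.
row = [(96, 800), (49, 700)]

# Method 1: calculate the marginal from the numbers in the cells.
marginal_from_counts = 100.0 * sum(c for c, _ in row) / sum(n for _, n in row)

# Method 2: weight each cell prevalence by its n.
cell_prevalences = [100.0 * c / n for c, n in row]
weights = [n for _, n in row]
marginal_weighted = (
    sum(p * w for p, w in zip(cell_prevalences, weights)) / sum(weights)
)

# A simple (unweighted) average of the cell prevalences is NOT the marginal.
unweighted_average = sum(cell_prevalences) / len(cell_prevalences)

print(f"From counts:        {marginal_from_counts:.2f}%")
print(f"Weighted average:   {marginal_weighted:.2f}%")
print(f"Unweighted average: {unweighted_average:.2f}%")
```

The unweighted average differs because the two cells do not contain the same number of children.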

The MARGINALS for water supply show a 5% reduction in the prevalence of low arm circumference, and similarly the marginal for latrine use shows a 4% reduction. Despite the marginal differences, the patterns within the subgroups provide some important new  information. Within the poor latrine group, an improvement in water supply reduces the prevalence of low arm circumference by 7% whereas the reduction within the better latrine group is only 1%. Does it appear that providing better water to those with good latrines will improve nutritional status much? Not really, according to this data. This is an example of interaction, which will be discussed next. Similarly, a reduction of 6% in the prevalence of low arm circumference is seen when those with poor water are given improved latrines, but less than 0.5% reduction is seen when those with safe water sources have improved latrines. This would make it seem that either water or latrines should be improved, but when they are both provided the additional impact of the second improvement is small or insignificant. In this case, it appears the program might see more improvement in children’s health by providing either water or latrines and putting the additional funds toward other programs.

 

IMPORTANT NOTE, as illustrated here: When there is INTERACTION detected, it is NOT very useful to interpret the MARGINALS (they can be misleading)! Only look at the internal cells to interpret the effects, for example the change due to water improvement within those with poor latrines.
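One simple way to summarize interaction from the internal cells is a "difference of differences". The Python sketch below uses invented cell prevalences, chosen only to mimic the pattern described in the interpretation (a 7-point and a 1-point water difference); they are assumptions, not the Bangladesh values, which appear only in the image.

```python
# Hypothetical cell prevalences (% with low arm circumference), for illustration.
prev = {
    ("poor water", "poor latrines"): 20.0,
    ("better water", "poor latrines"): 13.0,
    ("poor water", "better latrines"): 14.0,
    ("better water", "better latrines"): 13.0,
}

# Change due to water improvement within each latrine group (internal cells).
water_within_poor_latrines = (
    prev[("poor water", "poor latrines")] - prev[("better water", "poor latrines")]
)
water_within_better_latrines = (
    prev[("poor water", "better latrines")] - prev[("better water", "better latrines")]
)

# Difference of differences: zero would mean no interaction on this scale.
interaction_contrast = water_within_poor_latrines - water_within_better_latrines

print(f"Water effect, poor latrines:   {water_within_poor_latrines:.1f} points")
print(f"Water effect, better latrines: {water_within_better_latrines:.1f} points")
print(f"Difference of differences:     {interaction_contrast:.1f} points")
```

A contrast far from zero signals that a single marginal figure for water cannot describe both latrine groups.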

 

CLICK HERE to see the details of the analysis in Bangladesh using Two Independent Variables (latrine and water source quality) to predict prevalence of low arm circumference.


For output that is presented with categorical data such as waprev (1= malnourished or <-2 SD, 0 = not malnourished >=-2 SD) and a categorical independent variable such as roofing type, the statistical significance of the difference between the groups can be tested using CHI-SQUARE ANALYSIS, although it is not particularly useful in the PANDA package. 
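For reference, the chi-square calculation for a two-by-two table can be sketched in Python as below. The counts are the roofing totals from the presentation table above (86 of 229 underweight with grass thatch, 120 of 466 with corrugated iron). This is the Pearson statistic without continuity correction, so a package that applies a correction may report a slightly different value; the function name is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: roofing type; columns: underweight (waprev = 1), not underweight (waprev = 0).
grass_cases, grass_total = 86, 229
iron_cases, iron_total = 120, 466

chi2, p_value = chi_square_2x2(
    grass_cases, grass_total - grass_cases,
    iron_cases, iron_total - iron_cases,
)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```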

 
