

Performing analysis with two independent variables is often the first step in determining whether an association is likely to be causal or is confounded. The various components that contribute to the outcome (say, food sufficiency in months) need to be identified and understood in relation to one another. Tabular analysis and presentation of data can be used to convey the meaning of more complex associations. Considerable progress can be made by analyzing simple two-by-two tables. (Note: these are not cross-tabulations; they have the outcome variable in the cells.)

In the tabular presentations in this learning package, the outcome, whether continuous or categorical (for example, months of food sufficiency/storage), is placed in the cells to show which particular groups have better food sufficiency (such as those with better irrigation or more land cultivated).
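As a rough illustration of how such a table differs from a cross-tabulation, the sketch below groups records by two factors and puts the mean outcome, with the cell size, in each cell. The records, the level names "small" and "large", and the function name `two_way_means` are hypothetical and are not taken from the HLS data sets.

```python
from collections import defaultdict

# Hypothetical household records:
# (irrigation, land category, months of food sufficiency).
records = [
    ("No", "small", 1.0), ("No", "small", 2.0),
    ("No", "large", 2.0),
    ("Yes", "small", 3.0), ("Yes", "small", 5.0),
    ("Yes", "large", 4.0), ("Yes", "large", 6.0),
]

def two_way_means(rows):
    """Return {(row_level, col_level): (mean_outcome, n)} for each cell."""
    groups = defaultdict(list)
    for row_level, col_level, outcome in rows:
        groups[(row_level, col_level)].append(outcome)
    return {cell: (sum(v) / len(v), len(v)) for cell, v in groups.items()}

table = two_way_means(records)
for cell in sorted(table):
    mean, n = table[cell]
    print(cell, f"{mean:.2f} ({n})")
```

A cross-tabulation would show only the counts in each cell; here the count is secondary and the mean outcome is the main entry.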

An example of tabular presentation, using irrigation and land cultivated in relation to months of food sufficiency, shows the simplicity of this style.

Months of Food Sufficiency (p-value < 0.001)
Cell entries are mean months, with the number of observations in parentheses.

Irrigation    <1-3 Hectares    3->10 Hectares    Totals
No            1.76 (78)        2.01 (44)         1.85 (122)
Yes           3.66 (245)       4.80 (45)         3.84 (290)
Totals        3.20 (323)       3.45 (89)         3.25 (412)

This table shows that the level of food sufficiency is greatest when both irrigation and cultivation are maximized, at 4.80 months. Increasing cultivation alone, regardless of irrigation status, does comparatively little to improve a household’s level of food sufficiency: it rises from 1.76 to 2.01 months without irrigation and from 3.66 to 4.80 months with irrigation. Improving irrigation, however, more than doubles the months of provisioning regardless of cultivation category: from 1.76 to 3.66 months in the smaller category and from 2.01 to 4.80 months in the larger one. The p-value is <0.001, which is highly significant and confirms the association. Of course, further analysis must be done to tease out whether this is a true relationship or whether it is related to geography, socio-economics, or occupation (agricultural vs. pastoral).
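The comparisons in the paragraph above can be reproduced with a few lines of arithmetic. The sketch below copies the cell means and cell sizes from the table; the labels "small" and "large" are shorthand introduced here for the two land categories, and `pooled_mean` is a hypothetical helper name.

```python
# Cell means (months of food sufficiency) and cell sizes copied from the table.
means = {("No", "small"): 1.76, ("No", "large"): 2.01,
         ("Yes", "small"): 3.66, ("Yes", "large"): 4.80}
counts = {("No", "small"): 78, ("No", "large"): 44,
          ("Yes", "small"): 245, ("Yes", "large"): 45}

def pooled_mean(cells):
    """Weighted mean over the given cells, weighting each cell mean by its n."""
    total_n = sum(counts[c] for c in cells)
    return sum(means[c] * counts[c] for c in cells) / total_n

# Difference made by irrigation within each land category.
irrigation_effect = {land: means[("Yes", land)] - means[("No", land)]
                     for land in ("small", "large")}
# Difference made by land cultivated within each irrigation category.
land_effect = {irr: means[(irr, "large")] - means[(irr, "small")]
               for irr in ("No", "Yes")}

print({k: round(v, 2) for k, v in irrigation_effect.items()})
print({k: round(v, 2) for k, v in land_effect.items()})
# The marginal (Totals) mean for the "No" row is the n-weighted mean of its cells.
print(round(pooled_mean([("No", "small"), ("No", "large")]), 2))
```

Irrigation adds about 1.9 to 2.8 months within either land category, while moving to the larger land category adds about 0.25 months without irrigation and 1.14 months with it.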

Another example of a two-way analysis is illustrated below, using weight for age z-scores as the outcome variable.

Weight for Age Z-score (p-value = 0.015)

Water         Poor Education    Good Education    Totals
Unsafe        -1.53 (314)       -1.06 (171)       -1.36 (485)
Safe          -1.30 (98)        -0.96 (115)       -1.12 (213)
Totals        -1.47 (412)       -1.02 (286)       -1.29 (698)

This table shows that children whose parents have low education and whose water source is unsafe have the worst z-scores (indicating poor health status). The corresponding p-value compares scores only between water sources, and is significant at 0.015. The numbers tell the story, though: from low to high education there is a 0.45 z-score difference, and from unsafe to safe water sources a 0.24 z-score difference. Further analysis is then needed to control for other factors, such as geography and SES.

The mean of a continuous outcome variable (such as z-scores or months of food sufficiency) can be used in the cells of the table for analysis, though this might not communicate as effectively as displaying counts or percentages. For instance, it is easier to talk to someone about the number of children who are malnourished in areas without latrines than about the difference between the mean z-scores of those with and without latrines. A common approach is to use the continuous variable for analysis, which preserves information and is more powerful, and to present results as prevalence data, which are more easily understood.
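A minimal sketch of this dual approach follows: the same z-scores are summarized once as a mean (for analysis) and once as the percentage falling below -2 standard deviations (for presentation). The z-score values and the function name `summarize` are hypothetical.

```python
# Hypothetical weight-for-age z-scores for two groups of children.
waz = {
    "unsafe water": [-2.6, -2.1, -1.8, -1.2, -0.9, -2.4, -0.3, -1.5],
    "safe water":   [-2.2, -1.4, -0.8, -1.1, -0.2, -1.9, -0.6, -1.0],
}

CUTOFF = -2.0  # children below -2 SD are counted toward the prevalence

def summarize(scores, cutoff=CUTOFF):
    """Return (mean z-score, prevalence below cutoff as a percentage, n)."""
    n = len(scores)
    mean = sum(scores) / n
    prevalence = 100.0 * sum(1 for z in scores if z < cutoff) / n
    return mean, prevalence, n

summary = {group: summarize(scores) for group, scores in waz.items()}
for group, (mean, prev, n) in summary.items():
    print(f"{group}: mean WAZ {mean:.2f}, prevalence {prev:.1f}% (n={n})")
```

The mean uses every child's score, whereas the prevalence only records which side of the cutoff each child falls on, which is why the mean is the more powerful quantity for testing and the prevalence the easier one to explain.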

The following table presents the corresponding prevalence data for the continuous data (z-score) above.

Prevalence < -2 standard deviations WAZ

Water         Poor Education    Good Education    Totals
Unsafe        39.5% (314)       20.5% (171)       32.8% (485)
Safe          31.6% (98)        13.9% (115)       22.1% (213)
Totals        37.6% (412)       17.8% (286)       29.5% (698)

One can more easily understand the magnitude of the situation by looking at a table such as this, rather than trying to interpret z-score values. This presentation also makes it quite clear where the emphasis for targeting of interventions should be directed.
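One way to make the targeting implication concrete is to convert each cell's prevalence back into an approximate number of affected children. The sketch below does this with the figures from the prevalence table; the counts are approximate because the percentages are rounded, and ranking cells by count is only one possible criterion, assumed here for illustration.

```python
# Prevalence (% below -2 SD WAZ) and number of children per cell, from the table.
cells = {
    ("Unsafe", "Poor"): (39.5, 314),
    ("Unsafe", "Good"): (20.5, 171),
    ("Safe", "Poor"): (31.6, 98),
    ("Safe", "Good"): (13.9, 115),
}

# Approximate number of underweight children in each cell (prevalence x n).
underweight = {cell: round(prev / 100 * n) for cell, (prev, n) in cells.items()}

# Rank cells by estimated number of underweight children, largest first.
ranking = sorted(underweight, key=underweight.get, reverse=True)
for cell in ranking:
    print(cell, underweight[cell])
```

On these figures, the cell with unsafe water and poor education holds both the highest prevalence and by far the largest number of underweight children.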

Earlier, in one-way analysis, it was shown how one determining factor could be related to the outcome; for example, differences in health service access by education or roofing. An important question is whether education really affects health service access, or whether those with better education simply have higher SES (for which housing quality can serve as a proxy). To examine this, it is crucial to make the research question explicit:

Is there a difference in health service access between those with higher and those with lower educational attainment, independent of SES (estimated by roofing), in eastern Kenya?


Analysis of variance (ANOVA) is used to examine the association of a continuous outcome, such as months of food sufficiency, with two determining variables. First, these associations should be looked at individually, as described under one-way analysis, although often this will be repeated to refresh one’s view. The ANOVA routines in statistical software (like SPSS) are also convenient at this stage to provide an early look at possible interactions and to provide output tables that can easily be interpreted and presented.
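To show what a two-way ANOVA routine computes, the sketch below works out the F statistics for two factors and their interaction by hand. It is an illustration only, not the SPSS procedure: it assumes a balanced design (the same number of observations in every cell, and more than one), which real survey data rarely satisfy, and it stops at the F statistics without computing p-values. The data and the function name `two_way_anova` are hypothetical.

```python
# Hypothetical balanced data: months of food sufficiency, two households per cell.
data = {
    ("No", "small"): [1.0, 3.0], ("No", "large"): [2.0, 4.0],
    ("Yes", "small"): [3.0, 5.0], ("Yes", "large"): [6.0, 8.0],
}

def mean(values):
    return sum(values) / len(values)

def two_way_anova(cells):
    """F statistics for factor A, factor B and A x B (balanced design only)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    assert all(len(v) == n for v in cells.values()), "design must be balanced"

    grand = mean([y for v in cells.values() for y in v])
    a_mean = {a: mean([y for b in b_levels for y in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in cells[(a, b)]]) for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss_a = len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_error = sum((y - mean(v)) ** 2 for v in cells.values() for y in v)

    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "A": ss_a / (len(a_levels) - 1) / ms_error,
        "B": ss_b / (len(b_levels) - 1) / ms_error,
        "AxB": ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1)) / ms_error,
    }

f_stats = two_way_anova(data)
print(f_stats)
```

Here factor A is irrigation and factor B is land category. The interaction term (AxB) is the early look at whether the effect of one factor depends on the level of the other.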

We examine two determining (independent) variables to probe whether likely causal factors are confounded, i.e., whether education is likely to cause better access to health service, or whether this may be a spurious relationship because better educated people have better housing and socio-economic status in general. Note that these analyses are conservative: it can easily happen that a genuine association does not appear to persist once possible confounding is taken into account, because of correlations between the two (or more) independent variables. Thus the approach is geared more towards NOT finding associations if there is doubt (avoiding type I errors, in statistical terms). Sometimes one has to conclude that it is impossible to tell whether or not there is a genuine relationship because of these correlations between independent variables (‘multicollinearity’). But confounding is so common that conclusions usually cannot be reached without investigating whether important confounding exists; these steps in the analysis are hardly optional.
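The logic of checking for confounding can be sketched by comparing a crude difference with the differences inside each stratum of the second variable. The counts below are hypothetical and deliberately constructed so that education and roofing are correlated and access depends on roofing alone; the function names are likewise illustrative.

```python
# Hypothetical counts:
# (education, roofing) -> (households with access, households in cell).
cells = {
    ("low", "poor roof"): (2, 8), ("high", "poor roof"): (1, 4),
    ("low", "good roof"): (3, 4), ("high", "good roof"): (6, 8),
}

def access_rate(selected):
    """Percentage with access, pooled over the selected cells."""
    with_access = sum(cells[c][0] for c in selected)
    total = sum(cells[c][1] for c in selected)
    return 100.0 * with_access / total

def rate_by_education(level, roofs):
    return access_rate([(level, r) for r in roofs])

roofs = ("poor roof", "good roof")

# Crude comparison: ignore roofing altogether.
crude_diff = rate_by_education("high", roofs) - rate_by_education("low", roofs)

# Stratified comparison: education difference within each roofing category.
stratum_diff = {r: rate_by_education("high", [r]) - rate_by_education("low", [r])
                for r in roofs}

print(round(crude_diff, 1), stratum_diff)
```

In this constructed case the crude difference of about 17 percentage points disappears entirely within each roofing category, which is the pattern that would indicate the education effect is confounded by SES.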

[Figure: Anov2.jpg, output showing access to health service by education level and roofing material]

The output shows the difference in access to health service for varying levels of education and type of roofing material (SES proxy). In this case, better education and better roofing each appear to be associated with greater access to health service. Those with both better education and better roofing have the greatest access, which indicates that greater access to health services is probably associated with higher SES. Unfortunately, we cannot perform an ANOVA on this tabular analysis because the outcome variable is categorical (the step was included in the computing instructions, but is not displayed here).

Important note: Significance vs. Size of the Difference

Remember, though, that the essence of the analysis is to find the effect of the variable on the outcome. So, keep in mind that the significance (p-value) is not as important as the size of the difference. Significance can confirm the association, but it is not essential if the difference is large, because significance is very much driven by sample size.
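The dependence of significance on sample size can be seen in a small calculation. The sketch below uses the standard two-sample t statistic for equal group sizes and a common standard deviation; the difference, standard deviation, and sample sizes are hypothetical.

```python
import math

def two_sample_t(diff, sd, n_per_group):
    """t statistic for a difference in means, equal SD and equal group sizes."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error

# Hypothetical: the same 0.5-month difference (SD 2 months) at two sample sizes.
small_sample = two_sample_t(0.5, 2.0, 20)
large_sample = two_sample_t(0.5, 2.0, 500)
print(round(small_sample, 2), round(large_sample, 2))
```

The size of the difference is identical in both cases, yet only the larger sample gives a t statistic well beyond the value of roughly 2 that corresponds to p < 0.05.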