
Exploring Associations

Most nutritional analysis uses associations between variables in one way or another. This applies both to targeting and to assessing possible causality, and hence to the design of interventions. Associations can first be studied one-on-one (or one-way), that is, as the association of a dependent (or outcome) variable with a single independent (or determining, or classifying) variable. We can begin to get a feel for the structure of the data at this stage, using simple tabulations. More than that, valid associations usually show up for the first time in such tabulations. If they don't, they are unlikely to appear magically at a later stage (although that can happen, and is interesting when it does).

More complex analysis is mostly concerned with further investigation of the validity of simple associations, what modifies them, when and how they occur, and so on. (In most cases the associations have to be there in the first place.) Experience shows that a good place to start, with anthropometry as the outcome, is looking at associations with variables like water supply and sanitation, or maternal education and housing quality, to name a few examples. These variables are often associated with anthropometric data. If no association appears, there may be doubts about the dataset, so first make sure the dataset is clean before proceeding! Keep in mind that exploring simple associations is the indispensable first step in analysis, but don't rush to conclusions on the basis of raw associations: there's a long way to go before you can make confident inferences for program design (or evaluation).


Looking at associations in the data can be done simply by comparing values of the outcome indicator, such as mean WAZ or prevalence of underweight, within two or more categories of a possible determining or causal variable. For example, one could look at the prevalence of low arm circumference in two categories of water supply: 'good water' and 'bad water'. If the prevalence of low arm circumference differs between the two, it may point to the possibility that poor water supply is actually causing malnutrition. [In fact, much of the multivariate analysis shown later is needed to understand this type of association properly.] Here is an example:

Water supply                    Prevalence of low arm circumference
Poor                                    25% (n=233)
Better                                  15% (n=142)

The above example looks at differences in the outcome variable(s) by classes of the independent variables. Often this requires creating a new variable, which splits or groups the independent variable into two or more classes (dichotomizing if two) -- such as good and bad water supply.
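Creating such a classifying variable can be sketched in a few lines of Python. This is only an illustration: the source labels and the choice of which sources count as 'good' are assumptions made for the example, and a real survey codebook will have its own codes.

```python
# Hypothetical labels for a drinking-water variable; which sources count
# as "good" is an assumption made for this sketch, not a fixed standard.
GOOD_WATER_SOURCES = {"piped to residence", "well with pump"}


def dichotomize_water(source):
    """Return 'good' or 'bad' for one record, or None if the source is missing."""
    if source is None or source.strip() == "":
        return None  # keep missing values missing rather than calling them 'bad'
    label = source.strip().lower()
    return "good" if label in GOOD_WATER_SOURCES else "bad"


# Made-up records for illustration only.
records = ["Piped to residence", "public tap", "well without pump", "Well with pump", ""]
water_class = [dichotomize_water(s) for s in records]
print(water_class)
```

Missing values are kept separate on purpose, so that records with no recorded water source do not silently inflate the 'bad' category.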

In the water supply example, an association between low arm circumference and water supply exists because the prevalence of the outcome (low arm circumference) differs between the classes of the classifying (or determining, or independent) variable, which is water supply.

There is a crucial distinction between the size of the difference and its statistical significance. Usually the size of the difference is more important, remembering that the significance always depends on the sample size. The larger the sample size, the more likely a difference of a given size is to be statistically SIGNIFICANT (i.e. p < 0.05).
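The dependence of significance on sample size can be illustrated with the water supply example. The sketch below uses a pooled two-proportion z-test, which is one common choice and not necessarily the test behind the original tabulation. It also assumes that the n values in the example (233 and 142) are the number of children in each water supply class. The smaller group sizes (47 and 28) are hypothetical, chosen to be roughly one fifth of the originals.

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value for a difference between two proportions (pooled z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same 10-point difference in prevalence (25% vs. 15%), two sample sizes.
p_full = two_proportion_p_value(0.25, 233, 0.15, 142)   # sizes from the example
p_small = two_proportion_p_value(0.25, 47, 0.15, 28)    # hypothetical smaller survey

print(f"n = 233 and 142: p = {p_full:.3f}")
print(f"n = 47 and 28:   p = {p_small:.3f}")
```

The size of the difference is identical in both calls; only the sample size changes, and with it which side of the 0.05 cut-off the p-value falls on.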

Try an exercise to learn how to SET DICHOTOMOUS VARIABLES or multi-variable categories.


Once dichotomous or multiple variable categories have been created, they can easily be used as categories to show differences in mean nutrition status. Click Here for an exercise in examining differences in outcome (WAZ) within independent variable categories from the Kenya dataset and to see the results in Mean Scores and ANOVA tables.
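The kind of calculation behind a mean scores and ANOVA table can be sketched as follows. The WAZ values are made up for illustration and do not come from the Kenya dataset. The sketch stops at the F statistic, since the p-value requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of category -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Variation of the category means around the overall mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of individual values around their own category mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up WAZ values for illustration only.
waz_by_roof = {
    "thatched or other": [-1.8, -1.2, -2.1, -1.5, -0.9],
    "corrugated iron or tiles": [-0.7, -1.1, -0.4, -1.3, -0.5],
}

for category, values in waz_by_roof.items():
    print(f"{category}: mean WAZ = {mean(values):.2f}, n = {len(values)}")

f_stat, df_b, df_w = one_way_anova_f(waz_by_roof)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With only two categories, this F statistic is the square of the t statistic from an equal-variance two-sample t-test, so the two approaches give the same p-value.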

The mean outcome (mean WAZ score) tables for categories of roofing, water supply, education level of the mother and latrine type are shown here.

[Images: mean WAZ score tables by category of roofing, water supply, mother's education level and latrine type (wpe15.jpg, wpe17.jpg, wpe19.jpg, wpe1B.jpg)]

All of the variables show the expected associations, and the differences for roofing, education and sanitation are all significant (p-value < 0.05). The one non-significant association is that for water source, mostly because of the small cell size for the group of districts that have safe water (n=33). The size of the difference is still large, and would be the primary interest here (-0.98 SD WAZ vs. -1.30 SD WAZ, a difference of 0.32 SD). Try placing the results in a table for presentation that shows the overall picture of how water and sanitation relate to nutrition status (WAZ).


The environmental and socioeconomic characteristics are presented for the area of interest, in this case Eastern Kenya. (Presentation by district would also be a good way to show differences between areas.) The outcome variable used is the mean WAZ score (a measure of nutritional status) for each category (e.g. mean WAZ for good water source versus bad water source). The results can be displayed in the following format to facilitate interpretation:

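[Image: table of mean WAZ, sample size (n) and significance by category of each environmental and socioeconomic indicator (wpeA.jpg)]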


The categories, although they do not control for other influencing factors, do show some pattern in the distribution of the data. In the table: (1) The mean z score is given to show the difference in the distribution by category of an independent variable, (2) The number in the sample (n) is given for each category to show both distribution and total sample size, and (3) The significance (Pearson’s significance between mean scores) is given to help determine if the difference between the mean scores is due to a chance occurrence.

For all of the SES and environmental indicators chosen, the mean weight-for-age z-score is higher in the 'good' category than in the 'bad' category. Of the four indicators (education, water, sanitation and roofing), only water source has a p-value above the usual cut-off of 0.05 (p = 0.147). This difference may still be noteworthy; the result could be affected by the small number of cases in the category "protected drinking source." Remember that the size of the difference matters more than the significance, because significance is greatly affected by sample size, and the sample size for water source is small.

These results cannot be conclusive since other influencing variables are not controlled for in these outcomes.  The distribution can show patterns to indicate how the factors are affecting the outcome and how these factors might influence other health factors of interest such as health access or behavior.

Examining associations is a useful way to see patterns in the outcome distributions and should be done in the early stages of data analysis. These associations shed light on targeting priorities and on possible influencing factors that are beyond a program's scope but might need to be taken into account in program planning.


Here, unlike the above table, the mean WAZ and underweight prevalence are both presented, and the size of the difference is included in the table as well as the significance of the difference. (These results are from the Kenya Rift Valley dataset.)

Table: One-Way Analysis

                                            Mean WAZ      Underweight Prevalence (%)

Toilet access                               p = 0.035     p = 0.025
  No access to toilet
  Access to toilet
  Difference (between worst and best off)

Roof type                                   p = 0.000     p = 0.001
  Thatched or other roof
  Corrugated iron or tiles
  Difference (between worst and best off)

Drinking water source                       p = 0.064     p = 0.292
  Public tap and well without pump
  Piped to residence and well with pump
  Difference (between worst and best off)

Educational attainment                      p = 0.006     p = 0.005
  None or incomplete primary
  Complete primary or more
  Difference (between worst and best off)

Delivery location                           p = 0.000     p = 0.001
  Home or other
  Public or private sector
  Difference (between worst and best off)


Toilet access, roof type (a proxy for SES), educational attainment and delivery location are all significantly associated with both mean WAZ and underweight prevalence. The differences in mean WAZ and prevalence are also shown in the table; for example, the prevalence of underweight among those living in poor housing is 9.7 percentage points higher than among those with a corrugated iron or tiled roof. Ideally, program interventions target something that a program can readily influence. As water and sanitation programs are common and usually have an impact on weight-for-age z-score and underweight prevalence, one would want to explore these associations further.
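Both outcome columns of a one-way table of this kind can be derived from the same WAZ values. The sketch below assumes the usual definition of underweight as WAZ below -2 SD. The values are made up for illustration and are not from the Rift Valley dataset.

```python
from statistics import mean

UNDERWEIGHT_CUTOFF = -2.0  # assumed definition: underweight means WAZ below -2 SD


def summarize(waz_values):
    """Return (mean WAZ, underweight prevalence in percent) for one category."""
    underweight = sum(1 for z in waz_values if z < UNDERWEIGHT_CUTOFF)
    return mean(waz_values), 100.0 * underweight / len(waz_values)


# Made-up WAZ values for illustration only.
no_toilet = [-2.4, -1.9, -2.2, -1.1, -2.6, -0.8, -1.7, -2.1]
toilet = [-1.2, -0.6, -2.3, -0.9, -1.5, -0.3, -1.8, -1.0]

mean_worse, prev_worse = summarize(no_toilet)
mean_best, prev_best = summarize(toilet)

print(f"No access to toilet: mean WAZ {mean_worse:.2f}, underweight {prev_worse:.1f}%")
print(f"Access to toilet:    mean WAZ {mean_best:.2f}, underweight {prev_best:.1f}%")
print(f"Difference in mean WAZ: {mean_worse - mean_best:.2f} SD")
print(f"Difference in prevalence: {prev_worse - prev_best:.1f} percentage points")
```

Reporting the prevalence difference in percentage points, rather than as a percent change, keeps it directly comparable with the prevalence figures for each category.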
