ANALYSIS EXAMPLE: HLS BASELINE SURVEY IN W KENYA [1]
In 1999, CARE began developing a project to improve livelihood and food security in W Kenya. To help in this, a baseline survey was carried out. Some examples of possible analyses in relation to decisions on project design are described below, and the dataset is included for those who would like to practice these analyses, or take them further, as a way of learning the techniques.
Description of the survey [2]
Site of the study
The survey was conducted in Western Kenya, in three districts of Nyanza province: Suba, Rachuonyo, and Homa Bay. The areas covered by the three districts stretch across two agro-ecological zones, characterized generally as low and medium potential zones (CARE-Kenya, 1996). They are defined by two essential features: (1) a unimodal rainfall pattern, with most rain falling between March and July, although there are scattered, short and unreliable rains throughout the year, and (2) "black cotton" clay soils, which are sticky during the wet season and tend to stiffen and crack during the dry season (CARE-Kenya, 1996). These two features combined tend to limit the potential of traditional agriculture in the zone.
Population density in the area varies from low to high, but average educational attainment is generally low, even compared with the rest of Kenya. The state of the infrastructure is extremely poor. Travel is difficult in the dry season, with the roads riddled with deep gullies left from the rainy season. This is still better than the wet season, when the roads are practically impassable. As a result, access to the main market centers, which are on average about three kilometers from most of the villages, is very difficult. Moreover, the poorest areas lack medical services, with the result that those seeking medical care must walk one to two hours to nearby towns (CARE-Kenya, 1996).
Sample size
The calculation of sample size for the survey was based on the formula for indicators expressed as proportions. This is the preferred method for sample size calculation if the progress of a project is to be measured by changes in the proportion of the population with a given characteristic, such as the percentage of infants up to six months who are exclusively breastfed, or the proportion of malnourished children. The formula is presented below (with a key to explain the symbols), as follows:
n = D [(Za + Zb)^2 * (P1(1 - P1) + P2(1 - P2)) / (P2 - P1)^2]
KEY:
n = required minimum sample size per survey round or comparison group;
D = design effect, which provides a correction for the loss of sampling efficiency resulting from the use of cluster sampling instead of simple random sampling. A design effect of 1.3 has been established for Western Kenya on the basis of prior DHS and CARE's rapid assessment, and was used for calculating the sample size;
P1 = the estimated level of an indicator measured as a proportion at the time of the first survey or for the control area;
P2 = the expected level of the indicator either at some point in the future or for the project area, such that the quantity (P2 - P1) is the magnitude of change to be detected;
Za = the z-score corresponding to the degree of confidence desired for concluding that an observed change of size (P2 - P1) would not have occurred by chance alone (a is the level of statistical significance; the corresponding confidence level, 1 - a, is frequently set at .95 for most social projects);
and Zb = the z-score corresponding to the degree of confidence required to detect a change of size (P2 - P1) if one actually occurred (b is the statistical power).
The sample size calculation used the following parameters:
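D = 1.3
P1 = 0.30 (a conservative estimate of stunting was used)
P2 = 0.20 (the expected change is 10 percentage points)
Za = 1.645 (the one-tailed z-score for a confidence level of 0.95)
Zb = 0.840 (the z-score for a statistical power of 0.80)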
The sample size thus calculated per site (comparison group) was as follows:
n = 1.3 [(1.645 + 0.840)^2 * (0.30(0.70) + 0.20(0.80)) / (0.10)^2] = 297
To obtain the total sample size, the size per site was multiplied by 4, corresponding to the 3 intervention sites and 1 control site, for a total of 1188. Owing to cost and feasibility considerations, the rule-of-thumb 10% cushion usually added to buffer against non-response was not added, but the sample size was rounded up to 1200 households.
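As a rough check on the arithmetic, the formula can be evaluated in a few lines of Python. This is a minimal sketch and not part of the original survey documentation: the function name `sample_size_per_group` and its parameter names are assumptions made for illustration, and the z-scores are the rounded values 1.645 and 0.840 used in the calculation above.

```python
def sample_size_per_group(p1, p2, z_a, z_b, design_effect):
    """Evaluate n = D [(Za + Zb)^2 * (P1(1 - P1) + P2(1 - P2)) / (P2 - P1)^2]."""
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)
    return design_effect * (z_a + z_b) ** 2 * variance_term / (p2 - p1) ** 2


# Parameters reported for the Western Kenya baseline survey.
n_exact = sample_size_per_group(p1=0.30, p2=0.20, z_a=1.645, z_b=0.840,
                                design_effect=1.3)
n_per_group = round(n_exact)   # the text reports the rounded figure
total = n_per_group * 4        # 3 intervention sites + 1 control site

print(n_per_group, total)
```

With these inputs the sketch reproduces the figures of 297 and 1188 given in the text. Halving the change to be detected (for example `p2=0.25`) roughly quadruples the required sample, which shows how strongly the result depends on the size of the expected change.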
Selecting the sample
Cluster sampling and systematic sampling were used to select the households included in the survey (table 1). The sampling frame included the three districts of Suba, Homa Bay and Rachuonyo. While all three districts had project sites, financial constraints limited the selection of controls to Suba district alone.
At the first stage of sampling, a list of sampling units, or sub-locations (serving as clusters), was selected using systematic sampling. Sub-locations were chosen as clusters because they have relatively well defined physical boundaries, are located reasonably close to one another, have moderate and approximately equal measures of size (villages) and are reasonably homogeneous. It was decided to select forty clusters from the three districts: ten each from Homa Bay and Rachuonyo, and twenty from Suba, which was sampled twice because it is the largest of the three districts, with reasonable dispersion to allow for the selection of uncontaminated controls along with the project participants. To improve sampling efficiency, clusters at the first stage were selected with probability proportional to estimated size, which resulted in an actual total of thirty-nine clusters selected, one of which was to be sampled twice at the second stage on account of its larger size.
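The way a single large cluster can be drawn twice under systematic sampling with probability proportional to size can be illustrated with a short Python sketch. This is a generic textbook version of the procedure, not necessarily the exact steps followed by the survey team; the function name `pps_systematic` and the cluster sizes are invented for illustration only.

```python
import random


def pps_systematic(sizes, n_select, rng):
    """Systematic selection with probability proportional to size.

    Returns the indices of the selected units. An index appears more than
    once when a unit's size exceeds the sampling interval.
    """
    interval = sum(sizes) / n_select
    start = rng.random() * interval          # random start within first interval
    selected = []
    cumulative = 0                           # total size of units already passed
    idx = 0
    for k in range(n_select):
        target = start + k * interval
        # Walk forward until the target falls inside the current unit.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Made-up measures of size for seven clusters (NOT survey data).
sizes = [120, 80, 400, 60, 90, 150, 100]
selected = pps_systematic(sizes, n_select=4, rng=random.Random(0))
print(selected)
```

In this toy example the sampling interval is 250, so the cluster of size 400 is always selected at least once and is sometimes selected twice, which mirrors the situation described above where thirty-nine distinct clusters accounted for forty selections.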

At the second stage of sampling, one village was selected randomly from each cluster (two from the larger cluster), for a total number of forty villages. At the final stage of sampling, a modification of the segmentation method was used to select the households surveyed. The segmentation method involves dividing a sampling unit (here, a village) into smaller segments of roughly equal size, randomly selecting one segment and sampling all households in the chosen segment. In this instance, however, there was no accurate listing of the households in the villages, or a reliable mapping of their spread within the cluster.
The solution applied was to create segments (clusters of households) based on information provided by the Assistant Chiefs and their staff, and select one segment randomly. The enumerators were instructed to proceed to the adjacent segment if there were fewer than thirty households in a particular, randomly selected segment. As it turned out, each segment selected contained slightly more than thirty households, so there was no need to proceed into the next segment.
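The selection rule given to the enumerators can be sketched as follows. This is only an illustration under stated assumptions: the function name `choose_segment` is hypothetical, the household counts are invented, and "adjacent" is taken to mean the next segment in list order (wrapping around at the end), which the source does not specify.

```python
import random


def choose_segment(segment_sizes, min_households, rng):
    """Pick one segment at random; extend into adjacent segments if too small.

    Returns (indices of segments to survey, total households covered).
    """
    start = rng.randrange(len(segment_sizes))
    chosen = [start]
    households = segment_sizes[start]
    current = start
    # Assumption: "adjacent" means the next segment in the list, wrapping around.
    while households < min_households and len(chosen) < len(segment_sizes):
        current = (current + 1) % len(segment_sizes)
        chosen.append(current)
        households += segment_sizes[current]
    return chosen, households


# Invented household counts per segment in one village (NOT survey data).
village_segments = [34, 31, 36, 33]
chosen, households = choose_segment(village_segments, min_households=30,
                                    rng=random.Random(1))
print(chosen, households)
```

When every segment holds at least thirty households, as happened in the survey, the loop never runs and exactly one segment is surveyed.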
The Survey Instrument
The questionnaire was designed to collect information needed to track indicators related to the various components of CARE's household livelihood security project. The average duration for administering it, based on pre-survey trial runs, was 87 minutes. Besides general household information, the questionnaire contained modules on child health and nutrition, adult health, education, water and sanitation, agricultural productivity and asset ownership, and participation in civil society. It combined open-ended questions with closed questions intended to capture specific information on given variables.
Implementation
Enumerator training
An intensive, weeklong training of enumerators was undertaken prior to the survey. The training covered the basics of enumeration, including approach, introduction, consistency of questioning, observation and callback procedure. Additionally, there was a participatory translation and back-translation of the questionnaire instrument from English to Luo, the language of the participants. Trainers demonstrated the measuring tools used (weight and height/length scales), and the instrument was pre-tested for cultural appropriateness, flow and suitability.
Conducting the survey
Four survey teams, each consisting of six enumerators, two supervisors and a driver, were formed for the survey. The teams collected data over a period of three weeks in the sampled sites, covering three project areas and one control area. The supervisors were responsible for verifying the completeness of the questionnaires before forwarding them to the Homa Bay project office. The project office provided a second-line check, verifying both completeness and accuracy. If excessive deficiencies were detected, the questionnaire was sent back to the field supervisors, who reviewed it with the enumerator and, where necessary, sent the enumerator back for a callback to resolve particular discrepancies. After the two checkpoints had been cleared, the questionnaires were passed on to the main program office in Kisumu for data processing.
Data processing
Data entry started a week after the commencement of data collection. To ensure that only complete questionnaires were entered, the program's monitoring and evaluation officer did a final verification of every questionnaire, after which the data was entered by two data entry clerks, using SPSS (Statistical Package for Social Sciences). The data entry clerks were both Master's level students with previous training in data entry, and had consulted for CARE-Kenya in that capacity on an earlier project.
Objectives.
The aim of the project being considered was as given in the Logical Framework Analysis (LFA), shown in table 2. Here we concentrate on the final goal of the first (TASK) component.
The objective of the survey was to provide information for planning this project in terms of targeting and possible content and to give a basis for future evaluation.
A list of the variables included in the survey (as derived from the questionnaire) is shown in table 3. These variables are also defined in the dataset.

[1] Supported by CARE contract # 0010000670-1. The primary source for this material is in the Food Security module of the CD PANDA.
[2] Survey description section from Y Agyeman, Western Kenya HLS Baseline Survey, August 2000
Analysis of data from W Kenya. (Kenya HLS Baseline survey)
Some straightforward analyses are described here, and the dataset is included so that those interested can take this further. Still, it can be seen that even the fairly simple results here take us quite a long way. The results have been selected, so that a lot of negative findings have been excluded, and the dataset has been largely cleaned. Some key derived variables are included in the dataset provided, which has 531 variables and 1107 cases.
The first question asked, relevant to the project objectives of increasing household food security, was: whose household food security is low? The next question was: are there interventions that can improve production (and hence food security)?
Indicators of food security used for illustration here are value of agricultural production (1997 and 1998) and child nutritional status. A number of others are available, usually giving compatible answers, but these show some of the strongest associations.
The first issue is: are there substantial differences by area (administrative division, in this case)? The results for total value of agricultural production per adult (TVALPD) were:

Greater differences were seen in 1998, when overall production was lower and vulnerability higher. The analysis concentrated on 1998 to begin with. It can be seen that divisions 3, 4, and 7 had substantially lower production than the others in 1998.
Does this correspond with child underweight, as a measure of child nutrition? The answer was generally yes:

Divisions 3 and 4 had relatively raised prevalences of underweight.
The indicators of food availability, such as whether the household reduced numbers of meals because of inadequate food, did not show associations either with production or child nutrition, so this was reserved for later investigation. Differences were not notable by administrative division in these indicators.
Land holding area was another factor likely to be important for food security. It was

These differences are significant (P<0.05).
So, if division 3 is more vulnerable, and small landholders are too, what happens if we look at these two factors together?

We can see that the small landholders in divisions 2 and 3 have low value of production and higher child underweight. These may be priorities for our project, and merit further investigation on the ground.
The next question is whether we can find reasons for the low production and nutrition from the data or, in this case, whether interventions already begun seem to be improving matters, so that they should perhaps be pursued more widely.
A package of interventions had been in operation: improved seed technology, extension advice for better plant spacing, timely operations, and the like. We can see if these are associated with better food security and nutrition. When the outcome indicators (underweight, and value of agricultural production) are computed for households with and without improved seed technology by division, we do find big differences. In Gwasi (division 2), underweight prevalence is 27% for those without improved seeds and 17% for those with; value of production is 1200 vs 3000 KSh/adult/year. In Lambwe, these are 31% vs 17%, and 1400 vs 2400.
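A comparison of this kind amounts to grouping households by division and by use of improved seed, then computing the underweight prevalence and the mean value of production in each group. The sketch below shows the idea in Python; it is not the original SPSS analysis. The records are invented toy values, not survey data, and the field layout and the function name `summarize` are assumptions. Only the name TVALPD comes from the dataset description above.

```python
from collections import defaultdict

# Toy records, NOT survey data: (division, improved_seed, underweight, tvalpd)
records = [
    ("Gwasi", False, True, 1000), ("Gwasi", False, True, 1500),
    ("Gwasi", False, False, 2000), ("Gwasi", False, False, 1500),
    ("Gwasi", True, False, 2000), ("Gwasi", True, False, 3000),
    ("Lambwe", False, True, 1200), ("Lambwe", False, False, 1600),
    ("Lambwe", True, False, 2400), ("Lambwe", True, True, 2600),
]


def summarize(rows):
    """Underweight prevalence (%) and mean TVALPD by (division, improved_seed)."""
    groups = defaultdict(list)
    for division, improved_seed, underweight, tvalpd in rows:
        groups[(division, improved_seed)].append((underweight, tvalpd))
    summary = {}
    for key, members in groups.items():
        n = len(members)
        summary[key] = {
            "n": n,
            "underweight_pct": 100 * sum(u for u, _ in members) / n,
            "mean_tvalpd": sum(v for _, v in members) / n,
        }
    return summary


summary = summarize(records)
for key in sorted(summary):
    print(key, summary[key])
```

With the real dataset, each group would also need a check on its size before the differences are interpreted, since some division-by-adoption cells may contain few households.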
The overall agricultural extension package includes 7 components, and different farming households adopt a different mix of these. A way to simplify this is to create a variable that gives the number of components adopted, by adding these together. When this is done, a scatterplot like this is seen:

The slope is highly significant. When tabulated, the results look like this.

Similar results are obtained using yield rather than value of total production.
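The derived variable described above, and the slope of production against it, can be sketched in Python as follows. This is a hedged illustration rather than the original analysis: the component names are placeholders (only improved seed, plant spacing and timely operations are named in the text, and the remaining four are labelled generically), the household records are invented, and the sketch computes only the least-squares slope, not its significance.

```python
# Placeholder names for the 7 extension components (assumed, not from the dataset).
COMPONENTS = ["improved_seed", "plant_spacing", "timely_operations",
              "component_4", "component_5", "component_6", "component_7"]


def adoption_score(household):
    """Number of extension components the household has adopted (0 to 7)."""
    return sum(1 for name in COMPONENTS if household.get(name))


def ols_slope(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented households (NOT survey data); "tvalpd" is production value per adult.
households = [
    {"tvalpd": 1000},
    {"improved_seed": 1, "tvalpd": 1500},
    {"improved_seed": 1, "plant_spacing": 1, "tvalpd": 2000},
    {"improved_seed": 1, "plant_spacing": 1, "timely_operations": 1, "tvalpd": 2500},
]

scores = [adoption_score(h) for h in households]
production = [h["tvalpd"] for h in households]
slope, intercept = ols_slope(scores, production)
print(scores, slope, intercept)
```

A simple count treats all components as equally important, which is a convenient simplification; a statistical package would additionally report the standard error and P value of the slope.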
So, the use of improved practices is significantly related to production, and this in fact persists when we control for cultivated area. We can also control for other possible confounders, such as education (of women, in this example) and even house construction (as a proxy for socio-economic status, SES). Production is still highly related to adoption of improved inputs. This remains the case in divisions 2 and 3 by themselves.

Finally, we can look at child nutrition as a function of these factors, and again we find a significant relationship:

In conclusion here, we could say that it seems likely that there are highly vulnerable groups among the small farming families in at least two divisions (Gwasi and Lambwe). However, interventions already applied among some farmers for improved farming practices appear to be associated with better food security and nutrition. Adoption by more farmers may be expected to improve their situation as well.
Clearly these are preliminary results. They would need checking with further analysis, and on the ground; they can only provide pointers. They are shown here as an illustration of how to begin to answer questions relevant to program planning from such data.
Finally, the dataset contains about 500 more variables that have not been touched yet, so there is a lot more to be found out! The user is encouraged to bring the dataset into SPSS and see what else can be found. Almost certainly it will be new, never seen before.