
Sometimes it is useful to look at data in aggregated form so that broader comparisons can be made. For example, for targeting purposes it could be useful to map levels of malnutrition by district or province within a country. Aggregation is one way to summarize a situation easily, and it can be useful with a mapping program that assigns values by area (such as provincial rates of stunting, wasting, or underweight).
Aggregation is simple, but care must be taken to select the variables that lend themselves to aggregation and to recode and prepare any data that is not yet ready for it. Some information will inevitably be lost, because condensing data always discards detail. Always save the aggregated data as a new file, so that the original file remains available for other types of analysis where it is more useful to look at the detail.
The module provides one EXERCISE in AGGREGATING, using a portion of a data set from the Sri Lankan Community Nutrition Project Baseline Survey, called the PNIP. The data set has only a few basic variables, enough to introduce the idea of aggregation and show the details of creating an aggregate file. Sri Lanka is divided administratively at the provincial level, the district level, the district secretariat level, and the community level. The data could be collapsed at any of these break points, but this lesson collapses at the DISTRICT level (district will be the ‘break’ variable).
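The idea of a break variable can be sketched in plain Python, outside SPSS. The records, field names (`province`, `district`, `stunted`) and values below are hypothetical, invented only to show that the same child-level records can be collapsed at different break points; they are not taken from the PNIP file.

```python
from collections import Counter

# Hypothetical child-level records; not taken from the PNIP data set.
records = [
    {"province": "P1", "district": "D1", "stunted": 1},
    {"province": "P1", "district": "D1", "stunted": 0},
    {"province": "P1", "district": "D2", "stunted": 0},
    {"province": "P2", "district": "D3", "stunted": 1},
]

def cases_per_group(rows, break_var):
    """Count how many records fall into each value of the break variable."""
    return dict(Counter(row[break_var] for row in rows))

# The same records collapsed at two different break points.
by_province = cases_per_group(records, "province")
by_district = cases_per_group(records, "district")
print(by_province)
print(by_district)
```

Collapsing by district instead of province gives more rows with fewer cases in each, so the choice of break variable sets how much detail survives in the aggregated file.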
Follow these steps to aggregate:
1. Open SPSS
2. Open the data file named SAsia.sav (create a CODE BOOK first to see the variable definitions)
3. Use a Code Book to detect any errors in the data. The goal is to have most of the variables coded in a binary (dichotomous) format, so that 1 = positive response and 0 = negative response. The 0/1 coding allows a Mean score to be calculated in the aggregation (other options are available, e.g. number of cases or % below a certain value). When ‘MEAN’ is chosen as the aggregation function, every case is counted in the denominator, but only the positive responses (1) add to the numerator. So if 5 of 50 cases are positive for stunting, the calculation is 5/50 = 0.1.
4. If this mean is multiplied by 100, it gives the percentage positive for (‘affected’ by) the variable of interest; in the example above, 0.1 × 100 = 10% stunted.
5. Recode any variables that need recategorizing or cleaning. Most variables will be 0/1, but some continuous outcomes, such as the WAZ, HAZ, and WHZ scores, can be left as they are so that a mean value is calculated for each district. Double check that all variables are coded so that either a count or a mean score can be calculated at the district level.
6. Once the data is prepared (this is crucial), click on Data, Aggregate.
7. Move the variable district into the Break variable box and move ALL of the remaining variables into the Aggregate variable box.
8. Each one will automatically be given the default function (the Mean), but if a different calculation is desired, select the variable of interest and click Function to choose another option. Change the function for fno (family number in the household) and childnum (the number of the child for whom the interview is conducted) to Number of cases instead of Mean.
9. Select the option (radio button) labeled Create a new data file and type in the name c:/file location/SADistrict.sav
10. Click on OK.
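Steps 3 to 8 can be mirrored in a short Python sketch to check by hand what the Aggregate dialog computes. It assumes hypothetical records with a `district` break variable, an `fno` identifier and a continuous `haz` score; the recode uses the common WHO cut-off of HAZ below -2 for stunting, which is an assumption here rather than a rule stated in this exercise. Binary and continuous variables get the mean (the SPSS default), `fno` gets the number of cases (`childnum` would be handled the same way), and the binary mean is multiplied by 100 as in step 4.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child-level records (not the SAsia.sav data).
children = [
    {"district": "D1", "fno": 101, "haz": -2.5},
    {"district": "D1", "fno": 102, "haz": -1.0},
    {"district": "D1", "fno": 103, "haz": 0.5},
    {"district": "D1", "fno": 104, "haz": -3.0},
    {"district": "D2", "fno": 201, "haz": -0.5},
    {"district": "D2", "fno": 202, "haz": -2.1},
    {"district": "D2", "fno": 203, "haz": 1.0},
]

# Step 3 check: the mean of a 0/1 variable is positives / all cases.
check = mean([1] * 5 + [0] * 45)  # 5 positives out of 50 cases gives 0.1

# Step 5: recode into a 0/1 variable (assumed cut-off: HAZ < -2 means stunted).
for child in children:
    child["stunted"] = 1 if child["haz"] < -2 else 0

# Steps 6-8: group by the break variable, then apply one function per variable.
groups = defaultdict(list)
for child in children:
    groups[child["district"]].append(child)

district_file = {}
for district, rows in groups.items():
    stunted_mean = mean(r["stunted"] for r in rows)
    district_file[district] = {
        "fno_n": len(rows),                            # "Number of cases"
        "haz_mean": round(mean(r["haz"] for r in rows), 2),  # continuous: mean
        "stunted_mean": stunted_mean,                  # binary: proportion
        "stunted_pct": round(stunted_mean * 100, 1),   # step 4: mean x 100
    }

for district, values in district_file.items():
    print(district, values)
```

Each district ends up as a single row; which individual children were stunted is no longer recoverable from it, which is why the original file must be kept.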
Label the variables in the new data file SADistrict.sav. Compare this file with the sample file provided, SADist.sav.
This is all there is to AGGREGATION. The rules to remember are: