
Transformations

Transformation of data means changing variables as a prerequisite to analysis. For instance, you might want to know the prevalence of malnutrition from data on height, weight and age. To make this assessment, you must perform a transformation to create z-scores from your variables (height and age for stunting, weight and age for underweight) and then classify each case as above or below -2 standard deviations from the reference mean, or another cut-off. This then allows you to calculate proportions or prevalences. Examples of transformations described in this section are:

A. Date of birth and date of interview -> Age in months
B. Weight, height, age in months, sex -> Z-scores (HAZ, WAZ, WHZ)
C. Z-scores (HAZ, WAZ, WHZ) -> Stunting, Wasting, & Underweight Categories
D. Household size -> Dependency Ratio
E. HH size, number of women -> % Women in HH
F. Water source listed by type (e.g. tap, well, etc.) -> Dummy variables for source of water (e.g. for regression analysis)
G. Age for breastfeeding and introduction of various food types -> Feeding categories

 

The Basic Steps of Data Transformation:

  1. Determine what variables you need that are not in the current data set.
  2. Select the existing variables that can be used to create the new variables.
  3. Write the formula, or determine the categorization rules -- for example:

Basic formula (addition, subtraction, multiplication, logarithm, square, square root, etc.):
A - B = New Variable

Categories:
-5 <= A < -3 is category 1
-3 <= A < -2 is category 2
-2 <= A < -1 is category 3
-1 <= A <= 5 is category 4

Ratio formula:
A / B = New Variable

Percentage formula:
(A / (A+B+C)) * 100 = New Variable

Dummy variables (categories):
D1 = 0, 1; D2 = 0, 1; D3 = 0, 1; D4 = 0, 1
Dtotal = D1 + D2 + D3 + D4 (value from 0 to 4)

  4. Use the Transform, Compute or Transform, Recode options from the Data Editor menu in SPSS to apply the formula to create a new variable, or to assign new categories.

  5. Exercises for these transformations are shown in detail below:

 

GOOD PRACTICE TIP:

It is good practice to recode into a new variable with a different name, instead of recoding into the same variable. A mistake made when recoding into the same variable overwrites the original values, and they are permanently lost if a further mistake is then made -- like saving over the existing dataset -- unless you have yet another copy of the dataset stored elsewhere (which is never a bad idea!). This applies even when you are well practiced in recoding.


TRANSFORMATION A:

Date of Birth and Date of Interview -> Age in Months

 

The date of birth and date of interview could come in several forms. You could have (1) separate variables for day, month and year, or (2) a single date variable in one of these formats: dd/mm/yy, mm/dd/yy, yy/mm/dd.

  1. When you have separate variables for month, day and year,  the easiest way to transform is to create a new variable called age using SPSS  function Transform, Compute  in the Data Editor menu. Here is how to do this on a practice file from a South Asian village (SAsiavill.sav):

EXERCISE 1:

1.1  Open SAsiavill.sav  

1.2 On the data editor menu select Transform and Compute

1.3  In the Target Variable window type age

1.4  Select the variable in the list labeled year of measure (q608) and press the arrow to place it in the formula box. Also, select and place in the box these variables: month of measure (mom), day of measure (dom), year of birth (q602), month of birth (mob01), day of birth (dob01)

            Now alter the formula box to make it look like this:

(Q608*12 +mom+dom/30.42) - (Q602 * 12 + mob01+dob01/30.42)

This formula converts the year, month and day variables into total months, once for the measurement date and once for the birth date. It then subtracts the birth months from the measurement months to give the child's age in months at the time of measurement.

Q608 = year of measurement (times 12 = months)

mom = month of measurement

dom = day of measurement (divided by the average number of days in a month, 30.42 = months)

q602 = year of birth (times 12 = months)

mob01 = month of birth

dob01 = day of birth (divided by the average number of days in a month, 30.42 = months)

1.5  Press OK to run the command. A new variable called AGE will be placed at the end of the variable list. You should find that the first child has an age in months of 28.97 and the second child 12.97. Is this what you found?

1.6 Save the data now that you have created a new variable if you would like to keep this as a part of your data base.

1.7  Press File and Save.

Note: an alternative procedure, between steps 1.2 and 1.3, is to create the variable (age) first, by inserting it and setting its characteristics including name, then going to step 1.3.
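The arithmetic in Exercise 1 can also be checked outside SPSS. The following is a minimal Python sketch of the same formula, not part of the SPSS exercise; the function name and the example dates are hypothetical and are not taken from SAsiavill.sav.

```python
DAYS_PER_MONTH = 30.42  # average number of days in a month, as in the formula above


def age_in_months(meas_year, meas_month, meas_day,
                  birth_year, birth_month, birth_day):
    """Total months at measurement minus total months at birth."""
    measured = meas_year * 12 + meas_month + meas_day / DAYS_PER_MONTH
    born = birth_year * 12 + birth_month + birth_day / DAYS_PER_MONTH
    return measured - born


# Hypothetical child: born 10 March 1994, measured 25 June 1996
example_age = age_in_months(1996, 6, 25, 1994, 3, 10)
print(round(example_age, 2))
```

The whole years and months contribute 27 months here, and the 15-day difference adds 15 / 30.42 of a month.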

 

  2. When you have one variable that includes the month, day and year, SPSS can calculate with the dates directly. Use the following instructions to calculate age from the dd/mm/yy, mm/dd/yy, or yy/mm/dd format:

EXERCISE 2:

2.1  Open SAsiavill.sav

2.2  Under the Data Editor choose the Transform menu and Compute.

2.3  In the target variable window, type age2.

2.4  Click the Type and label button and give the label Age in Months and press Continue.

2.5  Under the Functions options choose CTIME.DAYS and press the arrow to insert the option in the Numeric Expression window.

2.6  In the (  ) after the CTIME.DAYS insert the Variables from the right hand variable list. Choose the variable intervie and birth and click the right arrow to place them in the Numeric Expression window. Place a minus sign in between the variables so that the final expression looks like the following:

CTIME.DAYS (intervie - birth) / 30.42

This formula also gives the age in months. It uses the built-in SPSS function that converts a time interval to days (CTIME.DAYS) and divides by 30.42 (the average number of days in a month) to give the months since the child's birth.

 

2.7 Click the OK button. The new variable age2 will be entered as the new last variable in the data editor next to AGE.

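2.8  Check whether age and age2 agree. A scatterplot of one against the other is a convenient way to do this: the points should fall close to a straight line. Small differences are to be expected, because Exercise 1 treats every month as 30.42 days long while Exercise 2 counts the actual days between the two dates.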
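As a rough cross-check of step 2.8, the two methods can be compared numerically. This Python sketch assumes hypothetical birth and interview dates (not values from SAsiavill.sav) and uses the standard `datetime` module in place of CTIME.DAYS.

```python
from datetime import date

DAYS_PER_MONTH = 30.42


def age_from_dates(interview, birth):
    """Counterpart of CTIME.DAYS(intervie - birth) / 30.42."""
    return (interview - birth).days / DAYS_PER_MONTH


def age_from_parts(interview, birth):
    """Counterpart of the Exercise 1 formula on separate year/month/day values."""
    measured = interview.year * 12 + interview.month + interview.day / DAYS_PER_MONTH
    born = birth.year * 12 + birth.month + birth.day / DAYS_PER_MONTH
    return measured - born


# Hypothetical (birth, interview) pairs
children = [
    (date(1994, 3, 10), date(1996, 6, 25)),
    (date(1995, 7, 1), date(1996, 6, 25)),
]

age = [age_from_parts(interview, birth) for birth, interview in children]
age2 = [age_from_dates(interview, birth) for birth, interview in children]

# The largest disagreement between the two methods, in months
max_difference = max(abs(a - b) for a, b in zip(age, age2))
print(max_difference)
```

A largest difference well under one month suggests the two variables agree; a large difference for a particular case usually points to a data entry problem in one of the date fields.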

 


TRANSFORMATION B:

Weight, Height, Age in months, Sex -> Z-Scores (HAZ, WAZ, WHZ)

 

In this exercise we will show how to use variables from:

an existing SPSS Windows (.sav) file
an Epi-Info .rec file

...in order to calculate z-score values using Epi-Nut (which is part of the CDC's EPI-INFO).

Then we will demonstrate how to merge the new z-score information back in the existing file for analysis.

 

EXERCISE 3: When transferring FROM an existing SPSS Windows (.sav) file into EPI-INFO:

3.1  Open the cam1.sav file which contains some example data. Identify the variables for unique ID, child age, child sex, child weight (kgs), and child height (cms).

3.2  Highlight the variables you do NOT want and delete them, until only the variables for casenum, age, sex, weight, and height remain.

3.3  Use Save As to save the new file as a dBASE III file named cam2.dbf in the Epi6 subdirectory of your computer (c:\epi6\cam2.dbf).

3.4  Go to Import in Epi-Info under Programs, and select .dbf file as import file type, type cam2.dbf and then process. It will produce the cam2.rec file for you.

3.5  Create anthropometric variables using the Epi-Nut program by the following procedure.

After you have imported the cam2.rec file, go to the Epi-Nut processor, which is under Programs in the Epi-Info system.
Go to Indices and Add to file. There, type in the name of the .rec file you will use.
Use the mouse to click on the arrows to select the variable names in your data set for Age, Sex, Weight, and Height.
Next use the mouse to select only whz, waz, and haz. All nine indices are selected (X) by default, but usually you only use the Z-scores. The space bar can also be used to remove the default selections, moving the cursor with the arrow keys.
Then click on Process. The computer will give you a status report when finished.
Go back to Analysis and 'read' the updated cam2.rec file you created. View the file by pressing the F4 key. Then press F10 to go back to the main menu.

3.6  Go to Export in the Program menu. Export the cam2.rec file with anthropometric variables in it as a dbase III file. You can export it to the SPSS subdirectory. Open the exported file in SPSS Windows and save it as cam2.sav. You can overwrite the previous file since this one has the newest information in it.

3.7  To merge the files in SPSS Windows, open the larger file (cam1.sav). Under the Data menu, select Merge and then select Add new variables.

3.8  The file selection box will open; choose the file cam2.sav, which now holds the z-scores. Select Match cases on key variables in sorted files (both files must be sorted by casenum), select casenum, and move it to the Key Variables box using the small arrow. Then move all variables except waz, whz, and haz into the Excluded Variables box and process.

3.9  You should now have the anthropometric variables added to the cam1.sav data set. Make sure to save the new dataset.
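The logic of the merge in steps 3.7 to 3.9 is a match on the key variable casenum. The following Python sketch illustrates that idea only; it is not how SPSS implements Merge, and the records and the `village` field are hypothetical stand-ins for cam1.sav and cam2.sav.

```python
def add_new_variables(main_rows, extra_rows, key, keep):
    """Attach the `keep` fields from extra_rows to main_rows, matching on `key`."""
    lookup = {row[key]: row for row in extra_rows}
    merged = []
    for row in main_rows:
        combined = dict(row)
        match = lookup.get(row[key])
        for name in keep:
            # Cases with no match get None, similar to system-missing in SPSS
            combined[name] = match[name] if match is not None else None
        merged.append(combined)
    return merged


# Hypothetical records: the full file and the small file with z-scores
cam1 = [
    {"casenum": 1, "village": "A"},
    {"casenum": 2, "village": "B"},
    {"casenum": 3, "village": "A"},
]
cam2 = [
    {"casenum": 1, "haz": -2.31, "waz": -1.80, "whz": -0.52},
    {"casenum": 2, "haz": -0.75, "waz": -0.40, "whz": 0.10},
]

merged = add_new_variables(cam1, cam2, "casenum", ["haz", "waz", "whz"])
for row in merged:
    print(row)
```

This is why each child needs a unique identifier: without it there is no reliable way to attach the new z-scores to the correct record.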

 

 

EXERCISE 4: When calculating z-scores FROM Epi-Info .rec file directly

4.1 To open Epi-Info: From windows, click on the MS-DOS prompt. Then change to the Epi6 subdirectory. When you see c:\Epi6> then type Epi6 again and press return.

4.2 Open the kenya1.rec file in Analysis by typing read kenya1 and pressing return. Look for the variable names for child age (in months), sex, weight (in kgs) and height (in cms). Also look for the unique identifier like ID number or Case ID. Each child record should have a number which identifies that particular child so when you merge the files together again there is no problem matching the new anthropometric variables to the correct child.

4.3  Next you will write a program that will create a small .rec file that contains only the unique identifier and child sex, age, weight, and height variables. For some reason the Epi-Nut processor cannot easily handle large data sets. This way we only have the information necessary for calculating these anthropometric indices. These programs are written in the EPED word processor.

TAKE A LOOK at the program you will write in EPED and then try writing and saving it.

 

4.4 Creating the kensm1.rec by running the kenya1.pgm:   (STEPS TO FOLLOW)

After saving the program file called kenya1.pgm and exiting EPED using F10, click on PROGRAM and ANALYSIS.
At the prompt, type run kenya1.pgm (if the program is not in the epi6 directory, then specify its location) and press enter.
When it has finished running, press the F10 key again to get back to the C:\Epi6> prompt.
Type at the prompt: read kensm1.rec, which will open the new data set you created.
Press F4 to see the file structure, then press F10 to exit.

4.5 Creating anthropometric variables using the Epi-Nut program: (STEPS TO FOLLOW)

After you have created the kensm1.rec file, go to the Epi-Nut processor, which is under PROGRAMS in the Epi-Info system.
Go to INDICES and Add to file. There, type in the name of the .rec file you will use. In this instance it will be the kensm1.rec file created from the program.
Use the mouse to click on the arrows to select the variable names in your data set for Age, Sex, Weight, and Height.
Next use the mouse to select only whz, waz, and haz. All nine indices are selected (X) by default, but usually you only use the Z-scores. The space bar can also be used to remove the default selections, moving the cursor with the arrow keys.
Then click on Process. The computer will give you a status report when finished.
Go back to Analysis and 'read' the kensm1.rec file. View the updated file by pressing the F4 key.

4.6  Merging two files - anthropometric variables with the original file. (PROGRAM TO FOLLOW in TAKE A LOOK)

You will run a program to merge kensm1.rec with the kenya1.rec. In doing this remember that you are matching the data for each record with the unique identifier (caseid) and are actually creating a third .rec file with all variables.

Go to PROGRAMS and select MERGE.
In the file 1 box enter kensm1.rec, in the file 2 box enter kenya1.rec, and in the output box enter kenya2.rec.
For merge options, click on the dot marked JOIN and press OK.
To match fields, choose all fields listed and click on OK.

4.7  Viewing complete data set with anthropometric variables:  (STEPS TO FOLLOW)

Go to Analysis and type read kenya2
Press F3 to see a variable list and look for the waz, whz, and haz variables.

4.8  Exporting the new kenya2.rec for analysis in SPSS Windows: (STEPS TO FOLLOW)

Under Programs, go to Export.
Type kenya2 and select dBASE III as the file type to export.
It will automatically name it kenya2.dbf and will export it to the C:\epi6 sub-directory.
Open SPSS Windows and open the kenya2.dbf file - it will be found in the Epi6 sub-directory.
After the data is opened in the Data Editor (you must look at all files, not just .sav files, to find it), go to File and Save As and save it in the SPSS sub-directory as a .sav file.

TRANSFORMATION C:

Z-Scores (HAZ, WAZ, WHZ) -> Stunting, Wasting, & Underweight Categories

 

Z-scores (short for 'standard deviation scores') are computed by comparing a child's measurement to a reference population, placing it on a scale on which the reference population has a mean of 0 and a standard deviation of 1. The scores are most commonly produced for height-for-age (HAZ), weight-for-age (WAZ), and weight-for-height (WHZ); deficits in these values are known as 'stunting', 'underweight', and 'wasting' respectively. 'Malnutrition' is the overall general term, used somewhat loosely to describe any of these deficits -- as well as other forms, as in 'micronutrient malnutrition'. Widely agreed cut-points are used to categorize the degree of malnutrition:

< -3 SD severe
>= -3 and < -2 SD moderate
>= -2 and < -1 SD mild
>= -1 SD normal

Usually severe and moderate together are used to indicate significant levels of malnutrition, and those mild or better are categorized as not malnourished. 'Prevalence' usually refers to a cut-off of -2 SD. The standard designation and coding is:

>= -2 SD not malnourished coded as 0
< -2 SD malnourished coded as 1

With this coding of 0 and 1, for not malnourished and malnourished, the mean of this variable for a population group is actually the proportion malnourished, because the total of those with a value of 1 (malnourished) is divided by the total number of children measured (the 1 and 0 groups or the n of the population). This makes calculating the prevalence of malnourished children very simple -- as the proportion * 100 to give the prevalence as a percent. 

Note: quite often this multiplication  by 100 is not done -- as in many DHS datasets -- and the analysis is quite correct and equivalent using the proportion rather than the %.   You may prefer to go through, when the proportion is given, and multiply by 100, to give the more familiar prevalences directly.
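The point that the mean of a 0/1 variable is the proportion malnourished can be seen with a few numbers. This is a minimal Python sketch using ten hypothetical children, not data from any of the course files.

```python
# Hypothetical 0/1 codes for ten children: 1 = malnourished, 0 = not malnourished
stunted = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

# The mean of a 0/1 variable is the count of 1s divided by n
proportion = sum(stunted) / len(stunted)

# Multiply by 100 to express the same figure as a percent
prevalence_pct = proportion * 100

print(proportion, prevalence_pct)
```

Three of the ten children are coded 1, so the mean and the proportion are the same quantity; the percent is only a change of scale.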

 

The following directions show how to create categories for malnourished and not malnourished using z-score cut-offs. Different categorization schemes can be used, either creating multiple categories from severe to normal or just two categories for malnourished or not. To try categorization using a data set from a South Asian village (SAsiavill.sav), follow these steps:

EXERCISE 5:

5.1 Open SAsiavill.sav

5.2  In the data editor, select Transform, Recode, Into Different Variables

5.3  From the left hand variable list, select haz, waz and whz one by one and put them into the Input variable -> output variable box with the arrow button.

5.4  Highlight haz by clicking on it, then move the cursor to the Output Variable Name box and type Stunt2. Click the Change button; the entry should now read haz -> stunt2. Do the same for waz with the new name Underwt2, and for whz with the new name Waste2.

5.5  Now once each one has a new name, select the Old and New Values button.

5.6  Select Range in the old value side and type -5.0 in the first box and -3.1 in the second. On the new value side, type 3   in the value box and then Add.

5.7  Now enter the following ranges in the same fashion:

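first box   second box   new value
-5.0        -3.1         3
-3.0        -2.1         2
-2.0        -1.1         1
-1.0        4.1          0

Note: these ranges leave small gaps (for example, a z-score of -3.05 falls between -3.1 and -3.0), and any value in a gap is set to system-missing by step 5.8. If your z-scores are stored with two decimal places, use -3.01, -2.01 and -1.01 as the second-box values instead.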

5.8  Finally, click on the All other values box on the left side and the System Missing box on the right and then Add.

5.9  Now, select Continue and then OK.

5.10  Three new variables, Stunt2, Underwt2, and Waste2 will appear at the end of the data set with values 0 to 3.

5.11  Double click on each column to open the Define Variable box, select Labels and assign value labels as follows:

0 = normal
1 = mild
2 = moderate
3 = severe
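The recode in Exercise 5 can be expressed as a small function, which makes the cut-points explicit. This Python sketch is an illustration under one assumption that differs from the exercise: values outside -5 to +5 are treated as missing, whereas the exercise uses the specific range limits typed into the dialog.

```python
LABELS = {0: "normal", 1: "mild", 2: "moderate", 3: "severe"}


def z_category(z):
    """Return 3, 2, 1 or 0 for a z-score, or None when missing or implausible."""
    if z is None or z < -5 or z > 5:  # assumed plausibility limits
        return None
    if z < -3:
        return 3  # severe
    if z < -2:
        return 2  # moderate
    if z < -1:
        return 1  # mild
    return 0      # normal


# Hypothetical haz values, including one missing value
haz = [-3.05, -3.0, -2.4, -2.0, -1.0, 0.7, None]
stunt2 = [z_category(z) for z in haz]
print(stunt2)
```

Because the function compares against the cut-points directly, every value between -5 and +5 lands in exactly one category and none falls into a gap between ranges.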

 

Using the Eastern Kenya data set, first run descriptives for HAZ, WHZ, and WAZ to verify that they have been properly cleaned.

        1.  Open keast4j.sav

        2.  Select Statistics, Summarize, Descriptives.

        3.  Select haz (haz std. deviations), waz (waz std. deviations), and whz (whz std. deviations).

        4.  Under Options, you can select other statistics to be generated (e.g. Mean, Minimum, Maximum, Std. Deviation) and click on Continue.

        5.  Click OK.

 

[Figure: SPSS Descriptives output for haz, waz, and whz]

INTERPRETATION:

From the output, the data appear to have been cleaned and are therefore ready for categorization into new outcome variables. The minimum and maximum are within -5 and +5 standard deviations for WAZ, WHZ, and HAZ, and the mean and median values are not so different as to suggest that the data are strongly skewed toward one extreme or the other.


If you need to produce CATEGORICAL anthropometric data, the following exercise applies. These variables already exist in the Eastern Kenya data set, but to produce categorical anthropometric variables for other data sets, follow these steps:

        1.  Open the data set of interest

        2.  Under the Transform menu, select Recode - Into Different Variables.

        3.  Select haz - output is called stunt (click change)

        4.  Select waz - output is called under (click change)

        5.  Select whz - output is called waste (click change)

        6.  Click on Old and New Values

        7.  Under Old values click on Range, Lowest through [blank] and type -2.01. Its New Value is '1'. Click on Add.

        8.  Under Old values, click on Range, [blank] through highest, and type -2.00. Its New Value is '0'. Click on Add.

        9.  Click on Continue.

        10.  Click on OK.
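The two ranges in steps 7 and 8 (Lowest through -2.01, and -2.00 through Highest) cover every value only when z-scores are stored with two decimal places. The Python sketch below contrasts a range-based recode with a single comparison against the cut-off; both function names are hypothetical and the test values are made up.

```python
def recode_by_ranges(z):
    """Mirror the two ranges typed into the Recode dialog."""
    if z <= -2.01:
        return 1
    if z >= -2.00:
        return 0
    return None  # value falls in the gap between the ranges


def recode_by_cutoff(z):
    """Single comparison against the -2 SD cut-off."""
    return 1 if z < -2 else 0


# With two-decimal z-scores the two approaches agree
two_decimal = [-3.10, -2.01, -2.00, -1.99, 0.50]
agree = all(recode_by_ranges(z) == recode_by_cutoff(z) for z in two_decimal)

# With more decimals, a value can slip between the ranges
gap_value = -2.005
print(agree, recode_by_ranges(gap_value), recode_by_cutoff(gap_value))
```

If your z-scores carry more than two decimals, it is worth running a frequency on the new variable and checking that no cases have unexpectedly become missing.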

 

Now you have new variables called stunt, under, and waste. You will have to make a few modifications to them before running the frequencies. They will be the last three columns in the data set.

 

Make these modifications to the new variable stunt:

        1.  Go to stunt and double click on the variable name (in the gray area - stunt).

        2.  Click on Type and change Width to 4 and 0 decimal places.

        3.  Click on Labels. Type Stunted in the variable label.

        4.  For value, type '1' and value label is '< -2.00 haz' - click on Add.

        5.  Next, type '0' in value and call it '>= -2.00 haz' and click Add.

        6.  Click on Column Format and change the width to '5'.

      7.  Click OK.

Go through the same steps to create the same categories for under (waz) and waste (whz).

 

Now produce frequencies for each of the CATEGORICAL anthropometric data:

1.  Open keast4j.sav

2.  Click on Statistics, Summarize, Frequencies.

3.  In the Frequencies box, select the three variables for stunting- haprev, wasting- whprev, and underweight- waprev.   Place each variable into the variable box one by one using the arrow key.

4.  Be sure the check mark  remains in the Display frequency tables box.

5.  Press OK.

 

For Stunting:

[Figure: SPSS frequency table for haprev]

For Wasting:

[Figure: SPSS frequency table for whprev]

For Underweight:

[Figure: SPSS frequency table for waprev]

INTERPRETATION:

In the Eastern Region of Kenya, about 34% of the children under 5 are stunted (<-2 SD height-for-age), approximately 7% of the children are wasted (<-2 SD weight-for-height), and almost 30% of the children are underweight (<-2 SD weight-for-age). All of these are above the frequency of low z-scores expected in a normally distributed reference population, where about 2.3% of children fall below -2 SD.


TRANSFORMATION D:

Household Size -> Dependency Ratio

This is just one example of a ratio that can be calculated. Ratios provide a composite number that represents some aspect of a family or region. For example, the dependency ratio gives the ratio of dependents to non-dependents in the household. Ratios are simply one number divided by another, as in the number of children per mother (children / mother) or the number of clinics relative to the number of individuals in a community (number of clinics / population of the area). These can give a sense of the situation in the area of observation and can be used as a simplified description, as well as for comparison with other areas or groups. The exercise below uses a simplified version of the dependency ratio: the number of children divided by the household size. To practice calculating it, follow these steps using the Eastern Kenya data set (keast4j.sav):

EXERCISE 6:

6.1  Open keast4j.sav in SPSS by clicking on File, Open, keast4j.sav

6.2  Click on Transform, Compute and a box will appear labeled Compute variable.

6.3  In the target variable box, type depratio (for dependency ratio- this will be the name of the new variable).

6.4  In the left hand box listing all of the variables, scroll down and highlight numkids (number of children in the household) and click the arrow button to place numkids into the Numeric Expression box.

6.5  In the left hand box is also a variable hhsize (household size), highlight this and place it into the Numeric Expression box using the arrow key.

6.6  Alter the formula to look like the following: numkids / hhsize

6.7  Now click on OK

6.8  A new variable labeled depratio will be created at the end of the data set that will take the value 0.0 to 1.0 (0.0 being no children and 1.0 being all children in the household). This gives an example of the burden of children to adults in a household.

6.9  To calculate the range and mean score of dependency ratios for Eastern Kenya, click on Statistics, Summarize, Descriptives, and select depratio and press OK.

 

Does the output look like this?

DESCRIPTIVE STATISTICS

                     N     Range   Minimum   Maximum   Mean    Std. Deviation
DEPRATIO             876   .75     .00       .75       .3144   .1436
Valid N (listwise)   876
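The Compute step and the Descriptives summary in Exercise 6 can be mirrored in a few lines. This Python sketch uses four hypothetical households rather than keast4j.sav, so its numbers will not match the table above; the sample standard deviation is used as the counterpart of the Std. Deviation column.

```python
import statistics

# Hypothetical households (not from keast4j.sav)
households = [
    {"numkids": 2, "hhsize": 4},
    {"numkids": 0, "hhsize": 2},
    {"numkids": 3, "hhsize": 4},
    {"numkids": 1, "hhsize": 4},
]

# Counterpart of Transform, Compute: depratio = numkids / hhsize
depratio = [h["numkids"] / h["hhsize"] for h in households]

# Counterpart of Statistics, Summarize, Descriptives
summary = {
    "n": len(depratio),
    "range": max(depratio) - min(depratio),
    "minimum": min(depratio),
    "maximum": max(depratio),
    "mean": statistics.mean(depratio),
    "std_dev": statistics.stdev(depratio),  # sample standard deviation (n - 1)
}
print(summary)
```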
 

TRANSFORMATION E:

HH Size and Number of Women -> % Women

 

Percentages can be useful at both the individual and the aggregated level of analysis. For the individual, one might want to calculate the percent of income from jobs, the percent of time spent cooking, or the percent of the household that is women. When a variable takes a value of yes or no, such as exclusively breastfed up to 4 months, has access to a health clinic, or uses iodized salt, then the aggregated data (at the district, province or country level) will conveniently show the prevalence of that practice. For example, those who use iodized salt are coded 1 and those who do not are coded 0. The aggregated figure is the count of those coded 1 (the 0s do not add to the count) divided by the total number with a response of 0 or 1 (the total population measured), which gives the percent of the total measured, or the prevalence. The lesson on creating prevalence through aggregation is shown in the Aggregation Submodule.


Here is a lesson on creating percentages with individual level data. The percent can be run to create a new variable for percent of the household that is female. This is simply done by using the Transform and Compute option in SPSS.

EXERCISE 7:

7.1  Open keast4j.sav

7.2  Select Transform and Compute

7.3  Enter the variables numwomen and hhsize one by one from the left hand selection box into the Numeric Expression box with the right arrow.

7.4  Alter the formula to make it model the one here:

(Numwomen / hhsize)*100

7.5  Enter the new variable name in the Target Variable box, e.g. Womenpct

7.6  Press OK and find the new variable percent of household that are women at the end of the database
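The percentage formula in step 7.4 is a ratio scaled by 100. As a small Python sketch (the function name and the household values are hypothetical), with a guard for households where the size is missing or zero:

```python
def percent_women(numwomen, hhsize):
    """(numwomen / hhsize) * 100, or None when the result is undefined."""
    if numwomen is None or not hhsize:
        return None  # missing or zero household size gives no valid percent
    return (numwomen / hhsize) * 100


# Hypothetical households: (number of women, household size)
examples = [(2, 5), (1, 4), (0, 3), (1, None)]
womenpct = [percent_women(w, size) for w, size in examples]
print(womenpct)
```

A quick check of the new variable is that every valid value lies between 0 and 100; anything outside that range points to an error in numwomen or hhsize.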

 

You can also produce percentages as output from individual data. If you wanted to show the percent who read the newspaper, watch TV, and listen to the radio at least once a week, here is an example of how to do so using the keast4j.sav data:

EXERCISE 8:

8.1  Select Statistics and Summarize, Frequencies.

8.2  In the left hand variable list, scroll down to select newpap, radiowk, and tvweek and place them in the right hand Variable box one by one, using the right arrow.

8.3  Press OK and an Output screen will appear showing the percent that read the newspaper once a week, the percent that watch TV once a week, and the percent that listen to the radio once a week. Does your output look like this?

Reads Newspaper Once a Week

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No       692         79.0      79.0            79.0
        Yes      184         21.0      21.0            100.0
        Total    876         100.0     100.0
 

Listens to Radio Every Week

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No       343         39.2      39.2            39.2
        Yes      533         60.8      60.8            100.0
        Total    876         100.0     100.0
 

Watches TV Every Week

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No       851         97.1      97.1            97.1
        Yes      25          2.9       2.9             100.0
        Total    876         100.0     100.0
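The Percent and Cumulative Percent columns in these tables follow directly from the counts. This Python sketch rebuilds the newspaper figures from the reported frequencies (692 No, 184 Yes); the list of 0/1 values is reconstructed from those counts for illustration and is not the original data.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows


# 0 = No, 1 = Yes, rebuilt from the frequencies in the newspaper table
newspaper = [0] * 692 + [1] * 184

for row in frequency_table(newspaper):
    print(row)
```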
 

TRANSFORMATION F:

Water source listed by type (e.g. tap, well, etc.) -> Dummy variables for each source of water (e.g. for regression analysis)

Dummy variables are used to introduce a characteristic into an equation coded as 0 and 1.  An example of creating a dummy variable would be if the sources of water used in the household were made into the three separate variables (each coded as 1 and 0) that represent the three possible responses given on a questionnaire: piped, well, or river water.

Note:  A very common convention is to start the names of dummy variables with d... -- hence for water source you'll see dpiped, dwell, and driver in many of the datasets used here.

A dummy variable itself changes the intercept in a regression model; interacting it with another variable changes the slope (or coefficient) of that variable.

0 - 1 (dichotomous) variables are familiar: gender, Male and Female, coded as 1 and 0 or as 1 and 2, is effectively a dummy (although for some reason not usually named to start with a d... -- sex or gender are more usual).
Another common example is urban location: durban = 1 would mean the household is urban, and the default of durban = 0 would mean the household is not urban, i.e. it is rural.
With multiple categories that cannot be properly scaled (which is usually the case), create a dummy for each category (e.g. if water supply is coded piped, well, river, create the dummies dpiped, dwell, and driver).
Always exclude one category when using dummies in a regression. Choose a largish category, preferably at one end of the range of effects, for ease of intuitive interpretation.
[Most programs will warn you that the regression has met a singularity (black hole) if you forget to exclude one category!]
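The reason one category must be excluded can be seen by adding the dummies together. In this Python sketch (five hypothetical households; the category names follow the piped, well, river example above), the full set of dummies sums to 1 for every case, which duplicates the constant term of a regression and produces the singularity.

```python
# Hypothetical water sources for five households
water = ["piped", "well", "river", "river", "well"]
categories = ["piped", "well", "river"]

# One 0/1 dummy per category
dummies = {c: [1 if w == c else 0 for w in water] for c in categories}

# Across the full set, every household has exactly one 1
row_sums = [sum(dummies[c][i] for c in categories) for i in range(len(water))]

# Leave out a reference category (here river) before running a regression
reference = "river"
regressors = {"d" + c: dummies[c] for c in categories if c != reference}

print(row_sums)
print(sorted(regressors))
```

The coefficients on the remaining dummies are then read as differences from the excluded category.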

 

The categories of a dummy variable are mutually exclusive: one person can only fall into one category (0 or 1) -- they either have piped water in the house or they don't; they are either male or female, and so on.  On the other hand, it is possible to fall into the dummy = 1 category of two related dummy variables -- one could have a tap and a well in the household, so both dpiped and dwell could = 1.  The important thing is to be aware of which related categories are mutually exclusive.  When they are, the frequencies of the 1s across the categories add up to the total sample.  In this case the respondent should only answer 1 to one of these:  dpiped (dummy for piped or well pump water), dwell (dummy for public tap or river water) or driver (dummy for river water).  Here is the frequency of household water categories (hhwatcat), which you will dummy code:

[Figure: SPSS frequency table for hhwatcat]

 

Now try to create a dummy code from the variable household water categories in the Kenya East dataset (keast4j.sav) using the exercise below:

EXERCISE 9:

9.1 Open keast4j.sav

9.2  Scroll across the variables to find hhwatcat, then highlight the column to the right of that variable by clicking on the top grey cell that contains the variable name.  When the column is highlighted, the entire column will turn black.

9.3  Once the column is highlighted, click on Data, Insert Variable and a new variable named V001 will appear in the column next to hhwatcat.

9.4  Click on the new variable column and then select Data, Define Variable to open the data defining box. 

9.5  Insert the variable name dpipe (for piped water source) and click on Labels and enter piped or well pump dummy, then click on Continue and OK.

9.6  Click on Transform, Compute in the Data Editor menu.  Then enter dpipe in the Target variable box.

9.7  Click on the button marked If... in the lower middle section, and a box will open called Compute variable:If cases...  

9.8  In this section, click on the dot marked Include if case satisfies condition (the middle box will become active[white] instead of inactive [grey] as it was before)  In the white box, enter the variable hhwatcat from the right hand variable list and then insert an = sign and then the number 1, so that the final statement in the box is hhwatcat=1, and then click Continue

9.9  Back in the Compute variable screen, put the number 1 in the Numeric Expression box and click on OK.   It will ask you if you want to change the existing variable, you say YES

9.10  Now that you have coded a 1 for all that do use piped water, you must code all those that do not use piped water as 0 for NO.  To do this you will go through a similar routine.  Click on Transform, Compute and enter dpipe in the Target Variable box (AGAIN, because we will add the second category to dpipe).

9.11  Click on the If button and then click the dot Include if case satisfies condition..and enter hhwatcat ~= 1 (hh water category is not equal to 1) in the middle box and click on Continue.

9.12  Enter 0 in the Numeric Expression box and click on OK.  It will ask you if you want to change the existing variable, you say YES.   This will completely code all of the cases for dpipe as 1= yes piped water or 0 = not piped water.   You can change the labels in the Data, Define Variable routine under Labels (where you enter 1 and yes and 0 and no in the appropriate spots).

9.13  To complete the dummy coding for a new variable dwell, run steps 9.2 through 9.12 again, substituting dwell for the new variable name throughout, using 2 in step 9.8 as the positive response for well water and ~=2 in step 9.11 as the response for not using well water.

9.14  To complete the dummy coding for a new variable driver, run steps 9.2 through 9.12 again, substituting driver for the new variable name throughout, using 3 in step 9.8 as the positive response for river water and ~=3 in step 9.11 as the response for not using river water.

9.15  Run a frequency for each to see if the total of the 1's for all 3 categories add up to the Total of the cases (all cases should be coded as 1 in one of the three dummies, therefore the total of the 1's for all three should be equal to the total number of cases). Click on Statistics, Summarize, Frequencies and enter dpipe, dwell, and driver in the Variable box and click on OK.

 

[Figures: SPSS frequency tables for dpipe, dwell, and driver]

So, the total of the 1 groups across the dummy variables for hhwatcat equals 875 (104 for dpipe, 243 for dwell, and 528 for driver).  This is what you would like to accomplish when you create a set of dummy variables.  Also remember, if you were to use these in a regression analysis, enter only two of the three categories in the equation, leaving out one of the larger categories at the end of the range of effects, such as driver (river water is considered to have more negative health consequences than piped water).
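The check in step 9.15 can be sketched as follows. The category counts (104, 243, 528) come from the frequencies above, and the codes 1, 2 and 3 follow steps 9.8, 9.13 and 9.14; the single missing value is a hypothetical addition to show that a case with no hhwatcat code should stay missing in every dummy rather than being coded 0.

```python
def dummy(code, target):
    """1 if the case is in the target category, 0 if not, None if code is missing."""
    if code is None:
        return None
    return 1 if code == target else 0


# hhwatcat codes rebuilt from the reported counts, plus one hypothetical missing case
hhwatcat = [1] * 104 + [2] * 243 + [3] * 528 + [None]

dpipe = [dummy(c, 1) for c in hhwatcat]
dwell = [dummy(c, 2) for c in hhwatcat]
driver = [dummy(c, 3) for c in hhwatcat]


def count_ones(values):
    return sum(1 for v in values if v == 1)


total_ones = count_ones(dpipe) + count_ones(dwell) + count_ones(driver)
valid_cases = sum(1 for c in hhwatcat if c is not None)
print(total_ones, valid_cases)
```

When the categories are mutually exclusive, the total of the 1s equals the number of cases with a valid hhwatcat code.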


TRANSFORMATION G

Breastfeeding and Introduction of Complementary Foods -> Feeding Categories

This is covered in the Child Feeding Submodule (the Chapter 6 page on Child Feeding).
