Re-categorize and Create Dummy Variables



When a variable has unordered categories, as in the water source example, you sometimes need to combine the existing responses into fewer groups to simplify the analysis.  If there is no clear reason to group particular water sources together but you need to separate good water from bad, it can help to compute the mean of the outcome of interest for each water source and see which categories the worse-off children fall into.  Let's work through an example.  Compute the mean WAZ score for each category of the water source variable:

1.  Open keast4j.sav

2.  Click on Statistics, Compare Means, Means.

3.  Enter the variable waz in the Dependent Variable list and hhwater into the Independent Variable list and click on OK.

[Figure c5p1ex2_1.jpg: Means output of waz by hhwater]
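For readers who want to see what the Means procedure computes, here is a minimal sketch in Python. The records, the string labels, and the function name mean_by_group are hypothetical; the scores are made up for illustration and are not taken from keast4j.sav.

```python
from collections import defaultdict

# Hypothetical records: (hhwater label, waz score). These values are made up
# for illustration and are NOT taken from keast4j.sav.
records = [
    ("pipe", -0.5),
    ("pipe", -1.5),
    ("river", -2.0),
    ("river", -1.0),
    ("rain", None),   # a missing WAZ score, skipped below
    ("rain", -1.5),
]


def mean_by_group(rows):
    """Return {group: mean of the non-missing scores in that group}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, score in rows:
        if score is None:
            continue  # leave missing scores out of the mean
        totals[group] += score
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


means = mean_by_group(records)

# List the categories from worst to best mean score.
for group, value in sorted(means.items(), key=lambda item: item[1]):
    print(f"{group:10s} {value:6.3f}")
```

Sorting the categories by mean score makes it easier to see which water sources cluster together.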

When recategorizing, it is sensible to group similar categories, such as lake, river, and rainwater (all untreated, naturally available water sources).  Grouping by similarity alone is difficult, however, because you have to assume which sources are inherently better, and that is not easy to tell.  Looking at the mean outcome score for each category makes the choice a little easier, because you can group categories by the nutritional status of the children who use each water source.  Here, you might put pipe water and well with pump together (-0.956 and -0.990 SD, respectively) as a category called dpiped (dummy for piped water); public tap water and well without pump together (-1.378 and -1.203 SD, respectively) as a category called dwell (dummy for well/tap); and lake, river, and rain together as a category called driver (dummy for river/lake/rain).

Dummy variables are used in regression analysis to represent the distinct categories of a variable.  Within a regression they accomplish an analysis similar to what you would get from ANOVA or Analysis of Covariance, since each category of the variable is represented by its own dummy (0,1) variable.  To avoid collinearity in the model, follow this rule: define n dummy variables (one per category) but enter only n-1 of them in the regression analysis.  The omitted category becomes the reference group against which the others are compared.  For the water source example, here are the dummy variables you will create:

dpiped { 1 = pipe water or well with pump;  0 = any other source }

dwell { 1 = public tap or well without pump;  0 = any other source }

driver { 1 = river, lake, or rain water;  0 = any other source }
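The n-1 rule can be illustrated with a short Python sketch. The group membership follows the definitions above. The string labels (standing in for the numeric hhwater codes), the function name make_dummies, and the choice of dpiped as the omitted dummy are assumptions made only for illustration.

```python
# Group membership follows the definitions in the text. The string labels are
# hypothetical stand-ins for the numeric hhwater codes.
GROUPS = {
    "dpiped": {"pipe", "well_pump"},
    "dwell": {"public_tap", "well_no_pump"},
    "driver": {"lake", "river", "rain"},
}


def make_dummies(source):
    """Return a dict of 0/1 dummy values for one water-source label."""
    return {name: int(source in members) for name, members in GROUPS.items()}


sources = ["pipe", "public_tap", "river", "well_pump", "rain"]
rows = [make_dummies(source) for source in sources]

# Each row has exactly one dummy equal to 1, so the three dummies always sum
# to 1. That sum duplicates the regression constant, which is the collinearity
# problem the n-1 rule avoids.
assert all(sum(row.values()) == 1 for row in rows)

# Leave one dummy out as the reference category (dpiped is an arbitrary
# choice here, not one prescribed by the text).
REFERENCE = "dpiped"
predictors = [
    {name: value for name, value in row.items() if name != REFERENCE}
    for row in rows
]

print(predictors[0])  # {'dwell': 0, 'driver': 0}
```

A child whose household uses pipe water has 0 on both remaining dummies, so the omitted group is still identified in the regression.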

Try this recode exercise to create the dummy for driver:

1.  Open keast4j.sav

2.  Click on Transform, Recode, Into Different Variables...

3.  Choose the variable hhwater from the variable list and move it into the Input --> Output box.

4.  Type the name driver in the Output Variable Name box on the right and click Change.

5.  Click on the box labeled Old and New Values and then click on the dot for Value in the Old Value list and type the number 31 (for lake/pond water).

6.  Click on the dot for Value in the New Value list and type in the number 1, then click Add.

7.  Click on the Old Value, Value dot again and enter the number 32 (for river water).

8.  Click on the New Value, Value dot and assign the number 1 again, then click Add.

9.  Click on the Old Value, Value dot once again and enter the number 41 (for rain water).

10.  Click on the New Value, Value dot again and enter the number 1, then click Add.

11.  Now to code all of the rest as a 'no' response, click on Old Value, All other Values dot and then click on the New Value, Value dot and enter the number 0, then click Add.

12.  Now that all are coded, click on Continue, and then OK.

13.  Now that you have created the new variable driver, it is a good idea to label it and its values (1 = river, lake, or rain water; 0 = other water source).

Here is a frequency table for your new variable:

[Figure wpe1.jpg: frequency table for driver]
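The recode logic in the steps above can be sketched in Python as follows. Codes 31, 32, and 41 come from the steps. The other codes in the sample list, the name RIVER_CODES, and the function name recode_driver are hypothetical.

```python
from collections import Counter

# Codes 31, 32 and 41 come from the recode steps. Every other code in the
# sample list below is a hypothetical placeholder.
RIVER_CODES = {31, 32, 41}  # lake/pond, river, rain


def recode_driver(hhwater):
    """Return 1 for lake/pond, river or rain water, otherwise 0."""
    return 1 if hhwater in RIVER_CODES else 0


hhwater_values = [31, 32, 41, 11, 12, 31, 99]
driver = [recode_driver(value) for value in hhwater_values]

print(driver)           # [1, 1, 1, 0, 0, 1, 0]
print(Counter(driver))  # frequency check: four 1s and three 0s
```

Like the All other Values rule, this sketch sends every remaining code to 0, including any code that stands for a missing response (99 here is hypothetical), so it is worth checking the frequencies afterwards.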

You would use a similar routine to create the other dummy variables, dpiped and dwell, but these have already been created for you.
