Set Dichotomous Variables


When comparing independent variables, such as water supply or number of months breastfeeding, with an outcome variable such as nutritional status, it is sometimes useful to create categories that represent groups or levels of the potential causal factors. The most challenging part of creating categories, at least in some instances, is deciding where to divide the variable.

The first step in creating categories from a continuous variable is to eyeball its distribution and visually divide it into roughly equal groups. Usually the goal is equal-sized categories, for example good versus bad, where half of the respondents fall in the ‘good’ category and half in the ‘bad’ category. Breaking a variable into two categories creates a dichotomous variable (one that takes two forms, e.g., high/low or good/bad). It is possible to break a variable into any number of categories, but there is usually little benefit in using more than three. One might create three categories for levels of a variable, such as high, middle and low income groups.

In this Bangladesh dataset, the prevalence of each type of household water source is given. Every water source is relatively safe for household use except those in the ‘other’ category. Therefore, the prevalence of ‘other water sources’ will be used to categorize good versus bad water use in the household. A histogram will be used first to identify where the (arbitrary) cut point for the halfway mark should fall. The steps below create the categories using household water sources. Try this exercise:

1.  Open bdeshc.sav

2.  Click on the menu Graphs, Histogram

3.  Scroll down the left hand variable list in the histogram box to select the variable wathoth (uses other types of water for the household) then enter it into the Variable box using the arrow button

4.  Check the box 'Display Normal Curve.'

5.  Click on OK

Does the histogram look like this?

[Figure wpe11.jpg: histogram of wathoth with a superimposed normal curve]


The histogram shows a superimposed normal curve, although the actual distribution is far from normal. This is partly due to the small sample size (n=64). The middle, or halfway, point is 57; however, because setting cut points is quite arbitrary, 50% is chosen for simplicity as the division between good and bad prevalence of unsafe water use. Now that the categories have been chosen, the old variable wathoth must be recoded into a new variable called watcat (short for water category; 1 = 50% or more and 0 = less than 50% use other water sources).
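Outside SPSS, the same inspection can be sketched in a few lines of Python. This is only an illustration: the values in the list are made up and are not taken from bdeshc.sav, and the 20-point bin width is an arbitrary choice. It computes the median (the halfway point) and prints a rough text histogram.

```python
from statistics import median

# Hypothetical percentages of households using 'other' water sources.
# These values are invented for illustration; they are NOT from bdeshc.sav.
wathoth = [5, 12, 30, 48, 55, 57, 60, 72, 88, 95]

# The median splits the observations into two equal-sized halves.
mid = median(wathoth)

# Count observations in 20-point bins as a rough text histogram.
bins = {}
for value in wathoth:
    lower = min(int(value // 20) * 20, 80)  # a value of 100 stays in the 80-100 bin
    bins[lower] = bins.get(lower, 0) + 1

for lower in sorted(bins):
    print(f"{lower:3d}-{lower + 20:3d} | {'*' * bins[lower]}")
print("median:", mid)
```

If the median lies close to a round figure such as 50, the round figure is usually the easier cut point to explain.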


To Recode into a New Variable called watcat:

1.  Click on Transform, Recode, Into Different Variables (to make wathoth --> watcat)

2.  Select the variable wathoth from the variable list and place it in the Input variable -->Output Variable box using the arrow button

3.  Type watcat in the box labeled Output variable Name: and click on the button marked Change (this should make the Input to output box read wathoth -->watcat)

4.  Click on Old values and New values button

5.  In the Old value area click on the dot marked Range and type 0 in the first box and 49.9 in the second box

6.  In the New value area click on the dot marked Value and type in 0 and click on Add

7.  Now click on Range again and enter 50 in the first box and 100 in the second box

8.  Click on the Value dot and type in 1 and click on Add

9.  Click on Continue and OK
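The recode rules entered in steps 5 to 8 can be written out as a small Python function to make the logic explicit. This is a sketch, not SPSS output: the function name recode_watcat and the sample values are hypothetical. It assumes that a value covered by neither range (for example 49.95, which falls between 49.9 and 50) is left missing in the new variable, so it is worth checking whether the data contain such values.

```python
def recode_watcat(pct):
    """Mirror the recode rules: 0 to 49.9 -> 0, 50 to 100 -> 1, otherwise missing."""
    if pct is None:
        return None
    if 0 <= pct <= 49.9:
        return 0  # 'good': less than half use other water sources
    if 50 <= pct <= 100:
        return 1  # 'bad': half or more use other water sources
    return None  # not covered by either range, so treated as missing


# Hypothetical wathoth values, including one that falls in the gap.
wathoth = [0, 12.5, 49.9, 49.95, 50, 57, 100]
watcat = [recode_watcat(v) for v in wathoth]
print(watcat)
```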


The new variable watcat has now been created and appears at the end of the dataset. It should take only the values 1 and 0, where 1 is a bad and 0 is a good water source.

To verify that the result is correct, run a frequency of the new variable using Statistics, Summarize, Frequencies and enter watcat as the variable of interest. Does the output look like this?


[Figure wpe13.jpg: frequency table for watcat]
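The same check can be sketched in Python: count how often each value occurs and confirm that nothing other than 0, 1 or missing appears. The list of recoded values here is hypothetical, not the actual dataset.

```python
from collections import Counter

# Hypothetical recoded values; None stands for a missing value.
watcat = [0, 1, 1, 0, 1, None, 1, 0]

counts = Counter(watcat)
valid = sum(n for value, n in counts.items() if value is not None)

for value in (0, 1):
    n = counts[value]
    print(f"watcat={value}: n={n}, valid percent={100 * n / valid:.1f}")
print("missing:", counts[None])

# Any value other than 0, 1 or missing points to a recoding mistake.
unexpected = set(counts) - {0, 1, None}
print("unexpected values:", unexpected)
```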



For one-way analysis, compare the prevalence of malnutrition in the 'good' water category with that in the 'bad' water category. If there is an obvious difference, the chosen cut point may have succeeded in dividing the groups in a way that distinguishes them. Different cut points, and more categories, can also be tried to test how well the results hold up under other groupings.
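As a rough sketch of this comparison, the Python below computes the mean malnutrition prevalence on each side of several candidate cut points. The records, the function name mean_by_group and the candidate cut points 40, 50 and 60 are all assumptions made for illustration; none of them come from the Bangladesh dataset.

```python
from statistics import mean

# Hypothetical records: (percent using 'other' water, percent malnourished).
records = [(10, 40), (20, 45), (35, 50), (45, 45),
           (55, 60), (60, 55), (80, 65), (90, 60)]


def mean_by_group(records, cut):
    """Return mean malnutrition prevalence below the cut and at or above it."""
    good = [outcome for water, outcome in records if water < cut]
    bad = [outcome for water, outcome in records if water >= cut]
    # An empty group has no mean, so report None instead of failing.
    return (mean(good) if good else None, mean(bad) if bad else None)


# Try several cut points to see how sensitive the difference is to the choice.
for cut in (40, 50, 60):
    good_mean, bad_mean = mean_by_group(records, cut)
    print(f"cut={cut}: good={good_mean}, bad={bad_mean}")
```

If the gap between the two groups stays about the same across nearby cut points, the finding does not hinge on the arbitrary choice of 50%.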

At times, there will be obvious cut points to use, based on goals set by an organization, a country or an international agreement such as the Convention on the Rights of the Child. If such a goal has been set, it is the best place to start for a cut point.
