Regression
Topics on this page:
Uses of Regression
Basic Equation
Interaction
Graphing
Creating dichotomous variables
Building a Regression Model to control for Confounders
Regression is used:
with a continuous outcome (or dependent) variable
with continuous and categorical (usually 0 and 1) determining (or independent) variables, in order to:
look for confounding
look for interactions
confirm findings from tabulation, and vice versa
This section introduces simple regression using two independent variables. It will provide the fundamentals of regression analysis. Regression is especially powerful for exploring more than two independent variables, and it will be most useful for multi-way analysis in the following module.
The Basics of a Regression Equation
The following graph shows a simple (one independent variable) regression equation and where each of the variables is represented.

Developing a Regression Equation
To start, it is a good idea to use scatterplots to look at the association between each independent variable and the outcome named in your research question. Here is the question we will use for analysis in this section:
What is the relationship, at the household level, of irrigation and cultivation to food sufficiency in Somalia?
Here is an example of a scatterplot:

There is a positive association between the level of food sufficiency and the area of land under cultivation for the household. This is shown by the upward slope of the fitted line (we will check in a moment whether it is statistically significant). A positive slope is reasonable, since generally the more land under cultivation, the greater the amount of food produced.
Here, the fitted line shows the association between the outcome, food sufficiency, and the independent variable, area under cultivation. It is calculated by minimizing the sum of the squared vertical distances between each observation and the line (hence, the least squares line). In other words, of all possible straight lines, this is the one for which the squared deviations, added up over all the points marked on the graph, are the smallest possible for that set of points. The deviations from the line are also used to calculate other features of the relationship between the outcome and the independent variable. One of these is the coefficient of determination (R-squared), which gives the proportion of the variability in the outcome that is explained by the independent variable (here, the share of the variation in food sufficiency explained by the area cultivated). An R-squared of 0.0031 tells us that very little of the variability in food sufficiency (about 0.3 percent) is explained by cultivation.
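The least squares line and R-squared described above can be sketched in a few lines of Python. This is only an illustration: the function names `fit_line` and `r_squared` are made up for this example, and the five data points are invented, not taken from the Somalia survey.

```python
def fit_line(x, y):
    """Least squares fit with one independent variable.

    Returns (intercept, slope) of the line that minimizes the
    sum of squared vertical distances to the points.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Proportion of the variability in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    # Squared deviations of the observations from the fitted line
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Squared deviations of the observations from their own mean
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Invented illustrative data (NOT the survey data):
log_area = [0.0, 0.5, 1.0, 1.5, 2.0]   # log of area cultivated
months = [3.0, 4.0, 3.0, 5.0, 4.0]     # months of food sufficiency

a, b = fit_line(log_area, months)
r2 = r_squared(log_area, months, a, b)
print(f"intercept={a:.3f} slope={b:.3f} R-squared={r2:.4f}")
```

An R-squared close to 0, as in the cultivation example, means the residual sum of squares is almost as large as the total sum of squares, so the line explains very little.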
What is the associated regression equation using cultivation?
(remember: the dependent variable on the left-hand side and the independent on the
right-hand side):
Level of Food Sufficiency = A + B (log area cultivated)
or with real numbers:
Level of Food Sufficiency = 3.114 + 0.424 (log area cultivated)
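To see how the fitted equation is used, the sketch below plugs values into it. The page does not say which logarithm base was used, so the natural log here is an assumption, and the function name `predicted_food_sufficiency` is invented for this example.

```python
import math

INTERCEPT = 3.114  # A in the equation above
SLOPE = 0.424      # B, the coefficient on log area cultivated


def predicted_food_sufficiency(area_cultivated):
    """Predicted level of food sufficiency for a given area cultivated.

    ASSUMPTION: natural log; the source does not state the base.
    """
    return INTERCEPT + SLOPE * math.log(area_cultivated)


# log(1) = 0 in any base, so the prediction at area 1 is just the intercept.
at_one = predicted_food_sufficiency(1.0)
# Raising log area by one unit adds exactly the slope to the prediction.
at_e = predicted_food_sufficiency(math.e)
print(at_one, at_e)
```

Whatever the base, the intercept is the predicted value when log area is zero, and the coefficient is the change in the prediction for each one-unit rise in log area.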
Significance for cultivation (p-value) = 0.300, thus non-significant.
The same analysis can be done using irrigation:

The association seen here is also positive, but much stronger than the association found using area cultivated. The r-square is much larger (0.0855), but to test the significance of the association we must again use regression.
Not only is the R-squared larger and the equation statistically significant (p-value < .05), but the coefficient for area irrigated (1.029) is also more than twice the coefficient for cultivation (0.424). This means that for every one-unit rise in area irrigated (hectares), food sufficiency is estimated to increase by 1.029 months.
When cultivation is added to the model alongside irrigation, the results are interesting: irrigation is no longer a statistically significant predictor of food sufficiency once cultivation is controlled for. The coefficient for irrigation is also greatly reduced, from 1.029 to 0.678. Before we can draw any conclusions from these findings, we first need to test for interaction between cultivation and irrigation.
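The way a coefficient shrinks when a second, correlated variable is controlled for can be illustrated with a small sketch. The data below are invented so that the answer is known in advance (they are not the survey data), and the helper names are hypothetical. The outcome is built as 2 + 0.5 × irrigated + 1.0 × cultivated, and the two independent variables are deliberately correlated.

```python
def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations from the means of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(x, y):
    """Least squares slope with x as the only independent variable."""
    return cross(x, y) / cross(x, x)


def two_predictor_fit(x1, x2, y):
    """Least squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


# Invented illustrative data (NOT the survey data):
irrigated = [0.0, 1.0, 1.0, 2.0, 3.0]
cultivated = [0.0, 1.0, 2.0, 3.0, 4.0]
months = [2 + 0.5 * i + 1.0 * c for i, c in zip(irrigated, cultivated)]

alone = simple_slope(irrigated, months)
intercept, b_irrigated, b_cultivated = two_predictor_fit(irrigated, cultivated, months)
print(f"irrigation alone: {alone:.3f}")
print(f"irrigation, controlling for cultivation: {b_irrigated:.3f}")
```

In this toy example the irrigation coefficient is about 1.85 when irrigation is the only variable, because it also picks up the effect of cultivation, and it falls back to 0.5 once cultivation is in the model. A drop of this kind is what signals confounding.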

This interaction could be categorized as LESS THAN ADDITIVE: when the results are less than additive, neither factor shows an effect when improved alone, because both are needed to see an improvement.
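One simple way to check for interaction between two dichotomous (0/1) variables is to compare the effect of one factor at each level of the other. The sketch below does this with cell means. It assumes irrigation and cultivation have each been recoded to 0/1 using some cut-off (the cut-offs, the data, and the function names are all invented for illustration). The resulting contrast equals the coefficient on the product term in a regression that includes both 0/1 variables and their product.

```python
def cell_mean(rows, irrigated, cultivated):
    """Mean outcome for households in one irrigated/cultivated cell."""
    values = [m for (i, c, m) in rows if i == irrigated and c == cultivated]
    return sum(values) / len(values)


def interaction_contrast(rows):
    """Effect of irrigation when cultivated = 1, minus its effect when cultivated = 0.

    Zero means the two effects simply add; any other value means
    the effect of one factor depends on the level of the other.
    """
    m00 = cell_mean(rows, 0, 0)
    m10 = cell_mean(rows, 1, 0)
    m01 = cell_mean(rows, 0, 1)
    m11 = cell_mean(rows, 1, 1)
    return (m11 - m01) - (m10 - m00)


# Invented illustrative data (NOT the survey data):
# (irrigated 0/1, cultivated 0/1, months of food sufficiency)
rows = [
    (0, 0, 3.0), (0, 0, 3.0),
    (1, 0, 3.0), (1, 0, 4.0),
    (0, 1, 4.0), (0, 1, 4.0),
    (1, 1, 6.0), (1, 1, 7.0),
]

contrast = interaction_contrast(rows)
print(f"interaction contrast: {contrast:.1f} months")
```

In this made-up data, irrigation adds 0.5 months where cultivation is low and 2.5 months where it is high, so the contrast is 2.0 months rather than zero.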