Topics on this page:

- Uses of Regression
- Basic Equation
- Interaction
- Graphing
- Creating dichotomous variables
- Building a Regression Model to control for Confounders

 

Regression is used:

- with a continuous outcome (or dependent) variable
- with continuous and categorical (usually coded 0 and 1) determining (or independent) variables, in order to:
    - look for confounding
    - look for interactions
    - confirm findings from tabulation, and vice versa

This section introduces regression using one and then two independent variables, and provides the fundamentals of regression analysis. Regression is especially powerful for exploring more than two independent variables, and it will be most useful for multi-way analysis in the following module.

The Basics of a Regression Equation

The following graph shows a simple (one independent variable) regression equation and where each of the variables is represented.

[Figure page4_11.jpg: graph of a simple regression equation with each term labeled]
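For readers without the figure, a simple regression equation is commonly written in the form below. The symbols are an assumption made for this sketch and may differ from the labels used in the original graph: Y is the outcome, X is the independent variable, a is the intercept (the value of Y where the line crosses X = 0), b is the slope (the regression coefficient), and e is the deviation of an observation from the line.

```latex
Y = a + bX + e
```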

Developing a Regression Equation

To start, it is a good idea to use scatterplots to look at the association between the outcome and each independent variable that relates to your research question. Here is the question we will use for analysis in this section:

What is the relationship, at the household level, of irrigation and cultivation to food sufficiency in Somalia?

Here is an example of a scatterplot:

[Figure FScult.jpg: scatterplot of food sufficiency against area under cultivation, with fitted line]

There is a positive association between the level of food sufficiency and the area of land under cultivation for the household. This is shown by the slope of the fitted line (we will check in a moment to see whether it is significant). A positive slope is reasonable, considering that, in general, the more land under cultivation, the greater the amount of food produced.

Here, the fitted line shows the association between the outcome, food sufficiency, and the independent variable, area under cultivation. It is calculated by minimizing the sum of the squared vertical distances between each observation and the line (hence, the least squares line). In other words, the line is placed so that the squared deviations from the line, added up over all points marked on the graph, are the smallest possible for that set of points. The deviations from the line are also used to calculate other features of the relationship between the outcome and the independent variables. One of these is the coefficient of determination (R-squared), which gives the proportion of the variability in the outcome that is explained by the independent variable (here, the share of the variation in food sufficiency explained by the area cultivated). An r-square of 0.0031 tells us that very little of the variability in food sufficiency (about 0.3 percent) is explained by cultivation.

Significance for cultivation (p-value) = 0.300, which is above 0.05 and thus not significant.
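The least squares line and R-squared described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the figures on this page: the data values are hypothetical (they are not from the Somalia survey), and the function name `fit_line` is invented for the example. It computes the intercept, the slope, and R-squared, but not the p-value.

```python
# Hypothetical household data (NOT from the Somalia survey):
# area cultivated in hectares and months of food sufficiency.
area = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
months = [4.0, 5.0, 4.5, 6.5, 5.5, 7.0]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # The slope and intercept that minimize the sum of squared deviations.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the outcome's variability explained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(area, months)
print(f"months = {intercept:.3f} + {slope:.3f} * area")
print(f"R-squared = {r_squared:.3f}")
```

An R-squared close to 0, as found for cultivation above, means the points are scattered widely around the line. A value close to 1 means they lie almost on it.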

The same analysis can be done using irrigation:

[Figure FSirr99.jpg: scatterplot of food sufficiency against area irrigated]

The association seen here is also positive, but much stronger than the association found using area cultivated. The r-square is much larger (0.0855), but to test the significance of the association we must again use regression.

Not only is the r-square larger and the equation statistically significant (p-value < .05), but the coefficient for area irrigated (1.029) is also more than twice the coefficient for cultivation (0.424). This means that for every one-unit (one-hectare) rise in area irrigated, food sufficiency is estimated to increase by 1.029 months.
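As a worked illustration of reading this coefficient, consider two households whose irrigated areas differ by 2 hectares. The 2-hectare difference is a hypothetical value chosen for the example, and the symbols are assumptions: b is the coefficient for area irrigated, and the deltas are the differences in area irrigated (X) and in predicted food sufficiency (Y).

```latex
\Delta \hat{Y} = b \,\Delta X = 1.029 \times 2 = 2.058 \text{ months}
```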

When cultivation and irrigation are entered into the model together, the results are interesting: irrigation is no longer a significant predictor of food sufficiency once cultivation is controlled for. The coefficient for irrigation is also greatly reduced, from 1.029 to 0.678, when cultivation is added to the model. Before we can draw any conclusions from these findings, we first need to test for interaction between cultivation and irrigation.
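The way a coefficient can shrink when a second, correlated variable is added to the model can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the analysis above: the data are hypothetical and were constructed without noise so that the true coefficients (1.0 for cultivation, 0.5 for irrigation) are known, and the helper names `center`, `dot`, `slope_one` and `slopes_two` are invented for the example.

```python
# Hypothetical data (NOT from the Somalia survey). Irrigated area tends to
# rise with cultivated area, so the two independent variables are correlated.
cultivated = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
irrigated = [0.0, 1.0, 0.5, 1.0, 2.0, 2.0]
# Outcome built as 2 + 1.0 * cultivated + 0.5 * irrigated, with no noise.
months = [2.0 + 1.0 * c + 0.5 * i for c, i in zip(cultivated, irrigated)]


def center(v):
    """Subtract the mean from every value."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def slope_one(x, y):
    """Least squares slope with a single independent variable."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)


def slopes_two(x1, x2, y):
    """Least squares slopes with two independent variables.

    Solves the 2x2 normal equations on mean-centered data.
    """
    a, b, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


irr_alone = slope_one(irrigated, months)
b_cult, b_irr = slopes_two(cultivated, irrigated, months)
print(f"irrigation alone:            {irr_alone:.3f}")
print(f"irrigation with cultivation: {b_irr:.3f}")
print(f"cultivation with irrigation: {b_cult:.3f}")
```

With irrigation alone, the slope also absorbs part of the effect of cultivation, because households that irrigate more tend to cultivate more. Once cultivation is in the model, the irrigation coefficient falls back to its own contribution.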