Chapter 2



Experimental Study Designs

Experimental designs are generally considered the gold standard among study designs because they tend to produce the most accurate, least biased estimates of program impact.  An experimental design involves two main groups of study participants – an experimental or treatment group and a comparison or control group.  The treatment group consists of the participants to whom the intervention is delivered and whose outcome measures are compared with those of the control group.  The control group consists of “untreated” participants (people who do not receive the intervention) who are similar to the treatment group in all other respects except that they do not receive the intervention.  Ideally, the control group is made up of the same mix of persons or other units (identical composition), is exposed to the same set and intensity of extraneous factors (identical experiences), and is made up of individuals with the same predisposition toward the program as the treatment group (identical predisposition).  Identical predisposition avoids bias and is particularly important for behavior-related interventions such as condom use; it also extends to access, for example, both the treatment and control groups should have similar access to health care facilities, schools, etc.  Control groups can be chosen by randomized assignment; by matching on key characteristics such as sex, age, neighborhood or village, and education; by designating observations as controls during the analysis phase of the evaluation (statistical controls); or by using several of these methods simultaneously (mixed methods).
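Where randomized assignment is used, the mechanics can be illustrated with a short sketch. The Python example below randomly splits a set of units – here hypothetical villages – into treatment and control groups; the village names, the randomize helper, and the fixed seed are illustrative assumptions rather than part of any particular study.

```python
import random

def randomize(units, seed=None):
    """Randomly split a list of units (people, villages, clinics) into
    a treatment group and a control group of roughly equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical example: eight villages randomized into two equal groups
villages = ["A", "B", "C", "D", "E", "F", "G", "H"]
treatment, control = randomize(villages, seed=1)
print("Treatment group:", treatment)
print("Control group:  ", control)
```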

Randomized Controlled Experiments  

A randomized experiment is an experimental design in which participants are randomly assigned to either a treatment group or a control group.  This design is typically considered the gold standard for impact evaluations because it tends to produce the least biased results.  Two commonly used designs are:

1. Posttest-only control group design

2. Pretest-posttest control group design  

Posttest-only control group design  

With this type of randomized experimental design, it is assumed that randomization has distributed extraneous (confounding) factors, such as age, sex, or proximity to a health care facility, equally across groups, and thus that the treatment and control groups are equivalent at baseline.  No measurements are taken before the start of the intervention.

 

[Insert Figure 2.3.1 here]

X = Program intervention or introduction

O1 = Observations/measurements for the experimental group

O2 = Observations/measurements for the control group

The impact of the intervention is estimated simply as the difference between the observations for the experimental group and those for the control group.

Impact = (O1 – O2) ± Error (sampling error due to study design)
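To make the calculation concrete, here is a minimal sketch of the posttest-only estimate, computed as the difference between the treatment and control group means at follow-up. The outcome scores are invented for illustration, and the error term in the formula above is not modeled.

```python
# Hypothetical posttest outcome scores, one value per participant
treatment_posttest = [72, 68, 75, 80, 71]  # O1: experimental group
control_posttest = [65, 63, 70, 66, 64]    # O2: control group

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Posttest-only estimate: difference between the group means
impact = mean(treatment_posttest) - mean(control_posttest)
print(f"Estimated impact (O1 - O2): {impact:.2f}")
```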

Pretest-posttest control group design

In this type of design, measurements are taken before and after the intervention so the researcher can subsequently correct for extraneous factors that may not be equally distributed across groups.

[Insert Figure 2.3.2 here]

X = Program intervention or introduction

O1 = Outcome measurements for the pretest treatment group

O2 = Outcome measurements for the posttest treatment group

O3 = Outcome measurements for the pretest control group

O4 = Outcome measurements for the posttest control group  

The impact of the intervention is estimated as the difference between the change observed in the experimental group and the change observed in the control group.

Impact = (O2 – O1) – (O4 – O3) ± Error (sampling error due to study design)
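As a worked illustration of this pretest-posttest (difference-in-differences) calculation, the sketch below plugs in made-up group means; the numbers are assumptions chosen only to show the arithmetic, and the error term is again left out.

```python
# Hypothetical mean outcome values before and after the intervention
o1 = 40.0  # treatment group, pretest
o2 = 55.0  # treatment group, posttest
o3 = 41.0  # control group, pretest
o4 = 46.0  # control group, posttest

# Impact: change in the treatment group minus change in the control group
impact = (o2 - o1) - (o4 - o3)
print(f"Estimated impact: {impact:.1f}")  # (15.0 - 5.0) = 10.0
```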

Advantages and Disadvantages of Randomized Experiments

Advantages: Randomized experiments are versatile in the sense that they allow the researcher to assess program impact at several levels in addition to the overall impact; for example, impact can be assessed at each stratification level as well as overall.  They also have high internal validity, that is, observed differences in outcomes can be attributed to the intervention itself rather than to other factors, and similar results would be expected each time the program is evaluated.  Relatively few assumptions are required: that randomization produces equivalent treatment and control groups, that external influences (for example, civil war or drought) affect both groups equally, that all treatment groups receive the same intensity of treatment, and that assignment to the experimental group does not in itself alter the behavior of the service providers or the study subjects.  Finally, randomized experiments are relatively simple to analyze in comparison with other study designs.

Disadvantages: Depending on the type of intervention and whether it benefits the treatment group, randomized experiments can raise political or ethical issues related to withholding the intervention from part of the community.  For example, in a vitamin A supplementation program where supplementation appears to improve children's overall health, it would be unethical to continue withholding the intervention from the control group; to address this, the researcher must plan to deliver the intervention to the control group once it proves effective.  Randomized experiments are also often costly and time-consuming, and they sometimes lack generalizability (the extent to which the findings apply to the program at scale, or to a similar program in another setting).  Finally, although these threats are smaller than in other study designs, validity can still be compromised by contamination of the control group (for example, in a mass media experiment it is difficult to prevent control households with a radio or TV from receiving the intervention), by confounding external factors (such as other organizations rolling out similar programs in the area, which may affect the measured impact of your program), and by variations in treatment (people in the treatment group receiving different levels or intensities of the intervention due to differences in communication from community health workers, etc.).

 

Example of a randomized controlled experiment: Grosskurth H, Mosha F, Todd J, et al. 1995. Impact of improved treatment of sexually transmitted diseases on HIV infection in rural Tanzania: randomized controlled trial. Lancet 346: 530–536.


In this study, the intervention was twofold:

1. Improved screening and treatment of STDs

2. Health education about STDs

The study was a randomized trial designed to evaluate the effect of improved STD treatment on HIV incidence over a 24-month period, with n = 12,537 adults, 71% of whom completed the follow-up survey.  A follow-up rate of 71% is acceptable, but roughly 80% or more is preferable: when many participants are lost to follow-up, the balance between the experimental and control groups that the study design created can no longer be relied upon.  It is therefore important to retain as many participants as possible through follow-up in order to avoid bias and produce more accurate results.  Twelve communities were formed into matched treatment-control pairs and randomly assigned to treatment and control groups, and a survey of about 1,000 adults from each community was completed at baseline and at follow-up.  The authors used such a large sample to ensure that the study had enough statistical power to detect even small changes.  Note that the main outcome measures for this experiment are HIV seroconversion rates, STD prevalence, and selected sexual behaviors.  As you read the article, take note of the problems the authors encountered during the evaluation, including loss to follow-up (which reduced the power to detect small changes), contamination, and false positives and false negatives in the HIV testing procedure.
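To make the point about statistical power more concrete, the sketch below uses a standard two-proportion sample-size approximation (two-sided alpha of 0.05 and 80% power, with the z-values hard-coded as 1.96 and 0.84). The incidence figures are invented for illustration, and the calculation ignores the design effect introduced by community-level (cluster) randomization, so a trial like this one would in practice need more participants per arm than the sketch suggests.

```python
from math import sqrt

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group needed to detect a difference
    between two proportions p1 and p2 (two-sided alpha = 0.05, power = 80%)."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

# Hypothetical figures: detecting a drop in incidence from 2% to 1%
print(round(n_per_group(0.02, 0.01)))  # roughly 2,300 adults per group
```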