Glossary of Terms
999: Most often the numbers 8 and 9 are used in codes for
‘missing information’. For a survey question, this can mean
that the response was ‘no
response’, ‘not applicable’ or ‘don’t know’.
Usually the codes used are at the top end of the allowable range, such as
999 for a 3-digit variable. If these missing value codes are not recoded
as missing information, the computer will include them as valid responses,
which then introduces major errors, such as artificially inflated means for
continuous data.
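The effect of an unrecoded missing-value code on a mean can be sketched as follows. This is only an illustration: the code set, the function names and the ages are made up, and a real survey would take its missing codes from its own codebook.

```python
from statistics import mean

# Hypothetical missing-value codes for a 3-digit variable (assumption).
MISSING_CODES = {998, 999}

def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace missing-value codes with None so they are not analysed as data."""
    return [None if v in missing_codes else v for v in values]

def mean_of_valid(values):
    """Mean over the non-missing values only."""
    valid = [v for v in values if v is not None]
    return mean(valid)

ages = [12, 24, 36, 999, 48]  # made-up ages in months; 999 = 'no response'
naive_mean = mean(ages)                            # treats 999 as a real age
clean_mean = mean_of_valid(recode_missing(ages))   # ignores the missing code
```

Here the naive mean is far above any plausible age because 999 is counted as a valid response, while the recoded mean uses only the four real values.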
Anthropometry: The measurement of the human body for the
purpose of classification or comparison
Association: A relationship between two variables
where the value for one is dependent on the other
Bi-variate: Involving
two variables; bi-variate analysis is also referred to as two-way
analysis and looks to see if the two variables are related
Categorical Variable/Discrete Variable: Variables that have a limited number of
distinct categories, each usually represented by an arbitrarily assigned
number; for example type of roof, water source, or delivery location.
Classifying Variable: Independent variable which identifies
groups within a population based on biological, social, physical, political,
economic, or other characteristics. These variables are used for targeting or sub-dividing
the population when exploring causality.
Coding errors: Since survey data collection requires the
work of the survey administrator, the data entry person, and often a
translator, there are many opportunities for error in coding the responses to
the questions. Errors can occur during the data collection stage, in
recoding forms, or during data entry. If the erroneous values are not out of
range, then they may be difficult to detect by examining the
variable on its own. The error may appear only when bivariate associations and
possible outliers are examined in a scatterplot.
Column Shift: Errors in data entry or column shifts in
subsequent processing are often easy to detect, since a response
such as 3 or 4
may appear for a question which only allows 1 or 2 as a valid response. Correcting
this error requires having the actual completed survey forms available
during the data cleaning stage of analysis. Although rare, these errors can wreak
havoc with analysis if undetected. When they are detected in one
variable, other variables in the affected cases should also be carefully
checked.
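A range check of the kind described above might look like the sketch below. The codebook, variable names and records are hypothetical and only show the idea of flagging values that a question does not allow.

```python
# Hypothetical codebook: the valid codes for each question (assumption).
VALID_CODES = {
    "breastfed": {1, 2},        # 1 = yes, 2 = no
    "water_source": {1, 2, 3},  # 1 = pipe, 2 = well, 3 = tap
}

def out_of_range(records, valid_codes=VALID_CODES):
    """Return (case index, variable, value) for every value not in the codebook."""
    problems = []
    for i, record in enumerate(records):
        for variable, allowed in valid_codes.items():
            value = record.get(variable)
            if value not in allowed:
                problems.append((i, variable, value))
    return problems

records = [  # made-up cases
    {"breastfed": 1, "water_source": 2},
    {"breastfed": 3, "water_source": 1},  # 3 is not a valid answer
    {"breastfed": 2, "water_source": 3},
]
flagged = out_of_range(records)
```

Each flagged case index points to a record whose other variables are worth checking against the original form as well.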
Confounding: When examining the relationship between
two variables, the confounding effect happens when an additional variable is
associated with both the determining variable and the outcome variable and may
be the real reason for the relationship between the original two variables. These confounding variables must be
controlled for through regression analysis; otherwise results for associations
between two variables may be misleading, which is of course problematic when
planning interventions.
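A small made-up example can show why an uncontrolled confounder is misleading. Note that this sketch uses stratification (comparing within groups of the confounder) rather than the regression analysis named above, simply because it is easier to read; the records, group labels and function name are all hypothetical.

```python
from statistics import mean

# Made-up records: (wealth group, water source, weight-for-age z-score).
records = [
    ("poor", "well", -2.0), ("poor", "well", -2.2), ("poor", "well", -1.8),
    ("poor", "pipe", -2.0),
    ("rich", "well", -0.5),
    ("rich", "pipe", -0.4), ("rich", "pipe", -0.6), ("rich", "pipe", -0.5),
]

def mean_difference(rows):
    """Mean z-score of piped-water children minus that of well-water children."""
    pipe = [z for _, source, z in rows if source == "pipe"]
    well = [z for _, source, z in rows if source == "well"]
    return mean(pipe) - mean(well)

crude = mean_difference(records)  # ignores wealth
within_poor = mean_difference([r for r in records if r[0] == "poor"])
within_rich = mean_difference([r for r in records if r[0] == "rich"])
```

In this invented data the crude difference suggests piped water goes with better z-scores, but within each wealth group the difference disappears, because wealth is associated with both water source and the outcome.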
Continuous variable: Variables that lie on a continuum such as
age, weight, height, income and the like are all considered continuous
variables. These could take any numeric form, including decimals within a
logical range (e.g. age would not be negative).
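Correlation Coefficient: A measure of the strength and direction of the linear
relationship between two variables, ranging from -1 (perfect negative
relationship) to +1 (perfect positive relationship), with 0 indicating no
linear relationship.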
Coverage: Refers to the number of people receiving
the service or using the program, and the % they represent of the entire population targeted;
may also refer to the number and % of the population in need (e.g.
malnourished) covered by the program.
Dependent Variable: Also known as outcome variables.
These are variables whose values may be 'dependent' upon a number of other
'independent' variables. In nutritional data, examples of dependent variables
are child weight and height and the anthropometric indices of weight-for-height
z-score, height-for-age z-score, and weight-for-age z-score, as well as body
mass index for mothers, which can be dependent on variables such as
socio-economic status, water source, etc.
These variables in turn can be dependent on yet other factors. Part of your analysis will be determining
these dependent relationships.
Determining Variable: An independent variable such as
breastfeeding practice or water source that can be used in determining a suitable
intervention.
Dichotomous Variable: Variables that have two (di) forms
(chotomy) and therefore are usually responses to yes/no questions or another
on/off type of response. For
example, there are only two answers to the question of whether a child is
breastfed from 0-4 months: yes or no.
Discrete variable: See Categorical
Variable.
Dummy variable: Useful for regression analysis, these are
variables that represent several mutually exclusive categories, so that
only one of the dummy variables can take on a value of yes (1) at
any one time, which implies that all the other possible
responses are 0. For example, the variable for water source
might have had 3 possible responses (pipe, well, or tap), but you recode this into
three new separate variables (dummy variables) that are now either yes or no
responses for each possible response (dichotomous). So at any one time,
the new variable dpipe could be a 1 for yes, but that would imply that well and
tap are both no (0), since they are mutually exclusive.
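The water source example can be written out as a short sketch. The function name and the "d" prefix convention (as in dpipe) are illustrative assumptions, and the responses are made up.

```python
def make_dummies(values, categories, prefix="d"):
    """Recode one categorical variable into one 0/1 variable per category."""
    return {
        f"{prefix}{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

water_source = ["pipe", "well", "tap", "pipe"]  # made-up responses
dummies = make_dummies(water_source, ["pipe", "well", "tap"])
# dummies["dpipe"] is 1 only for the cases that answered 'pipe'
```

Because the categories are mutually exclusive, exactly one dummy is 1 for each case. When dummies are entered in a regression, one of them is usually left out to serve as the reference category.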
Fortification: Addition of nutrients to food
Independent Variable: Often called predictor, determining, or classifying
variables, these types of variables are often associated with the dependent
variables in a way that influences the values of outcomes such as child
anthropometry. Examples of independent
variables in nutritional data include parental education, socio-economic status
and female-headed households, among others.
Intensity: Resources applied by the program: can be
measured as $ per head, field workers (e.g., mobilizers -- village workers,
often volunteers) per population covered, facilitators (i.e., supervisors,
often government or agency employees) per mobilizer, and other similar
indicators.
Interaction: The interdependent operation of two or more
causes to produce or prevent an effect. It can also be defined as a factor
having different effects depending on the levels of another factor.
Linear Regression: A type of analysis that can explore
associations between multiple independent variables and a dependent
variable, and can also examine confounding and interactions.
Linear Relationship: A relationship between two variables that
is represented in a graph by a straight line.
This means that for every unit of increase for one variable, there is a
corresponding constant change (increase or decrease) for the second variable.
Mean: Commonly known as the average, this is the sum of all of the
values for a given variable divided by the total number of cases with a value
for that variable.
Median: The middle value in a distribution of values in a variable—the value at which 50% of cases are above and 50% are below. If the median differs much from the mean value, the overall distribution of values should be investigated.
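A quick comparison of the two measures, using Python's standard statistics module on made-up values with one extreme case:

```python
from statistics import mean, median

incomes = [10, 12, 11, 13, 200]  # made-up values with one extreme case
income_mean = mean(incomes)      # pulled upward by the extreme value
income_median = median(incomes)  # middle value of the sorted data
```

The large gap between the mean and the median here is the kind of signal that should prompt a look at the full distribution.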
Multivariate Analysis: Analysis
involving more than two variables, with the purpose of examining
the relationships between the variables. This level of analysis
allows for more complex relationships, such as confounding or
interactions, to be understood.
One-way Analysis: Analysis conducted with only one
independent variable, used in Situation or Descriptive Analyses, targeting,
and the initial exploration of associations.
Prevalence: The number of cases or events, such as
illness, in a given population at a designated time, out of the total population,
expressed as a %. Underweight prevalence is the % of children with
weight-for-age z-scores below -2.
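As a rough sketch, underweight prevalence can be computed from a list of weight-for-age z-scores using the -2 cutoff. The function name and the z-scores are hypothetical, and missing values are assumed to be stored as None.

```python
def prevalence(z_scores, cutoff=-2.0):
    """Percentage of non-missing z-scores that fall below the cutoff."""
    valid = [z for z in z_scores if z is not None]
    below = sum(1 for z in valid if z < cutoff)
    return 100.0 * below / len(valid)

waz = [-2.5, -1.0, 0.3, -3.1, None, -1.9]  # made-up weight-for-age z-scores
underweight_pct = prevalence(waz)          # 2 of the 5 valid cases are below -2
```

Missing values are left out of both the numerator and the denominator, so the result is a percentage of the children actually measured.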
P-value: A measure of significance which gives the
probability that you would find a sample that gives results at least as extreme
as yours, assuming the
null hypothesis is true (for example, the probability of your sample showing
that two groups are different if the two groups are actually the same in the
entire population). If p<.05, results
are conventionally considered statistically significant.
Range: The difference between the maximum and minimum values for a
given variable.
Regression Coefficients: These coefficients include:
· Regression coefficient B (the slope of the line in linear regression, i.e. the
estimated average change in y per unit of x)
· standard error of B
· standardized coefficient beta (estimated average change in y, in standard
deviations, per standard deviation increase in x)
· t value for B, and two-tailed significance level of t
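For the simplest case of one x and one y, these coefficients can be computed by hand as in the sketch below. The function name and data are made up, the formulas are the usual least-squares ones for a single predictor, and the two-tailed significance of t is left out because it needs a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def simple_regression(x, y):
    """Least-squares fit of y = a + B*x; returns B, its standard error, beta and t."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                    # regression coefficient B (slope)
    a = my - b * mx                  # intercept
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(residual_ss / (n - 2) / sxx)  # standard error of B
    beta = b * stdev(x) / stdev(y)   # standardized coefficient beta
    t = b / se_b                     # t value for B
    return {"B": b, "SE": se_b, "beta": beta, "t": t}

x = [1, 2, 3, 4, 5]  # made-up determining variable
y = [2, 4, 5, 4, 5]  # made-up outcome variable
fit = simple_regression(x, y)
```

With several independent variables the same quantities are normally taken from the output of a statistics package rather than computed this way.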
R squared: Proportion of variance in y that can be
explained by x. If r^2=1.0, it is a perfect fit, and if r^2=0.0 then x gives no
information about y (the determining variable gives no information about the
outcome). Adjusted R squared is a modified version of this that takes into
account the number of determining/predictor variables in relation to the number
of observations. It is never larger
than R squared and can even take a negative value if x is a poor fit for
predicting y.
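Both quantities can be sketched in a few lines. The observed and fitted values are made up, the function names are illustrative, and the adjustment shown is the commonly used formula with n observations and k predictor variables.

```python
from statistics import mean

def r_squared(y, predicted):
    """Proportion of the variance in y explained by the predictions."""
    my = mean(y)
    total_ss = sum((yi - my) ** 2 for yi in y)
    residual_ss = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return 1 - residual_ss / total_ss

def adjusted_r_squared(r2, n, k):
    """Adjust R squared for k predictor variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

y = [2, 4, 5, 4, 5]                    # made-up observed values
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical fitted values from one predictor
r2 = r_squared(y, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
weak_fit = adjusted_r_squared(0.1, n=5, k=1)  # a poor fit gives a negative value
```

With so few observations the adjustment is large, which illustrates why the adjusted value is the more cautious of the two.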
Sample Size: The number of observations, labeled as
‘n’.
Significance Level: Probability of rejecting the null
hypothesis if the null is true (the probability of concluding two groups are
different if in fact they are the same). It is the threshold, commonly .05,
against which the p-value is compared.
Skew: A measure of asymmetry in the distribution of a range of
values.

Standard Deviation: Indicates the variation in a group of
measurements. When the values of a set of observations lie close to the mean,
the dispersion is less than when they are scattered over a wide range. A
difference in the standard deviation of a variable when comparing different
groups may indicate possible data errors.
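To illustrate, the two made-up groups below have the same mean but very different spread; stdev from Python's standard statistics module gives the sample standard deviation.

```python
from statistics import mean, stdev

clustered = [49, 50, 50, 51]  # made-up values close to the mean
scattered = [20, 40, 60, 80]  # made-up values, same mean, wider spread
same_mean = mean(clustered) == mean(scattered)
sd_clustered = stdev(clustered)
sd_scattered = stdev(scattered)
```

A group whose standard deviation is far larger than that of comparable groups is worth checking for data entry problems before it is interpreted.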
Stunting: Anthropometric measurement of
malnutrition seen in linear growth, expressed in terms of height-for-age. This measurement captures chronic
malnutrition.
Targeting: Preferentially including a specific group
in a program (based on geography, vulnerability, etc.)
Underweight: Aggregate measurement of wasting and
stunting, expressed in terms of weight-for-age that can capture both chronic
and acute malnutrition.
Wasting: Anthropometric measurement of
malnutrition, expressed in terms of weight-for-height. This measurement
captures acute malnutrition.
Z-score: How many standard deviations an
observation falls above or below the mean, found by computing
(the observed value - the mean) / standard deviation.
Z-scores are used to determine whether a child is wasted, stunted or
underweight. See http://www.tulane.edu/~panda3/Analysis2/submods/zscores/zscores.htm
to learn how to calculate these z-scores from a child growth chart.
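The formula in this entry can be sketched directly. The reference mean and standard deviation below are hypothetical numbers chosen for illustration, not values from a growth chart, and the function and variable names are assumptions.

```python
def z_score(observed, mean, standard_deviation):
    """Number of standard deviations the observation lies above or below the mean."""
    return (observed - mean) / standard_deviation

# Hypothetical reference values for illustration only (not from a growth chart).
reference_mean_kg = 10.0
reference_sd_kg = 1.2
child_weight_kg = 7.0

z = z_score(child_weight_kg, reference_mean_kg, reference_sd_kg)
is_below_cutoff = z < -2  # the -2 cutoff used for underweight prevalence
```

In practice the mean and standard deviation come from the reference population for the child's age and sex.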