Glossary of Terms

999: Most often the numbers 8 and 9 are used in codes for ‘missing information’.  This can mean that, for a survey question, the response was ‘no response’, ‘not applicable’ or ‘don’t know’.  Usually the codes used are at the top end of the allowable range, such as 999 for a 3-digit variable. If these missing value codes are not recoded as missing information, the computer will include them as valid responses, which then introduces major errors, such as artificially inflated means for continuous data.
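
As a rough illustration of why recoding matters, the sketch below compares the mean of a variable before and after its missing-value codes are recoded. The codes 998 and 999, the function names and the height values are all assumptions made up for this example; the codes used in a real survey should be taken from its codebook.

```python
from statistics import mean

# Hypothetical missing-value codes for a 3-digit variable (assumed for illustration).
MISSING_CODES = {998, 999}

def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace missing-value codes with None so they can be excluded from analysis."""
    return [None if v in missing_codes else v for v in values]

def mean_of_valid(values):
    """Mean of the non-missing values only."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values to average")
    return mean(valid)

# Made-up heights in cm; two cases carry the missing code 999.
heights = [82, 90, 999, 76, 999, 88]

naive_mean = mean(heights)                           # 389: treats 999 as a real height
clean_mean = mean_of_valid(recode_missing(heights))  # 84: missing codes excluded
```

The inflated first result is the kind of error described above: two missing codes are enough to push the mean far outside the plausible range.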

 

Anthropometry: The measurement of the human body for the purpose of classification or comparison

 

Association: A relationship between two variables where the value for one is dependent on the other

 

Bi-variate: Involving two variables; bi-variate analysis is also referred to as two-way analysis and looks to see if the two variables are related

 

Categorical Variable/Discrete Variable: Variables that have a distinct number of categories that are usually arbitrarily assigned numbers to represent each category; for example type of roof, water source, or delivery location.

 

Classifying Variable: An independent variable that identifies groups within a population based on biological, social, physical, political, economic, or other characteristics. These variables are used for targeting or sub-dividing the population when exploring causality.

 

Coding errors: Survey data collection requires the work of the survey administrator, the data entry person, and often a translator, so there are many opportunities for error in coding the responses to the questions.  Errors can occur during the data collection stage, in recoding forms, or during data entry.  If the resulting values are not out of range, they may be difficult to detect by examining the variable itself.  The error may appear when bivariate associations and possible outliers are examined with scatterplots.

 

Column Shift: Errors in data entry or column shifts in subsequent processing are often easy to detect, since there may be a response such as 3 or 4 appearing for a question which only allows 1 or 2 as a valid response. Correcting this error requires having the actual completed survey forms available during the data cleaning stage of analysis. Although rare, these errors can wreak havoc with analysis if undetected.  When they are detected in one variable, other variables in the affected cases should also be carefully checked.
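
A range check of this kind can be sketched as follows. The variable names, the allowed codes and the records are hypothetical and only stand in for a real codebook and data file.

```python
# Hypothetical codebook: allowed codes per variable (assumed for illustration).
VALID_CODES = {
    "sex": {1, 2},
    "breastfed": {1, 2},
}

def find_out_of_range(records, valid_codes=VALID_CODES):
    """Return (case_index, variable, value) for every value outside its allowed codes."""
    problems = []
    for i, record in enumerate(records):
        for variable, allowed in valid_codes.items():
            value = record.get(variable)
            if value not in allowed:
                problems.append((i, variable, value))
    return problems

# Made-up cases; the second one has invalid values in both variables,
# which is the pattern a column shift tends to produce.
records = [
    {"sex": 1, "breastfed": 2},
    {"sex": 4, "breastfed": 3},
    {"sex": 2, "breastfed": 1},
]

flagged = find_out_of_range(records)
```

Here `flagged` lists both variables of the second case, a hint that the whole case should be checked against the completed survey form rather than correcting one value in isolation.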

 

Confounding: When examining the relationship between two variables, the confounding effect happens when an additional variable is associated with both the determining variable and the outcome variable and may be the real reason for the relationship between the original two variables.  These confounding variables must be controlled for through regression analysis; otherwise results for associations between two variables may be misleading, which is of course problematic when planning interventions.

 

Continuous variable: Variables that lie on a continuum such as age, weight, height, income and the like are all considered continuous variables.  These could take any numeric form, including decimals within a logical range (e.g. age would not be negative). 

 

Correlation Coefficient: A measure of the relatedness of two variables. The value of a correlation coefficient ranges between -1 and 1; the closer the value is to either 1 or -1, the stronger the linear relationship between the variables, and a coefficient of 0 indicates no linear relationship (the variables may still be related in a non-linear way).  This can also be referred to as Pearson’s Correlation Coefficient.  Note that this measure is highly dependent upon sample size and should thus be used only in preliminary explorations of the relationships between variables.
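
For readers who want to see the arithmetic, here is a minimal sketch of Pearson's correlation coefficient written out by hand. The function name and all data values are made up for illustration; statistical packages provide the same calculation ready-made.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / sqrt(sxx * syy)

# Made-up ages (months) and weights (kg): a strong positive relationship.
age = [6, 12, 18, 24, 30]
weight = [7.0, 9.1, 10.4, 11.8, 12.9]
r_age_weight = pearson_r(age, weight)

# Made-up example where y is completely determined by x (y = x squared),
# yet the coefficient is 0 because the relationship is not a straight line.
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
r_curved = pearson_r(x, y)
```

The second example is a reminder that the coefficient only measures straight-line relationships, which is one reason to look at a scatterplot as well.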

 

Coverage: Refers to the number of people receiving the service or using the program, and the % this represents out of the entire population targeted; may also refer to the number and % of the population in need (e.g. malnourished) covered by the program

 

Dependent Variable: Also known as outcome variables. These are variables whose values may be 'dependent' upon a number of other 'independent' variables. In nutritional data, examples of dependent variables are child weight and height and the anthropometric indices of weight-for-height z-score, height-for-age z-score, and weight-for-age z-score, as well as body mass index for mothers, which can be dependent on variables such as socio-economic status, water source, etc.  These variables in turn can be dependent on yet other factors.  Part of your analysis will be determining these dependent relationships.

 

Determining Variable: An independent variable, such as breastfeeding practice or water source, that can be used in determining a suitable intervention

 

Dichotomous Variable: Variables that have two (di) forms (chotomy) and are therefore usually responses to yes/no questions or another on/off type of response.  For example, there are only two answers to the question of whether a child is breastfed from 0-4 months: yes or no.

 

Discrete variable: See Categorical Variable.

 

Dummy variable: Useful for regression analysis, these are variables that represent several mutually exclusive categories, so that only one of the dummy variables can take a value of yes (1) at once, which implies that all the others are 0.  For example, the variable for water source might have had 3 possible responses (pipe, well, or tap), but you recode this into three new separate variables (dummy variables) that are each a yes or no response (dichotomous).  So for any one case, the new variable dpipe could be a 1 for yes, which would imply that the well and tap variables are both no (0), since the categories are mutually exclusive.
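
The water source example can be sketched in code as below. The function name `make_dummies` and the data are assumptions for illustration; only the name dpipe comes from the definition above, and the other dummy names simply follow the same pattern.

```python
def make_dummies(values, categories, prefix="d"):
    """Turn one categorical variable into a set of 0/1 dummy variables per case."""
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unexpected category: {value!r}")
        # Exactly one dummy is 1 for each case; all the others are 0.
        rows.append({prefix + c: int(value == c) for c in categories})
    return rows

# Made-up water source responses for four households.
water_source = ["pipe", "well", "tap", "pipe"]
dummies = make_dummies(water_source, ["pipe", "well", "tap"])

first_case = dummies[0]   # {'dpipe': 1, 'dwell': 0, 'dtap': 0}
```

When dummies like these are entered into a regression, one of them is normally left out to serve as the reference category, because the full set is redundant once an intercept is included.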

 

Fortification: Addition of nutrients to food

 

Independent Variable: Often called predictor, determining, or classifying variables, these types of variables are often associated with the dependent variables in a way that influences the values of outcomes such as child anthropometry.  Examples of independent variables in nutritional data include parental education, socio-economic status and female-headed households, among others.

 

Intensity: Resources applied by the program: can be measured as $ per head, field workers (e.g., mobilizers -- village workers, often volunteers) per population covered, facilitators (i.e., supervisors, often government or agency employees) per mobilizer, and other similar indicators.

 

Interaction: The interdependent operation of two or more causes to produce or prevent an effect. It can also be defined as a factor having different effects depending on the levels of another factor.

 

Linear Regression: A type of analysis that can explore associations between multiple independent variables and a dependent variable, and can also examine confounding and interactions.

 

Linear Relationship: A relationship between two variables that is represented in a graph by a straight line.  This means that for every unit of increase in one variable, there is a corresponding constant change (increase or decrease) in the second variable.

 

Mean: Commonly known as the average, this is the sum of all of the values for a given variable divided by the total number of cases with a value for that variable.

 

Median: The middle value in a distribution of values in a variable—the value at which 50% of cases are above and 50% are below.  If the median differs much from the mean value, the overall distribution of values should be investigated.
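
A small made-up example shows how the mean and median can differ when one value is extreme. The income figures are invented for illustration.

```python
from statistics import mean, median

# Made-up household incomes; the last value is an extreme case.
incomes = [120, 150, 160, 170, 900]

income_mean = mean(incomes)      # 300: pulled upward by the single large value
income_median = median(incomes)  # 160: the middle value, unaffected by the extreme
```

A gap this large between the two is a signal to look at the distribution of values, for example to see whether the 900 is a genuine observation or a data error.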

Multivariate Analysis: Analysis conducted between more than one variable, with the purpose of examining the relationships between the variables.  This level of analysis allows for more complex relationships, such as confounding or interactions, to be understood.

 

One-way Analysis: Analysis conducted with only one independent variable, used in Situation or Descriptive Analyses, targeting, and the initial exploration of associations.

 

Prevalence: The number of cases or events, such as illness, in a given population at a designated time, out of the total population, expressed as a %.  Underweight prevalence is the total % of children with a weight-for-age z-score below -2 standard deviations.
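
As a hedged sketch, underweight prevalence could be computed from a list of weight-for-age z-scores as follows. The function name and the z-scores are made up, and the choice to leave cases with a missing z-score out of the denominator is an assumption of this example rather than a rule stated here.

```python
def prevalence_percent(z_scores, cutoff=-2.0):
    """Percentage of valid cases whose z-score falls below the cutoff."""
    # Assumption: cases with a missing z-score (None) are dropped from the denominator.
    valid = [z for z in z_scores if z is not None]
    if not valid:
        raise ValueError("no valid z-scores")
    cases = sum(1 for z in valid if z < cutoff)
    return 100.0 * cases / len(valid)

# Made-up weight-for-age z-scores for 11 children, one of them missing.
waz = [-2.5, -1.0, 0.3, -2.1, -1.9, None, -3.0, 0.0, -0.5, 1.2, -1.5]

underweight_prevalence = prevalence_percent(waz)  # 3 of 10 valid cases: 30.0
```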

 

P-value: A measure of significance which gives the probability that you would find a sample with results at least as extreme as yours, assuming the null hypothesis is true (for example, the probability of your sample showing that two groups are different if the two groups are actually the same in the entire population).  If p<0.05, results are conventionally considered statistically significant.

 

Range: The difference between the maximum and minimum values for a given variable.

 

Regression Coefficients: These coefficients include:

·         Regression coefficient B (the slope of the line in linear regression, i.e. the estimated average change in y per unit of x)

·         standard error of B

·         standardized coefficient beta (estimated average change in y, measured in standard deviations of y, per standard deviation increase in x)

·         t value for B, and two-tailed significance level of t
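
To make these quantities concrete, the sketch below computes B, its standard error, beta and the t value for a regression with a single x variable. It is an illustration under stated assumptions, not the procedure of any particular statistics package: the function name and data are made up, and the two-tailed significance level is left out because it requires the t distribution with n-2 degrees of freedom, which is normally looked up in statistical software.

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares regression of y on a single x; returns the main coefficients."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    b = sxy / sxx                      # slope B: change in y per unit of x
    intercept = mean_y - b * mean_x
    residual_ss = sum((yi - (intercept + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(residual_ss / (n - 2) / sxx)   # standard error of B
    beta = b * sqrt(sxx / syy)         # B rescaled to standard deviation units
    t = b / se_b if se_b > 0 else float("inf")  # t value for B
    return {"B": b, "intercept": intercept, "se_B": se_b, "beta": beta, "t": t}

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = simple_regression(x, y)
```

For these values B works out to 0.6 with a standard error of about 0.28, giving a t value of about 2.12. With only one x variable, beta equals the correlation coefficient between x and y.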

 

R squared: Proportion of variance in y that can be explained by x. If r^2=1.0, it is a perfect fit, and if r^2=0.0 then x gives no information about y (the determining variable gives no information about the outcome).  Adjusted R squared is a modified version of this that takes into account the number of determining/predictor variables in relation to the number of observations.  It is never larger than R squared and can even take a negative value if x is a poor fit for predicting y.
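
The two measures can be sketched as follows, using the usual formulas R squared = 1 - (residual sum of squares / total sum of squares) and adjusted R squared = 1 - (1 - R squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictor variables. The function names, observed values and fitted values are made up for illustration.

```python
def r_squared(y, fitted):
    """Proportion of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    if ss_total == 0:
        raise ValueError("y does not vary")
    return 1 - ss_resid / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjust R squared for k predictor variables and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up observed values and the values predicted by a one-predictor model.
y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, fitted)                   # about 0.60
adj_r2 = adjusted_r_squared(r2, n=5, k=1)   # about 0.47

# A weak model with several predictors and few cases can go negative.
weak_adj_r2 = adjusted_r_squared(0.05, n=10, k=3)
```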

 

Sample Size: The number of observations, labeled as ‘n’.

 

Significance Level: The probability of rejecting the null hypothesis if the null is true (the probability of concluding that two groups are different if in fact they are the same). It is the threshold, commonly 0.05, against which the p-value is compared.

 

Skew: A measure of asymmetry in the distribution of a range of values.

Standard Deviation: Indicates the variation in a group of measurements. When the values of a set of observations lie close to the mean, the dispersion is less than when they are scattered over a wide range.  A large difference in the standard deviation of a variable between groups may indicate data errors and should be investigated.
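
The following sketch compares two made-up groups that share the same mean but differ in spread. It uses the sample standard deviation (dividing by n-1), which is an assumption of this example; some packages also offer the population version.

```python
from statistics import mean, stdev

# Two made-up groups of measurements with the same mean.
group_a = [10, 11, 9, 10, 10]   # values close to the mean
group_b = [2, 18, 10, 5, 15]    # values scattered over a wide range

mean_a, mean_b = mean(group_a), mean(group_b)   # both equal 10
sd_a = stdev(group_a)   # about 0.71
sd_b = stdev(group_b)   # about 6.67
```

The means alone would suggest the groups are alike; the standard deviations show that the second group is far more dispersed and may deserve a closer look.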

 

Stunting: Anthropometric measurement of malnutrition seen in linear growth, expressed in terms of height-for-age.  This measurement captures chronic malnutrition.

 

Targeting: Preferentially including a specific group in a program (based on geography, vulnerability, etc.)

 

Underweight: Aggregate measurement of wasting and stunting, expressed in terms of weight-for-age that can capture both chronic and acute malnutrition.

 

Wasting: Anthropometric measurement of malnutrition, expressed in terms of weight-for-height. This measurement captures acute malnutrition.

 

Z-score: How many standard deviations an observation falls above or below the mean, found by computing (the observed value-the mean)/standard deviation.  Z-scores are used to determine whether a child is wasted, stunted or underweight.  See http://www.tulane.edu/~panda3/Analysis2/submods/zscores/zscores.htm to learn how to calculate these z-scores from a child growth chart.
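
The formula above can be written as a short function. The reference mean and standard deviation used here are made-up numbers for illustration only; real values must come from the growth reference for the child's age and sex.

```python
def z_score(observed, mean, sd):
    """Number of standard deviations the observed value lies above or below the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - mean) / sd

# Hypothetical reference values (assumed, not taken from a real growth chart).
reference_mean_kg = 10.0
reference_sd_kg = 1.2

child_weight_kg = 7.3
waz = z_score(child_weight_kg, reference_mean_kg, reference_sd_kg)  # about -2.25

# Using the -2 cutoff from the Prevalence entry.
is_underweight = waz < -2
```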