Statistics might sound daunting, or it might not. Either way, after using PANDA, statistics should seem more practical, and the basic concepts should be at your fingertips for future use. To make the transition smooth, this section provides some statistics background that will be useful for PANDA modules such as the Two-way and Multi-way Analysis. We hope this makes things a lot easier.
When performing analysis to answer a research question, it is important first to identify the types of variables that will be used, and to choose an outcome variable and one or more potential "independent" or determining variables. Once this is done, you must decide how to use these variables in a statistical test to see whether a relationship exists. The table below shows how to choose an appropriate statistical test depending on the variables you have chosen:
| PREDICTOR VARIABLE(S) | OUTCOME VARIABLE: Categorical | OUTCOME VARIABLE: Continuous |
| Categorical | Chi Square, Log linear, Logistic | t-test, ANOVA (Analysis of Variance), Linear regression |
| Continuous | Logistic regression | Linear regression, Pearson correlation |
| Mixture of Categorical and Continuous | Logistic regression | Linear regression, Analysis of Covariance |
Categorical Data Analysis, Janet C. Rice, Tulane University School of Public Health and Tropical Medicine Department of Biostatistics
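The table above can be read as a simple lookup from the pair (predictor type, outcome type) to a list of candidate tests. The sketch below encodes it that way in Python; the names `TEST_GUIDE` and `suggest_tests` are hypothetical and are not part of PANDA or SPSS.

```python
# Lookup table transcribed from the test-selection table above.
# Keys are (predictor type, outcome type); names here are illustrative only.
TEST_GUIDE = {
    ("categorical", "categorical"): ["Chi Square", "Log linear", "Logistic"],
    ("categorical", "continuous"): ["t-test", "ANOVA", "Linear regression"],
    ("continuous", "categorical"): ["Logistic regression"],
    ("continuous", "continuous"): ["Linear regression", "Pearson correlation"],
    ("mixture", "categorical"): ["Logistic regression"],
    ("mixture", "continuous"): ["Linear regression", "Analysis of Covariance"],
}


def suggest_tests(predictor, outcome):
    """Return the candidate tests for a predictor/outcome type pair."""
    return TEST_GUIDE[(predictor.lower(), outcome.lower())]


# Example: a categorical determining variable and a continuous outcome.
print(suggest_tests("Categorical", "Continuous"))
```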
All of the analysis methods marked in BLUE are used in the PANDA package. Recall that terms marked in RED are defined in the glossary. Clicking on such a word will take you to its definition.
Chi-Square -
Normally, continuous outcome variables are used for anthropometry (e.g. wt/age z-score), but a categorical variable (e.g. malnourished yes/no for individual cases) is sometimes useful. For clinical signs, such as goitre, the data start as categorical variables. The chi-square test can be used any time the cross-tabulation function is used in SPSS. Chi-square is used to look at the statistical significance of an association between a categorical outcome (such as wasted or not wasted) and a categorical determining variable (such as diarrhea in the last two weeks or no diarrhea). When running the cross-tab, an option is available to test the significance of the association using chi-square.
According to SPSS Version 8.0, Chi-Square (Crosstabs) tests the hypothesis that the row and column variables are independent, without indicating strength or direction of the relationship. Pearson chi-square, likelihood-ratio chi-square, and linear-by-linear association chi-square are displayed. For 2x2 tables, Fisher's exact test is computed when a table that does not result from missing rows or columns in a larger table has a cell with an expected frequency of less than 5. Yates' corrected chi-square is computed for all other 2x2 tables.
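To make the chi-square calculation concrete, here is a minimal Python sketch of the uncorrected Pearson chi-square for a 2x2 table. It is not the SPSS implementation: the function name and the counts are hypothetical, and it does not apply Yates' correction or Fisher's exact test. It also reports the smallest expected count, since the text above notes that expected frequencies below 5 call for Fisher's exact test.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Returns (chi-square statistic, p-value, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


# Hypothetical counts: rows = diarrhea (yes, no), columns = wasted (yes, no).
counts = [[30, 70],
          [20, 130]]
chi2, p_value, min_expected = chi_square_2x2(counts)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}, "
      f"smallest expected count = {min_expected:.1f}")
```

A small p-value suggests that the row and column variables are not independent, but, as noted above, it says nothing about the strength or direction of the association.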
ANOVA (Analysis of Variance)-
ANOVA is used to test for an association between a continuous outcome variable (such as HAZ score) and a categorical determining variable (such as iodized salt consumption). In SPSS 8.0, ANOVA is a Statistics option of the Means function (Statistics, Compare Means, Means), which reports the mean of the outcome variable within each category of the determining variable. The option tests the difference between the mean outcome scores for the two or more categories of the determining variable. According to SPSS 8.0, Analysis of Variance, or ANOVA, is a method of testing the null hypothesis that several group means are equal in the population, by comparing the sample variance estimated from the group means to that estimated within the groups.
One-way ANOVA- According to SPSS 8.0, the One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test.
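The comparison of between-group and within-group variance described above can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure: the function name and the HAZ scores are hypothetical, and the sketch stops at the F statistic and its degrees of freedom, because the p-value requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical HAZ scores for three categories of iodized salt consumption.
haz_by_group = [
    [-1.0, -2.0, -3.0],
    [-1.0, -1.5, -2.0],
    [0.0, -0.5, -1.0],
]
f_stat, df_between, df_within = one_way_anova(haz_by_group)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ more than the within-group spread would lead you to expect if the null hypothesis of equal means were true.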
Two-way ANOVA (Analysis of Covariance using the GLM function)- According to SPSS 8.0, the GLM General Factorial procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this General Linear Model procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors, some of which may be random. In addition, the effects of covariates and covariate interactions with factors can be included. For regression analysis, the independent (predictor) variables are specified as covariates.
Linear Regression-
Linear regression is used quite often in PANDA in order to preserve the continuous nutrition outcome (often z-scores) and to test the relationship of this outcome with a combination of continuous and categorical determining variables (such as illness, feeding practices including breastfeeding, environmental influences, SES, and care practices, among others). According to SPSS 8.0, Linear Regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. For example, you can try to predict a salesperson's total yearly sales (the dependent variable) from independent variables such as age, education, and years of experience.
Regression Coefficients. Estimates displays regression coefficient B, standard error of B, standardized coefficient beta, t value for B, and two-tailed significance level of t. Confidence intervals displays 95% confidence intervals for each regression coefficient, or a covariance matrix.
Model fit. The variables entered and removed from the model are listed, and the following goodness-of-fit statistics are displayed: multiple R, R² and adjusted R², standard error of the estimate, and an analysis-of-variance table.
R squared change. Displays the change in R², the F change, and the significance of the F change.
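As a rough illustration of what the regression coefficient B and R² mean, the following Python sketch fits a least-squares line with a single predictor. It is a simplification under stated assumptions: the function name and the data are hypothetical, only one independent variable is used, and the standard errors, beta, and significance tests reported by SPSS are not computed.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # the coefficient B for the single predictor
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    # Share of the outcome's variation explained by the fitted line.
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared


# Hypothetical data: a continuous determining variable and a WAZ outcome.
predictor = [0, 2, 4, 6, 8]
waz = [-2.0, -1.7, -1.5, -1.0, -0.8]
slope, intercept, r_squared = simple_linear_regression(predictor, waz)
print(f"B = {slope:.3f}, intercept = {intercept:.2f}, R squared = {r_squared:.3f}")
```

Here B is the estimated change in the outcome for a one-unit increase in the predictor, and R² is the proportion of variation in the outcome accounted for by the model.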
Pearson Correlation
Correlation testing usually runs a continuous outcome (such as weight for age z-score) against a continuous determining variable (such as family income) to see whether they have a linear relationship (positive, negative or none). The significance test alone indicates only whether a linear relationship is likely to exist; the direction and strength of that relationship are given by the correlation coefficient itself, which ranges from -1 to +1. According to SPSS 8.0, the Bivariate Correlations procedure computes Pearson’s correlation coefficient, Spearman’s rho and Kendall’s tau-b with their significance levels. Correlations measure how variables or rank orders are related. Before calculating a correlation coefficient, screen your data for outliers (which can cause misleading results) and evidence of a linear relationship. Pearson’s correlation coefficient is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson’s correlation coefficient is not an appropriate statistic for measuring their association.
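The caution about non-linear relationships can be seen in a small Python sketch. The function name and the data are hypothetical; the first pair of variables is perfectly linear, while the second is perfectly related through a curve (y equals x squared), yet its Pearson coefficient is zero.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y_linear = [2 * xi + 1 for xi in x]  # perfectly linear in x
y_curved = [xi ** 2 for xi in x]     # perfectly related, but not linearly

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(f"linear: r = {r_linear:.2f}, curved: r = {r_curved:.2f}")
```

This is why a scatter plot is worth examining before relying on the coefficient.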
T-Test
A t-test looks at the difference in the means of a continuous variable between two groups. Remember that the null hypothesis H0 states that there is no difference in the means (i.e., µ1 = µ2), and the alternative hypothesis states that there is a difference. Remember also that the p-value (conventionally considered significant at <0.05) is the probability of finding a result at least as extreme as the one you have (i.e., a difference in means at least this large) given that the null hypothesis is true.
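As a minimal sketch of the calculation, the Python code below computes the pooled-variance (equal variances assumed) two-sample t statistic and its degrees of freedom. The function name and the scores are hypothetical, and the p-value is left out because it requires the t distribution, which the Python standard library does not provide; it would be read from a t table or from SPSS output.

```python
import math


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = sum(a) / n1, sum(b) / n2
    var1 = sum((x - mean1) ** 2 for x in a) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in b) / (n2 - 1)
    # Pooled estimate of the common variance (assumes equal variances).
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, n1 + n2 - 2


# Hypothetical z-scores for two groups of children.
group_a = [-1.0, -2.0, -3.0]
group_b = [0.0, -0.5, -1.0]
t_stat, df = two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The further t lies from zero, the less plausible the observed difference in means is under the null hypothesis µ1 = µ2.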