# One-Way Analysis of Variance


## Chapter: Biostatistics for the Health Sciences: One-Way Analysis of Variance

The analysis of variance compares different populations in studies that have several treatments or conditions. For example, we may want to compare mean scores from three or more populations that represent three or more study conditions. Recall that we used the Z test or t test to compare two populations, as in comparing an experimental group with a control group. The analysis of variance enables us to extend the comparison to more than two groups.

In this text, we will consider only the one-way analysis of variance (ANOVA). Typically, ANOVA is used to compare population means (μ’s) that represent interval- or ratio-level measurement. In the one-way analysis of variance, there is a single factor (such as classification according to treatment group) that differentiates the groups.
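To make the comparison of several group means concrete, the following is a minimal sketch of the one-way ANOVA F statistic, computed as the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova` and the three small samples are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from the text.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for a control group and two treatment groups.
control = [4.0, 5.0, 6.0]
treatment_a = [6.0, 7.0, 8.0]
treatment_b = [8.0, 9.0, 10.0]

f_stat, df_b, df_w = one_way_anova([control, treatment_a, treatment_b])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

For these made-up data the group means are 5, 7, and 9, the between-group sum of squares is 24, and the within-group sum of squares is 6, so F = (24/2)/(6/6) = 12. A large F suggests that at least one group mean differs from the others; judging significance would still require comparing F with the F distribution on (2, 6) degrees of freedom.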

Other types of analyses of variance are also important in statistics. ANOVA may be extended to two-way, three-way, and N-way designs. To illustrate, the two-way analysis would examine the effects of two variables, such as treatment group and age group, on an outcome variable. The N-way ANOVAs are used in experimental studies that have multiple factorial designs. However, the problem of assessing the associations of several variables with an outcome variable becomes daunting.

One common use of the two-way analysis of variance is the randomized block design. In this design, one factor could be the treatment and the other would be the blocks. Blocks refer to homogeneous groupings of subsets of subjects; for example, subsets defined by race or other demographic characteristics. These characteristics, when uncontrolled, may increase the size of the error variance. In the randomized block design, we look for treatment effects and block effects, both of which are called the main effects. There is also the possibility of considering interaction effects between the treatments and the blocks. Interaction means that certain combinations of treatments and blocks may have a greater or smaller impact on the outcome than the sum of their main effects. As is true of regression, the analysis of variance, which represents an important area in applied statistics, is the subject of entire books.
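The treatment and block main effects described above can be sketched with a small calculation. This example assumes the simplest form of the randomized block design, with exactly one observation per treatment in each block; in that case an interaction cannot be estimated separately and is absorbed into the error term. The function name `randomized_block_anova` and the data table are hypothetical.

```python
def randomized_block_anova(table):
    """Sums of squares and F ratios for a randomized block design.

    table[i][j] is the response for block i under treatment j,
    with one observation per cell (an assumption of this sketch).
    """
    b = len(table)          # number of blocks
    t = len(table[0])       # number of treatments
    grand = sum(sum(row) for row in table) / (b * t)

    block_means = [sum(row) / t for row in table]
    treat_means = [sum(row[j] for row in table) / b for j in range(t)]

    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_treat - ss_block   # what the main effects leave unexplained

    df_treat, df_block = t - 1, b - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error

    return {
        "ss_treatment": ss_treat,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "F_treatment": (ss_treat / df_treat) / ms_error,
        "F_block": (ss_block / df_block) / ms_error,
    }


# Hypothetical outcomes: rows are blocks, columns are treatments.
table = [
    [4.0, 6.0, 8.0],
    [5.0, 8.0, 8.0],
    [6.0, 7.0, 11.0],
]
result = randomized_block_anova(table)
print(result)
```

Here the total sum of squares (34) splits into a treatment part (24), a block part (6), and an error part (4). Removing the block variation from the error term is what makes the treatment comparison more sensitive than it would be if the blocking characteristic were left uncontrolled.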

Scheffé (1959) wrote the classic theoretical text on analysis of variance. Fisher and McDonald (1978) authored a more recent text, which provides an advanced treatment of fixed effects designs (as opposed to random effects). Other, less advanced, treatments can be found in Hocking (1985), Dunn and Clark (1974), and Miller (1986).

In statistical computer packages, the analysis of variance can be treated as a regression problem with dummy variables. A dummy variable is a type of dichotomous variable created by recoding the classifications of a categorical variable. For example, a single category of race (e.g., African American) would be coded as present (1) or absent (0). Viewed as a regression problem, an ANOVA is a type of linear model. Such a linear model (called the general linear model) can employ a mix of categorical and continuous variables to describe the relationship between them and a response variable. You may often see this type of analysis referred to as analysis of covariance. All these models decompose the variance of the response Y into portions explained by the predictor variables. This decomposition is the so-called ANOVA that we will describe in this chapter.
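The dummy-variable view can be illustrated with a short sketch. It recodes a three-level group label into two 0/1 dummy variables, with one level held out as the reference, and fits the regression by solving the normal equations directly. The helper names `dummy_code` and `fit_least_squares`, the group labels, and the data are all hypothetical; a statistical package would do the same work internally with more numerically robust methods.

```python
def dummy_code(labels, reference):
    """Build design-matrix rows: an intercept plus one 0/1 column per non-reference level."""
    levels = sorted(set(labels) - {reference})
    rows = [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    return rows, levels


def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry into the pivot position.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data: three observations in each of three groups.
labels = ["control"] * 3 + ["a"] * 3 + ["b"] * 3
y = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0]

X, levels = dummy_code(labels, reference="control")
coefs = fit_least_squares(X, y)
print(levels, [round(c, 6) for c in coefs])
```

With this coding the intercept estimates the mean of the reference group (about 5 here), and each dummy coefficient estimates the difference between that group's mean and the reference mean (about 2 for group `a` and 4 for group `b`). The fitted values are therefore the group means, which is why the regression reproduces the one-way ANOVA.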

In Chapter 12 we discussed R², the ratio of the part of the variance in the response variable Y that is explained by the regression equation to the total variance of Y. In the ANOVA table (refer to Appendix A), we will see an F test of whether at least one of the means of a response variable differs from the other means. There is a direct mathematical relationship between this F statistic and R².
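For the one-way case, this relationship can be written out explicitly. As a sketch, assume k groups and N observations in total (symbols introduced here for illustration, not defined in the text above), and let R² be the proportion of the total sum of squares explained by the regression on the k − 1 dummy variables, which equals the between-group sum of squares divided by the total sum of squares. Then:

$$
F = \frac{R^2 / (k - 1)}{(1 - R^2) / (N - k)}
$$

For example, with k = 3, N = 9, and R² = 0.8, this gives F = (0.8/2)/(0.2/6) = 12. The closer R² is to 1, the larger F becomes for fixed k and N.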

In Chapter 12, we emphasized simple linear regression and correlation and briefly touched on multiple regression by giving one example. Analogously, multi-way analysis of variance is similar to multiple linear regression, in that there are two or more categorical variables in the model to explain the response Y. We will not go into the details here; the interested reader can consult some of the texts listed in Section 13.7.