| | One variable | Two or more variables |
|---|---|---|
| Exact | Binomial proportions | Fisher's exact test |
| Approximate | \(\chi^2\) goodness-of-fit | \(\chi^2\) test of independence |
Note: the approximate binomial test, or \(z\)-test, can be used instead of the \(\chi^2\) test under the null hypothesis that one or more groups have the same or given proportions (probabilities of success). It is mathematically equivalent to the \(\chi^2\) test, since both rely on a normal approximation; however, the \(\chi^2\) test is more general because it can be applied to multiple variables. In R, the \(z\)-test is called the "Test of Equal or Given Proportions" and is implemented as prop.test().
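As a minimal sketch with made-up counts, the two tests give identical results when comparing two groups:

```r
## Hypothetical counts: 42/100 successes in group 1, 65/100 in group 2
x <- c(42, 65)    # successes
n <- c(100, 100)  # trials

## The z-test ("Test of Equal or Given Proportions")
prop.test(x, n)

## The equivalent chi-square test on the same data as a 2x2 table
tab <- matrix(c(x, n - x), ncol = 2)  # columns: successes, failures
chisq.test(tab)

## Both report the same X-squared statistic and p-value, since both
## apply a continuity correction by default
```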
Exact tests should be used whenever possible, since they compute exact probabilities over all possible outcomes. Approximate frequency tests, by contrast, do not provide accurate \(p\)-values when the expected frequency of one or more groups is low. Standard textbooks will tell you that before using a \(\chi^2\) test, you should check that the following assumptions are met:

- no category has an expected frequency less than 1, and
- no more than 20% of categories have an expected frequency less than 5.
When the above conditions are not met, several alternative options are available: use an exact test (binomial or Fisher's) instead, combine sparse categories to raise the expected counts, or apply a continuity correction (discussed below).
Note that the numbers given above are slightly arbitrary cutoffs. In general, exact tests are preferred whenever feasible.
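If you want to check these criteria for a particular table, chisq.test() returns the expected counts it computed; here is a minimal sketch with hypothetical counts:

```r
## Hypothetical 2x3 table of observed counts
obs <- matrix(c(3, 12, 9,
                5, 18, 11), nrow = 2, byrow = TRUE)

res <- chisq.test(obs)  # R warns if the approximation may be poor
res$expected            # if any expected count is < 1, or more than
                        # 20% are < 5, prefer an exact test
```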
The following diagram¹ illustrates the \(p\)-value computed by an approximate test relative to the \(p\)-value from an exact binomial test, as a function of sample size. It shows that even for sample sizes that meet the criteria for an approximate test, the \(\chi^2\) test (black) and a similar test called the "G-test" (green) yield \(p\)-values that are too small relative to the exact binomial \(p\)-value, thus inflating the apparent significance of the test.
To correct for this, the default behavior of R's \(\chi^2\) test is to apply Yates' continuity correction, which replaces the \((O-E)^2\) term with \((|O-E| - 1/2)^2\), reducing the overall magnitude of the \(\chi^2\) test statistic. It is called a "continuity" correction because a continuous distribution (the \(\chi^2\)) is being used to approximate a discrete one (e.g. the binomial).
There is some disagreement about when the continuity correction should be applied, since in some cases it overcorrects and produces a type II error (a failure to reject a false null hypothesis). The Yates correction is most strongly recommended when the expected number of observations in a cell is less than 5-10.
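A minimal sketch of the effect of the correction, using a hypothetical 2x2 table with small counts:

```r
## Hypothetical 2x2 table with small counts
tab <- matrix(c(4, 10,
                9,  3), nrow = 2, byrow = TRUE)

chisq.test(tab)                   # Yates' correction (the default for 2x2 tables)
chisq.test(tab, correct = FALSE)  # uncorrected chi-square statistic
fisher.test(tab)                  # exact p-value, for comparison
```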
For these reasons, it is preferable to use an exact test whenever possible in order to get the most accurate measure of significance.
A common question that arises in biology is whether a response variable differs between groups: is the probability distribution of one categorical variable dependent on that of a second variable? In other words, are any observed differences likely to have occurred by chance alone, due to variation among random samples?
If the differences are not random, we may also like to know the strength of the association, i.e. the magnitude of the difference. We will address the latter question in a future class.
Two kinds of analysis can be done to address such questions: tests of independence, which ask whether an association exists at all, and measures of association, which quantify its strength.
We will discuss measures of association for categorical data in the next class, and for continuous data a little bit later.
The \(\chi^2\) test is the most common approximate test of independence and is widely used for contingency analysis. It can handle multiple categories with multiple possible outcomes, and so is appropriate for a range of simple comparisons. Please refer to the lecture notes from the last class for a discussion of \(\chi^2\) tests.
Other approximate tests exist for experimental designs that are paired or that control for multiple measurements. We will not cover these here, but they are mentioned at the bottom of this document.
Fisher's exact test uses the hypergeometric distribution (sampling without replacement from a finite population) to provide an exact \(p\)-value for contingency tables. It computes the cumulative probability of all possible tables that deviate from the null expectation as much as or more than the observed values do.
We will discuss Fisher’s test in the next class.
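Meanwhile, here is a minimal sketch of the idea, using hypothetical counts: the one-sided \(p\)-value from fisher.test() can be reproduced directly from the hypergeometric distribution.

```r
## Hypothetical 2x2 table:
##            success failure
## treatment        9       1
## control          4       6
tab <- matrix(c(9, 1,
                4, 6), nrow = 2, byrow = TRUE)

fisher.test(tab, alternative = "greater")$p.value

## The same one-sided p-value from the hypergeometric distribution:
## P(X >= 9) successes in the treatment row, given the fixed margins
## (row totals m = 10 and n = 10, column total k = 13)
phyper(9 - 1, m = 10, n = 10, k = 13, lower.tail = FALSE)
```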
Other flavors of tests exist that apply to different kinds of situations where frequency data need to be analyzed. We will not cover these here, but merely mention them in case you may want to use them in the future.
The \(G\)-test of goodness-of-fit (a.k.a. likelihood ratio test, log-likelihood ratio test, or \(G^2\) test) tests whether one nominal variable with two or more categories fits a theoretical expectation. Like the \(\chi^2\) test, it should be used with large sample sizes, since it gives inaccurate \(p\)-values for small sample sizes.
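The \(G\) statistic is simple to compute by hand; here is a minimal sketch for a goodness-of-fit test with hypothetical counts:

```r
## Hypothetical counts for three categories, expected in a 1:1:1 ratio
obs <- c(28, 19, 13)
exp <- rep(sum(obs) / length(obs), length(obs))

G <- 2 * sum(obs * log(obs / exp))                   # G = 2 * sum(O * ln(O/E))
pchisq(G, df = length(obs) - 1, lower.tail = FALSE)  # approximate p-value
```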
It turns out that the experimental design governs whether the true distribution of the data is hypergeometric or multinomial, and therefore whether Fisher's test is necessarily the right choice.
Barnard's test is a non-parametric alternative to Fisher's exact test. Because it does not condition on fixed margins (that is, it does not treat the row and column totals as fixed in advance), Barnard's exact test is reported to have greater power than Fisher's exact test for 2x2 contingency tables. You can read about it on Wikipedia: Barnard's test
An implementation of Barnard’s test is available in R: Blog on Barnard’s exact test
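As a hedged sketch, one such implementation is the CRAN package Barnard; the function name and argument order below come from that package and are worth double-checking against its documentation:

```r
# install.packages("Barnard")
library(Barnard)

## The four cell counts of a 2x2 table (see the package docs
## for the expected cell order)
barnard.test(9, 1, 4, 6)
```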
For more complicated scenarios, additional tests are available that we will not cover here. These include McNemar's test, for paired nominal data, and the Cochran-Mantel-Haenszel test, for stratified or repeated 2x2 tables.
To summarize, R contains commands for comparing categorical variables to expected discrete distributions, as well as for testing for and measuring associations between categorical variables:

- binom.test() : exact test of binomial proportions (one variable)
- chisq.test() : \(\chi^2\) goodness-of-fit test against an expected distribution (one variable)
- fisher.test() : Fisher's exact test of independence (two variables)
- chisq.test() : \(\chi^2\) test of independence on a contingency table (two variables)
Often our data will not be pre-formatted as a contingency table, so we will need to transform it in order to use these functions. The most common functions are table() and xtabs(), which allow you to create 2x2 or larger tables by cross-tabulating on one or more variables.
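A minimal sketch with a hypothetical long-format data frame (one row per observation):

```r
## Hypothetical data: 20 observations of two categorical variables
df <- data.frame(
  genotype  = rep(c("wt", "mut"), times = c(12, 8)),
  phenotype = c(rep("normal", 9), rep("affected", 3),
                rep("normal", 2), rep("affected", 6))
)

tab1 <- table(df$genotype, df$phenotype)          # cross-tabulate two variables
tab2 <- xtabs(~ genotype + phenotype, data = df)  # the same, via a formula

fisher.test(tab1)  # the resulting tables feed directly into the tests above
```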
Some short tutorials for making tables and converting between data frames and tables are available here:
1. Handbook of Biological Statistics, by John H. McDonald