Hypothesis Testing with Categorical Data

Several different tests that can be applied to binomial and other categorical data have been covered in this topic. The amount of information may feel somewhat overwhelming, so you may be uncertain which test to apply in which situation. Here, we will settle on a general framework for deciding which test to use.

We will apply each test to the 2-sided vs 3-sided fidget spinner data we collected last time. As a reminder, each element in the vectors below is the number of successes (pink side up) in 12 trials (spins) in the hands of one person.

spinner2 <- c(8, 8, 4, 5, 9, 5, 5, 6, 7, 8, 6, 7, 4, 4, 3)

spinner3 <- c(3, 5, 6, 3, 1, 6, 2, 5, 6, 2, 4, 5, 3, 4)

Very important! Most tests do not deal with replicates: they are designed to compare two observed frequencies, or an observed frequency to an expected one. At the end, we will see how to incorporate the information about replicates into hypothesis testing. But for now, let us simply sum up our observations. It is also useful to arrange our data as a contingency table.

# sum up observations
spinner2_successes <- sum(spinner2)
spinner2_trials <- 12 * length(spinner2)
spinner2_successes
## [1] 89
spinner2_trials
## [1] 180
spinner3_successes <- sum(spinner3)
spinner3_trials <- 12 * length(spinner3)
spinner3_successes
## [1] 55
spinner3_trials
## [1] 168
# arrange as a contingency table
spinner_table <- matrix(data = c(spinner2_successes, spinner3_successes,
                                 spinner2_trials - spinner2_successes, spinner3_trials - spinner3_successes),
                        nrow = 2,
                        ncol = 2,
                        byrow = TRUE)
rownames(spinner_table) <- c("pink", "not_pink")
colnames(spinner_table) <- c("2-sided", "3-sided")
spinner_table
##          2-sided 3-sided
## pink          89      55
## not_pink      91     113
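
Before running any tests, it is worth eyeballing the success proportions in each group. A quick optional sanity check with base R's prop.table(), here normalizing each column of our table so that it sums to 1:

# success/failure proportions within each spinner type
# (pink lands up ~49% of the time with the 2-sided spinner
# and ~33% of the time with the 3-sided one)
prop.table(spinner_table, margin = 2)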

1. Exact tests - sometimes a little slow, but should be your default option

1.1. Comparing an observed binomial frequency to a theoretically expected one? Use the Exact Binomial Test.

# is the success frequency observed with 2-sided spinner likely if the probability of success is 0.5?
binom.test(x = spinner2_successes,
           n = spinner2_trials,
           p = 0.5)
## 
##  Exact binomial test
## 
## data:  spinner2_successes and spinner2_trials
## number of successes = 89, number of trials = 180, p-value = 0.9406
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4192539 0.5698210
## sample estimates:
## probability of success 
##              0.4944444
# is the success frequency observed with 3-sided spinner likely if the probability of success is 0.5?
binom.test(x = spinner3_successes,
           n = spinner3_trials,
           p = 0.5)
## 
##  Exact binomial test
## 
## data:  spinner3_successes and spinner3_trials
## number of successes = 55, number of trials = 168, p-value = 9.02e-06
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.2570773 0.4039006
## sample estimates:
## probability of success 
##               0.327381
# is the success frequency observed with 3-sided spinner likely if the probability of success is 1/3?
binom.test(x = spinner3_successes,
           n = spinner3_trials,
           p = 1/3)
## 
##  Exact binomial test
## 
## data:  spinner3_successes and spinner3_trials
## number of successes = 55, number of trials = 168, p-value = 0.9348
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 95 percent confidence interval:
##  0.2570773 0.4039006
## sample estimates:
## probability of success 
##               0.327381
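
By default, binom.test() performs a two-sided test. If you had decided on a directional hypothesis before collecting the data (say, that the 3-sided spinner yields pink less than half the time), you could set the alternative argument. A minimal sketch:

# one-sided version: is the probability of success with the 3-sided spinner LESS than 0.5?
# (only appropriate if the direction was chosen before seeing the data)
binom.test(x = spinner3_successes,
           n = spinner3_trials,
           p = 0.5,
           alternative = "less")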

1.2. Comparing two or more observed frequencies? Use Fisher’s Exact Test.

# is the probability of success with 2-sided and 3-sided spinner likely the same?
fisher.test(spinner_table)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  spinner_table
## p-value = 0.001638
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.271603 3.180646
## sample estimates:
## odds ratio 
##   2.005317
# multiply the values in your table by some large number (e.g. 10,000) and note the amount of time it takes to compute
fisher.test(10000*spinner_table)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  10000 * spinner_table
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.000589 2.018043
## sample estimates:
## odds ratio 
##   2.009406
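
If you want to actually measure how long the exact test takes on the inflated table, you can wrap the call in base R's system.time(); the timings it reports will, of course, depend on your machine:

# time the exact test on the inflated table (elapsed time is in seconds)
system.time(fisher.test(10000 * spinner_table))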

In practice, you will rarely encounter situations where Fisher’s exact test is too slow. Therefore, stick to this test whenever you can!

2. Approximate tests - lightning-fast, reviewer #2 loves them, but not exact

2.1. Comparing an observed frequency to a theoretically expected one? Use the Test of Given Proportions or the Chi-Squared Goodness-of-Fit Test.

Note: test of given proportions (prop.test()) only works with binomial data, while chi-squared goodness-of-fit test (chisq.test()) can take multinomial data, too.

# is the success frequency observed with 2-sided spinner likely if the probability of success is 0.5?
prop.test(x = spinner2_successes,
          n = spinner2_trials,
          p = 0.5)
## 
##  1-sample proportions test with continuity correction
## 
## data:  spinner2_successes out of spinner2_trials, null probability 0.5
## X-squared = 0.0055556, df = 1, p-value = 0.9406
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4195742 0.5695555
## sample estimates:
##         p 
## 0.4944444
chisq.test(x = c(spinner2_successes, spinner2_trials - spinner2_successes),
           p = c(0.5, 0.5))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(spinner2_successes, spinner2_trials - spinner2_successes)
## X-squared = 0.022222, df = 1, p-value = 0.8815
prop.test(x = spinner2_successes,
          n = spinner2_trials,
          p = 0.5,
          correct = FALSE) # switch off continuity correction to obtain results identical to chi-sq
## 
##  1-sample proportions test without continuity correction
## 
## data:  spinner2_successes out of spinner2_trials, null probability 0.5
## X-squared = 0.022222, df = 1, p-value = 0.8815
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4222885 0.5668325
## sample estimates:
##         p 
## 0.4944444
# is the success frequency observed with 3-sided spinner likely if the probability of success is 0.5?
prop.test(x = spinner3_successes,
          n = spinner3_trials,
          p = 0.5)
## 
##  1-sample proportions test with continuity correction
## 
## data:  spinner3_successes out of spinner3_trials, null probability 0.5
## X-squared = 19.339, df = 1, p-value = 1.094e-05
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2582245 0.4045747
## sample estimates:
##        p 
## 0.327381
chisq.test(x = c(spinner3_successes, spinner3_trials - spinner3_successes),
           p = c(0.5, 0.5))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(spinner3_successes, spinner3_trials - spinner3_successes)
## X-squared = 20.024, df = 1, p-value = 7.648e-06
# is the success frequency observed with 3-sided spinner likely if the probability of success is 1/3?
prop.test(x = spinner3_successes,
          n = spinner3_trials,
          p = 1/3)
## 
##  1-sample proportions test with continuity correction
## 
## data:  spinner3_successes out of spinner3_trials, null probability 1/3
## X-squared = 0.0066964, df = 1, p-value = 0.9348
## alternative hypothesis: true p is not equal to 0.3333333
## 95 percent confidence interval:
##  0.2582245 0.4045747
## sample estimates:
##        p 
## 0.327381
chisq.test(x = c(spinner3_successes, spinner3_trials - spinner3_successes),
           p = c(1/3, 2/3))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(spinner3_successes, spinner3_trials - spinner3_successes)
## X-squared = 0.026786, df = 1, p-value = 0.87
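
As the note above says, chisq.test() is not limited to two categories. Purely to illustrate the syntax, suppose we had instead recorded which of the three faces of the 3-sided spinner landed up in the 168 spins; the counts below are hypothetical, not real data:

# HYPOTHETICAL counts of the three faces in 168 spins (for illustration only)
face_counts <- c(55, 60, 53)
# goodness-of-fit test against equal probabilities for all three faces
chisq.test(x = face_counts,
           p = c(1/3, 1/3, 1/3))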

2.2. Comparing two or more observed frequencies? Use the Test of Equal Proportions or the Chi-Squared Test of Independence.

Note: test of equal proportions (prop.test()) only works with binomial data, while chi-squared test of independence (chisq.test()) can take multinomial data, too.

# is the probability of success with 2-sided and 3-sided spinner likely the same?
prop.test(spinner_table)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  spinner_table
## X-squared = 9.3216, df = 1, p-value = 0.002265
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.06141014 0.28254411
## sample estimates:
##    prop 1    prop 2 
## 0.6180556 0.4460784
chisq.test(spinner_table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  spinner_table
## X-squared = 9.3216, df = 1, p-value = 0.002265
# multiply the values in your table by some large number (e.g. 10,000) and note the amount of time it takes to compute
prop.test(10000*spinner_table)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  10000 * spinner_table
## X-squared = 99984, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1709301 0.1730242
## sample estimates:
##    prop 1    prop 2 
## 0.6180556 0.4460784
chisq.test(10000*spinner_table)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  10000 * spinner_table
## X-squared = 99984, df = 1, p-value < 2.2e-16
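
Keep in mind that the chi-squared approximation is only trustworthy when the expected counts are reasonably large (a common rule of thumb is at least 5 per cell; R will warn you if they are too small). You can inspect the expected counts stored in the test object:

# expected counts under the null hypothesis of independence
chisq.test(spinner_table)$expected

In our table, all four expected counts are well above 5, so the approximation is safe here.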

3. Binomial regression - when replicates matter

Simplistically, here is the idea: first, we plot the number of successes in the two groups that we want to compare. Then, we draw a straight line through the data and determine whether the line is horizontal (the two groups are not different) or whether it has a significant slope (the two groups are different). We can use the glm() function for that. In reality, it "draws a line" in somewhat different coordinates than the ones we are going to plot, but this is not very important right now. We will cover the topic of generalized linear models later.

# arrange data in a data.frame (long format)
spinner_long <- data.frame("successes" = c(spinner2, spinner3),
                           "trials" = rep(12, length(spinner2) + length(spinner3) ),
                           "spinner_type" = c(rep(2, length(spinner2) ),
                                              rep(3, length(spinner3) ) ) )

# plot number of successes on the Y and separate the two spinners along the X
library(ggplot2) # load ggplot2, which we use for plotting below
set.seed(1) # set seed to make jitter reproducible
ggplot(data = spinner_long,
       mapping = aes(x = spinner_type,
                     y = successes)) +
  geom_jitter(height = 0,
              width = 0.02) +
  ylim(c(0,12))

# perform regression
spinner_long$failures <- spinner_long$trials - spinner_long$successes
spinner_regression <- glm(formula = cbind(successes, failures) ~ spinner_type,
                          data = spinner_long,
                          family = "binomial")
summary(spinner_regression)
## 
## Call:
## glm(formula = cbind(successes, failures) ~ spinner_type, family = "binomial", 
##     data = spinner_long)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.01838  -0.58490   0.03849   0.64632   1.80951  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept)    1.3734     0.5551   2.474  0.01335 * 
## spinner_type  -0.6978     0.2219  -3.144  0.00167 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.249  on 28  degrees of freedom
## Residual deviance: 30.182  on 27  degrees of freedom
## AIC: 115.95
## 
## Number of Fisher Scoring iterations: 4

The Coefficients table gives the significance for the intercept and the slope of the best-fit line. We are interested in whether the slope is different from zero (i.e. whether the line is not horizontal), and the p-value for this is given in the last column, called Pr(>|z|), next to the slope term spinner_type (0.00167 ** in our case).
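
If you need that p-value programmatically (e.g. inside a script), you can pull it out of the coefficient matrix returned by summary(); a small sketch:

# extract the p-value for the slope (spinner_type) term
summary(spinner_regression)$coefficients["spinner_type", "Pr(>|z|)"]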

Finally, let us add the regression line to the plot to visualize what the regression analysis has done.

# try to ignore the nightmarish expression inside geom_line()
# if you are still curious, the fitted regression is not
# "successes = a * spinner_type + b"
# but
# "ln(p/(1-p)) = a * spinner_type + b", where p is the probability of success
# so, we need to calculate p from the equation and then multiply it by the number of trials (12)
# to determine the y value (number of successes in 12 trials) in our plot
set.seed(1) # set seed to make jitter reproducible
ggplot(data = spinner_long,
       mapping = aes(x = spinner_type,
                     y = successes)) +
  geom_line(mapping = aes(y = 12 * exp(spinner_regression$coefficients[1] + spinner_regression$coefficients[2] * spinner_type) /
                              (1 + exp(spinner_regression$coefficients[1] + spinner_regression$coefficients[2] * spinner_type))),
             size = 5,
             color = "lightblue") +
  geom_jitter(height = 0,
              width = 0.02) +
  ylim(c(0,12))
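
As an aside, instead of writing out the back-transformation by hand, you can let R do it with predict() and type = "response", which returns the fitted probability of success for each row of the data; multiplying by the 12 trials gives the same y values as the expression above:

# fitted probability of success for each observation, times 12 trials
spinner_long$fitted_successes <- 12 * predict(spinner_regression, type = "response")
head(spinner_long)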

Please note that binomial regression does not perform well with all data (e.g. data with many zeros can be problematic), and you may need to perform other kinds of regression analysis. This is beyond the scope of this exercise, but you can google something like "zero-inflated regression models" if you are in dire need of analyzing data where binomial regression fails.