Consider two different experimental designs for determining whether high blood pressure is a risk factor for strokes:
The data might look something like this:
Design 1

 | HiBP | LoBP |
---|---|---|
Stroke | 100 | 50 |
NoStroke | 600 | 1250 |

Design 2

 | HiBP | LoBP |
---|---|---|
Stroke | 200 | 800 |
NoStroke | 50 | 950 |
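The code below assumes these tables have already been entered as 2x2 matrices named data_matrix1 and data_matrix2. If they have not been, a minimal sketch to create them would be:

# assumed setup: 2x2 count matrices matching the tables above
# (rows = outcome, columns = blood-pressure group)
data_matrix1 = matrix(c(100, 600, 50, 1250), nrow=2, ncol=2,
                      dimnames=list(c("Stroke","NoStroke"), c("HiBP","LoBP")))
data_matrix2 = matrix(c(200, 50, 800, 950), nrow=2, ncol=2,
                      dimnames=list(c("Stroke","NoStroke"), c("HiBP","LoBP")))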
Let’s compute both \(RR = \frac{p_{Stroke|HiBP}}{p_{Stroke|LoBP}}\) and \(OR = \frac{p_{Stroke|HiBP}/(1-p_{Stroke|HiBP})}{p_{Stroke|LoBP}/(1-p_{Stroke|LoBP})}\) for both scenarios:
## Design 1 ================= #
a=data_matrix1[1,1]
b=data_matrix1[1,2]
c=data_matrix1[2,1]
d=data_matrix1[2,2]
p1 = a/(a+c)
p2 = b/(b+d)
o1 = a/c
o2 = b/d
RR = p1 / p2
#OR = (p1/(1-p1)) /(p2/(1-p2))
OR = (a/c) / (b/d) # equivalent
# p1
# p2
# o1
# o2
cat("Design 1:\n", "RR =",RR,"\n", "OR =",OR,"\n")
## Design 1:
## RR = 3.714286
## OR = 4.166667
## Design 2 ================= #
a=data_matrix2[1,1]
b=data_matrix2[1,2]
c=data_matrix2[2,1]
d=data_matrix2[2,2]
p1 = a/(a+c)
p2 = b/(b+d)
o1 = a/c
o2 = b/d
RR = p1 / p2
#OR = (p1/(1-p1)) /(p2/(1-p2))
OR = (a/c) / (b/d) # equivalent
# p1
# p2
# o1
# o2
cat("Design 2:\n", "RR =",RR,"\n", "OR =",OR,"\n")
## Design 2:
## RR = 1.75
## OR = 4.75
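Since the exact same arithmetic is repeated for both designs, it can also be wrapped in a small helper function. This is just a convenience sketch (the name rr_or is made up for this note), not part of the original exercise:

# helper: compute RR and OR from a 2x2 table laid out like the ones above
# (row 1 = outcome present, row 2 = outcome absent; column 1 = exposed group)
rr_or = function(m) {
  a = m[1,1]; b = m[1,2]; c = m[2,1]; d = m[2,2]
  RR = (a/(a+c)) / (b/(b+d))  # risk of outcome, exposed vs. unexposed
  OR = (a/c) / (b/d)          # odds of outcome, exposed vs. unexposed
  c(RR = RR, OR = OR)
}
rr_or(data_matrix1)  # RR ~3.71, OR ~4.17 (matches Design 1 above)
rr_or(data_matrix2)  # RR 1.75, OR 4.75 (matches Design 2 above)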
We can use the riskratio() and oddsratio() functions from the “epitools” package to compute the risk ratio and odds ratio automatically.

Note 1: Look at the documentation for these functions. They expect the table in a different format than what we have here! We can account for this by setting the rev parameter, and we also need to transpose the table first.
library(epitools)
data_matrix1
## HiBP LoBP
## Stroke 100 50
## NoStroke 600 1250
t(data_matrix1)
## Stroke NoStroke
## HiBP 100 600
## LoBP 50 1250
riskratio(t(data_matrix1),correction = F, rev="both")
## $data
## NoStroke Stroke Total
## LoBP 1250 50 1300
## HiBP 600 100 700
## Total 1850 150 2000
##
## $measure
## NA
## risk ratio with 95% C.I. estimate lower upper
## LoBP 1.000000 NA NA
## HiBP 3.714286 2.678835 5.149969
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## LoBP NA NA NA
## HiBP 2.220446e-16 2.375474e-16 2.803567e-17
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio(t(data_matrix1),correction = F, rev="both")
## $data
## NoStroke Stroke Total
## LoBP 1250 50 1300
## HiBP 600 100 700
## Total 1850 150 2000
##
## $measure
## NA
## odds ratio with 95% C.I. estimate lower upper
## LoBP 1.000000 NA NA
## HiBP 4.157007 2.933727 5.961773
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## LoBP NA NA NA
## HiBP 2.220446e-16 2.375474e-16 2.803567e-17
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
riskratio(t(data_matrix2),correction = F, rev="both")
## $data
## NoStroke Stroke Total
## LoBP 950 800 1750
## HiBP 50 200 250
## Total 1000 1000 2000
##
## $measure
## NA
## risk ratio with 95% C.I. estimate lower upper
## LoBP 1.00 NA NA
## HiBP 1.75 1.614968 1.896322
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## LoBP NA NA NA
## HiBP 0 3.232811e-25 3.602083e-24
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio(t(data_matrix2),correction = F, rev="both")
## $data
## NoStroke Stroke Total
## LoBP 950 800 1750
## HiBP 50 200 250
## Total 1000 1000 2000
##
## $measure
## NA
## odds ratio with 95% C.I. estimate lower upper
## LoBP 1.000000 NA NA
## HiBP 4.735764 3.452523 6.61294
##
## $p.value
## NA
## two-sided midp.exact fisher.exact chi.square
## LoBP NA NA NA
## HiBP 0 3.232811e-25 3.602083e-24
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
The above method seems most straightforward to me logically … but you can also reverse the matrix beforehand and then execute the commands without adding rev="both":
riskratio(rev(data_matrix1), correction=F)
oddsratio(rev(data_matrix1), correction=F)
riskratio(rev(data_matrix2), correction=F)
oddsratio(rev(data_matrix2), correction=F)
Note 2: Beware!!! There is something funny about the results when you reverse the matrix first and then feed it to riskratio() as a vector, vs. reversing it and feeding it as a matrix, or just transposing it. This can be really confusing!!!
library(epitools)
# compare different input formats============================================ #
data_matrix1
rev(data_matrix1) # reverses order as d-b-c-a (returns a vector, not a matrix)
matrix(rev(data_matrix1), # swaps a-d and b-c
       nrow=2, ncol=2)
matrix(rev(data_matrix1), # swaps a-d only
       nrow=2, ncol=2,
       byrow = TRUE)
t(data_matrix1) # swaps b-c
# risk ratio ================================================================ #
#riskratio(data_matrix1, correction=F, rev="both") # this is not correct
#riskratio(t(data_matrix1), correction=F) # this is also incorrect
riskratio(t(data_matrix1), correction=F, rev="both") # this works!
#riskratio(matrix(rev(data_matrix1), nrow=2, ncol=2), # this also does not work! ... even with rev="both"
# correction=F) # (default is byrow = FALSE)
# # then you need to transpose again, too complicated!
riskratio(rev(data_matrix1), correction=F) # this works!
# riskratio(matrix(rev(data_matrix1), # this works too ... need to make matrix by row
# nrow=2, ncol=2,
# byrow = TRUE),
# correction=F)
# riskratio(t(matrix(rev(data_matrix1), # this works, but too complicated!
# nrow=2, ncol=2, # you need to transpose the matrix also
# byrow = FALSE)), # (default is byrow = FALSE)
# correction=F)
# odds ratio ================================================================ #
oddsratio(t(data_matrix1), correction=F, rev="both")
oddsratio(t(data_matrix2), correction=F, rev="both")
We can see that for Design 1, the RR and OR are similar. In Design 2, however, the RR is very different from the OR. In contrast, the OR is about the same in the two studies.
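This is the key property of the odds ratio: it does not change when one outcome group is oversampled, which is essentially what Design 2 does (it fixes the sample at 1,000 stroke and 1,000 no-stroke subjects instead of sampling the population at large). A quick sketch of this, scaling the Stroke row of the Design 1 table by an arbitrary illustrative factor of 10:

# oversampling strokes changes the RR but leaves the OR untouched
scaled_matrix = data_matrix1
scaled_matrix[1, ] = 10 * scaled_matrix[1, ]  # 10x as many stroke cases
a=scaled_matrix[1,1]; b=scaled_matrix[1,2]
c=scaled_matrix[2,1]; d=scaled_matrix[2,2]
cat("Scaled Design 1:\n",
    "RR =", (a/(a+c)) / (b/(b+d)), "\n",  # ~2.19, no longer 3.71
    "OR =", (a/c) / (b/d), "\n")          # still ~4.17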
Now use Fisher’s test to get a \(p\)-value and OR estimate for these data. Try out both a 2-sided and a 1-sided test (alternative="greater") for both datasets.
fisher.test(data_matrix1)
##
## Fisher's Exact Test for Count Data
##
## data: data_matrix1
## p-value = 2.375e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.892459 6.055446
## sample estimates:
## odds ratio
## 4.163592
fisher.test(data_matrix2)
##
## Fisher's Exact Test for Count Data
##
## data: data_matrix2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 3.412951 6.704704
## sample estimates:
## odds ratio
## 4.746561
fisher.test(data_matrix1, alternative="greater")
##
## Fisher's Exact Test for Count Data
##
## data: data_matrix1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 3.056374 Inf
## sample estimates:
## odds ratio
## 4.163592
fisher.test(data_matrix2, alternative="greater")
##
## Fisher's Exact Test for Count Data
##
## data: data_matrix2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 3.585626 Inf
## sample estimates:
## odds ratio
## 4.746561
fisher.test(data_matrix1)$p.value
## [1] 2.375474e-16
fisher.test(data_matrix2)$p.value
## [1] 3.232811e-25
fisher.test(data_matrix1, alternative="greater")$p.value
## [1] 2.087913e-16
fisher.test(data_matrix2, alternative="greater")$p.value
## [1] 1.616406e-25
These are very similar to the results we got previously. The fisher.exact \(p\)-values reported by the “epitools” package are identical to those from a 2-sided fisher.test(). The odds ratio estimates differ slightly, though, because the estimators differ: fisher.test() reports the conditional maximum-likelihood estimate, while oddsratio() defaults to a median-unbiased estimate (see the “method” attribute in its output above), and neither is exactly the simple a*d/(b*c) we computed by hand. Note also that the printed summary often just says p-value < 2.2e-16, but you can get the precise value using the $p.value notation.
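In the same way, the exact odds ratio estimate and its confidence interval can be pulled out of the fisher.test() result object (these are standard components of its return value):

fisher.test(data_matrix1)$estimate  # conditional MLE of the odds ratio
fisher.test(data_matrix1)$conf.int  # 95 percent confidence interval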
The phyper() function

Now use the hypergeometric distribution to get a \(p\)-value for these data. How do the results compare with your results above?
data_matrix1
## HiBP LoBP
## Stroke 100 50
## NoStroke 600 1250
data_matrix2
## HiBP LoBP
## Stroke 200 800
## NoStroke 50 950
# phyper(q, m, n, k, lower.tail = FALSE)
# phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
# q = observation (a = stroke & high BP)
# m = white balls (a+b = stroke, a.k.a. "success")
# n = black balls (c+d = healthy, a.k.a. "failure")
# k = number of draws (a+c = High BP)
# note: with lower.tail = FALSE, phyper() returns P(X > q), so we pass a - 1
# to get P(X >= a), i.e. tables at least as extreme as the one observed
# Design 1
a=data_matrix1[1,1]
b=data_matrix1[1,2]
c=data_matrix1[2,1]
d=data_matrix1[2,2]
phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
## [1] 2.087913e-16
# Design 2
a=data_matrix2[1,1]
b=data_matrix2[1,2]
c=data_matrix2[2,1]
d=data_matrix2[2,2]
phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
## [1] 1.616406e-25
These are exactly the same as the \(p\)-values for a one-sided Fisher’s test.
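We can confirm this equivalence directly (a quick sanity check, not part of the original output):

# the one-sided Fisher p-value is the same hypergeometric tail probability
a=data_matrix1[1,1]; b=data_matrix1[1,2]
c=data_matrix1[2,1]; d=data_matrix1[2,2]
all.equal(phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE),
          fisher.test(data_matrix1, alternative = "greater")$p.value)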
Now run a chi-squared test on the same data using chisq.test(). How do the values compare?
chisq.test(data_matrix1)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data_matrix1
## X-squared = 69.981, df = 1, p-value < 2.2e-16
chisq.test(data_matrix2)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data_matrix2
## X-squared = 101.49, df = 1, p-value < 2.2e-16
chisq.test(data_matrix1)$p.value
## [1] 5.987268e-17
chisq.test(data_matrix2)$p.value
## [1] 7.181476e-24
Again, very similar and highly significant results, but they are not exactly the same since this is an approximate test. Also note that the result is equivalent to a two-sided test, since you cannot perform a one-sided test with this method.
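As a final side check (an assumption worth verifying, since we passed correction = F to the epitools functions earlier), turning off Yates’ continuity correction in chisq.test() should reproduce the chi.square p-values shown in the riskratio()/oddsratio() output above:

# Pearson chi-squared without Yates' correction, for comparison with epitools
chisq.test(data_matrix1, correct = FALSE)$p.value  # expect ~2.8e-17
chisq.test(data_matrix2, correct = FALSE)$p.value  # expect ~3.6e-24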