Is high blood pressure a risk factor for stroke?

Consider two different experimental designs for determining whether high blood pressure is a risk factor for stroke: a prospective design, in which subjects are sampled and then followed for strokes, and a retrospective (case-control) design, in which stroke cases and stroke-free controls are sampled directly.

The data might look something like this:

Design 1

         HiBP LoBP
Stroke    100   50
NoStroke  600 1250

Design 2

         HiBP LoBP
Stroke    200  800
NoStroke   50  950
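The code in the following sections refers to these tables as data_matrix1 and data_matrix2 (names taken from the output shown below); one way to construct them, as a sketch:

```r
# 2x2 contingency tables: rows = outcome, columns = exposure
data_matrix1 <- matrix(c(100, 600, 50, 1250), nrow = 2,
                       dimnames = list(c("Stroke", "NoStroke"),
                                       c("HiBP", "LoBP")))
data_matrix2 <- matrix(c(200, 50, 800, 950), nrow = 2,
                       dimnames = list(c("Stroke", "NoStroke"),
                                       c("HiBP", "LoBP")))
```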

1) Compute RR and OR manually

Let’s compute both \(RR = \frac{p_{Stroke|HiBP}}{p_{Stroke|LoBP}}\) and \(OR = \frac{p_{Stroke|HiBP}/(1-p_{Stroke|HiBP})}{p_{Stroke|LoBP}/(1-p_{Stroke|LoBP})}\) for both scenarios:
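Plugging the Design 1 counts into these formulas as a quick check:

\[ RR = \frac{100/700}{50/1300} = \frac{26}{7} \approx 3.71, \qquad OR = \frac{100/600}{50/1250} = \frac{25}{6} \approx 4.17 \]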

## Design 1 ================= #
a=data_matrix1[1,1]
b=data_matrix1[1,2]
c=data_matrix1[2,1]
d=data_matrix1[2,2]

p1 = a/(a+c)
p2 = b/(b+d)
o1 = a/c
o2 = b/d
RR = p1 / p2
#OR = (p1/(1-p1)) /(p2/(1-p2))
OR = (a/c) / (b/d) # equivalent

# p1
# p2
# o1
# o2

cat("Design 1:\n", "RR =",RR,"\n", "OR =",OR,"\n")
## Design 1:
##  RR = 3.714286 
##  OR = 4.166667
## Design 2 ================= #
a=data_matrix2[1,1]
b=data_matrix2[1,2]
c=data_matrix2[2,1]
d=data_matrix2[2,2]

p1 = a/(a+c)
p2 = b/(b+d)
o1 = a/c
o2 = b/d

RR = p1 / p2
#OR = (p1/(1-p1)) /(p2/(1-p2))
OR = (a/c) / (b/d) # equivalent

# p1
# p2
# o1
# o2

cat("Design 2:\n", "RR =",RR,"\n", "OR =",OR,"\n")
## Design 2:
##  RR = 1.75 
##  OR = 4.75

2) Use R commands to compute RR and OR

We can use the riskratio() and oddsratio() functions from the “epitools” package to compute the risk ratio and odds ratio automatically.

Note 1: Look at the documentation for these commands. They expect the table in a different format than what we have here! We can account for this by setting the rev parameter; we also need to transpose the table first.

library(epitools)

data_matrix1
##          HiBP LoBP
## Stroke    100   50
## NoStroke  600 1250
t(data_matrix1)
##      Stroke NoStroke
## HiBP    100      600
## LoBP     50     1250
riskratio(t(data_matrix1),correction = F, rev="both")
## $data
##       NoStroke Stroke Total
## LoBP      1250     50  1300
## HiBP       600    100   700
## Total     1850    150  2000
## 
## $measure
##                         NA
## risk ratio with 95% C.I. estimate    lower    upper
##                     LoBP 1.000000       NA       NA
##                     HiBP 3.714286 2.678835 5.149969
## 
## $p.value
##          NA
## two-sided   midp.exact fisher.exact   chi.square
##      LoBP           NA           NA           NA
##      HiBP 2.220446e-16 2.375474e-16 2.803567e-17
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio(t(data_matrix1),correction = F, rev="both")
## $data
##       NoStroke Stroke Total
## LoBP      1250     50  1300
## HiBP       600    100   700
## Total     1850    150  2000
## 
## $measure
##                         NA
## odds ratio with 95% C.I. estimate    lower    upper
##                     LoBP 1.000000       NA       NA
##                     HiBP 4.157007 2.933727 5.961773
## 
## $p.value
##          NA
## two-sided   midp.exact fisher.exact   chi.square
##      LoBP           NA           NA           NA
##      HiBP 2.220446e-16 2.375474e-16 2.803567e-17
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"
riskratio(t(data_matrix2),correction = F, rev="both")
## $data
##       NoStroke Stroke Total
## LoBP       950    800  1750
## HiBP        50    200   250
## Total     1000   1000  2000
## 
## $measure
##                         NA
## risk ratio with 95% C.I. estimate    lower    upper
##                     LoBP     1.00       NA       NA
##                     HiBP     1.75 1.614968 1.896322
## 
## $p.value
##          NA
## two-sided midp.exact fisher.exact   chi.square
##      LoBP         NA           NA           NA
##      HiBP          0 3.232811e-25 3.602083e-24
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio(t(data_matrix2),correction = F, rev="both")
## $data
##       NoStroke Stroke Total
## LoBP       950    800  1750
## HiBP        50    200   250
## Total     1000   1000  2000
## 
## $measure
##                         NA
## odds ratio with 95% C.I. estimate    lower   upper
##                     LoBP 1.000000       NA      NA
##                     HiBP 4.735764 3.452523 6.61294
## 
## $p.value
##          NA
## two-sided midp.exact fisher.exact   chi.square
##      LoBP         NA           NA           NA
##      HiBP          0 3.232811e-25 3.602083e-24
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "median-unbiased estimate & mid-p exact CI"

The above method seems most straightforward to me logically, but you can also reverse the matrix beforehand and then run the commands without adding rev="both":

riskratio(rev(data_matrix1), correction=F)
oddsratio(rev(data_matrix1), correction=F)

riskratio(rev(data_matrix2), correction=F)
oddsratio(rev(data_matrix2), correction=F)

Note 2: Beware! The results differ depending on whether you reverse the matrix and feed it to riskratio() as a vector, reverse it and feed it as a matrix, or just transpose it. This can be really confusing!

library(epitools)

# compare different input formats============================================ #
data_matrix1
rev(data_matrix1)          # reverses the order to d-b-c-a

matrix(rev(data_matrix1),  # swaps a-d and b-c
       nrow=2, ncol=2)
matrix(rev(data_matrix1),  # swaps a-d only
       nrow=2, ncol=2,
       byrow = TRUE)
t(data_matrix1)            # swaps b-c only

# risk ratio ================================================================ #
#riskratio(data_matrix1, correction=F, rev="both")     # this is not correct
#riskratio(t(data_matrix1), correction=F)              # this is also incorrect

riskratio(t(data_matrix1), correction=F, rev="both")    # this works!

#riskratio(matrix(rev(data_matrix1), nrow=2, ncol=2),  # this also does not work! ... even with rev="both"
#          correction=F)                               # (default is byrow = FALSE)
#                                                      # then you need to transpose again, too complicated!

riskratio(rev(data_matrix1), correction=F)              # this works!

# riskratio(matrix(rev(data_matrix1),                   # this works too ... need to make matrix by row
#                  nrow=2, ncol=2,
#                  byrow = TRUE),
#           correction=F)

# riskratio(t(matrix(rev(data_matrix1),                  # this works, but too complicated!
#                    nrow=2, ncol=2,                     # you need to transpose the matrix also
#                    byrow = FALSE)),                    # (default is byrow = FALSE)
#          correction=F)

# odds ratio ================================================================ #
oddsratio(t(data_matrix1), correction=F, rev="both")
oddsratio(t(data_matrix2), correction=F, rev="both")

3) How do the RR and OR compare for these two scenarios?

We can see that for Design 1, the RR (3.71) and OR (4.17) are similar: the outcome is rare in this sample, so the odds of a stroke approximate the risk of a stroke.

In Design 2, however, the RR (1.75) is very different from the OR (4.75), because strokes are common in this sample.

In contrast, the OR is about the same in the two studies: unlike the RR, the OR does not depend on how the subjects were sampled with respect to the outcome.
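One way to see why the two measures diverge only in Design 2 is to compute the stroke risk in the reference (LoBP) group, using the counts from the tables above (a quick sketch):

```r
# Stroke risk among the LoBP (reference) group in each design
risk_lo_1 <- 50 / (50 + 1250)   # about 0.04: rare outcome, so OR ~ RR
risk_lo_2 <- 800 / (800 + 950)  # about 0.46: common outcome, so OR >> RR
c(risk_lo_1, risk_lo_2)
```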

4) Use Fisher’s Exact Test

Now use Fisher’s test to get a \(p\)-value and OR estimate for these data. Try out both a 2-sided and a 1-sided test for both datasets (alternative="greater").

fisher.test(data_matrix1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data_matrix1
## p-value = 2.375e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.892459 6.055446
## sample estimates:
## odds ratio 
##   4.163592
fisher.test(data_matrix2)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data_matrix2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  3.412951 6.704704
## sample estimates:
## odds ratio 
##   4.746561
fisher.test(data_matrix1, alternative="greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data_matrix1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  3.056374      Inf
## sample estimates:
## odds ratio 
##   4.163592
fisher.test(data_matrix2, alternative="greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data_matrix2
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  3.585626      Inf
## sample estimates:
## odds ratio 
##   4.746561
fisher.test(data_matrix1)$p.value
## [1] 2.375474e-16
fisher.test(data_matrix2)$p.value
## [1] 3.232811e-25

fisher.test(data_matrix1, alternative="greater")$p.value
## [1] 2.087913e-16
fisher.test(data_matrix2, alternative="greater")$p.value
## [1] 1.616406e-25

These \(p\)-values are very similar to the results we got previously. The odds ratio estimates differ slightly, though, because fisher.test() reports a conditional maximum-likelihood estimate of the OR rather than the simple cross-product ratio \(ad/bc\).

5) Use the phyper() function

Now use the hypergeometric distribution to get a \(p\)-value for these data. How do the results compare with your results above?

data_matrix1
##          HiBP LoBP
## Stroke    100   50
## NoStroke  600 1250
data_matrix2
##          HiBP LoBP
## Stroke    200  800
## NoStroke   50  950
# phyper(q, m, n, k, lower.tail = FALSE) gives P(X > q),
# so we use q = a - 1 to get P(X >= a):
# phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
#  q = observation (a   = stroke & high BP)
#  m = white balls (a+b = stroke, a.k.a. "success")
#  n = black balls (c+d = healthy, a.k.a. "failure")
#  k = total draws (a+c = high BP)

# Design 1
a=data_matrix1[1,1]
b=data_matrix1[1,2]
c=data_matrix1[2,1]
d=data_matrix1[2,2]
phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
## [1] 2.087913e-16
# Design 2
a=data_matrix2[1,1]
b=data_matrix2[1,2]
c=data_matrix2[2,1]
d=data_matrix2[2,2]
phyper(a - 1, a+b, c+d, a+c, lower.tail = FALSE)
## [1] 1.616406e-25

These are exactly the same as the \(p\)-values for the one-sided Fisher’s tests: with lower.tail = FALSE, phyper(a - 1, ...) returns \(P(X \ge a)\) under the hypergeometric null, which is precisely what the one-sided Fisher’s test computes.

6) Compute the significance using \(\chi^2\) tests

How do the values compare?

chisq.test(data_matrix1)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data_matrix1
## X-squared = 69.981, df = 1, p-value < 2.2e-16
chisq.test(data_matrix2)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data_matrix2
## X-squared = 101.49, df = 1, p-value < 2.2e-16
chisq.test(data_matrix1)$p.value
## [1] 5.987268e-17
chisq.test(data_matrix2)$p.value
## [1] 7.181476e-24

Again, the results are very similar and highly significant, but they are not exactly the same, since the \(\chi^2\) test is an approximation (and, by default, includes Yates’ continuity correction). Also note that the result is equivalent to a two-sided test, since you cannot perform a one-sided test with this method.
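As a cross-check (assuming data_matrix1 and data_matrix2 hold the tables shown earlier), setting correct = FALSE gives the uncorrected Pearson \(\chi^2\) \(p\)-values, which is what the chi.square column in the epitools output above appears to report:

```r
data_matrix1 <- matrix(c(100, 600, 50, 1250), nrow = 2,
                       dimnames = list(c("Stroke", "NoStroke"), c("HiBP", "LoBP")))
data_matrix2 <- matrix(c(200, 50, 800, 950), nrow = 2,
                       dimnames = list(c("Stroke", "NoStroke"), c("HiBP", "LoBP")))

# Pearson chi-square without Yates' continuity correction
chisq.test(data_matrix1, correct = FALSE)$p.value
chisq.test(data_matrix2, correct = FALSE)$p.value
```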