4^6
## [1] 4096
# There are 4 possibilities for each position, and they are drawn independently,
# so there are 4*4*4*4*4*4 = 4^6 = 4096 possibilities.
1/4^6
## [1] 0.0002441406
# Since there are 4096 possible 6-mers, you would expect to find a specific one
# around 1/4096th of the time.
# This is 2.4 sites per 10,000 bases, or one site every 4096 bases.
4^6
## [1] 4096
# One cut every 4096 bases on average translates to an expected fragment size of 4096.
# nPerm = n!
factorial(6) # 720
## [1] 720
4^22 # 1.76e+13
## [1] 1.759219e+13
# nPerm = n!/(n-k)!
factorial(12)/factorial(12-6) # 665280
## [1] 665280
# nComb = nPerm/k! = n!/k!(n-k)!
factorial(12)/( factorial(6) * factorial(12-6) ) # 924
## [1] 924
# your answer here
The number of combinations is smaller than the number of permutations by a factor of k! since all the permutations of the k-mer sequence get collapsed into one combination.
The frequency of a minor allele at the CFTR gene was found to be 0.17. What is the probability that any allele chosen at random from the population will NOT be that same allele?
0.17^0 * (1-0.17)^1
## [1] 0.83
\[ P(X=0) = f(0) = (0.17)^0(1-0.17)^1 = 0.83\]
Consider a genome in which the four bases \(A,C,G,T\) are present in equal proportions (this is actually rather uncommon, but let’s go with it for now).
For the following questions, think of them as “What is the probability of X and not Y?” (So, you are asking a binary question about one possible outcome vs. all others.)
If you were to pick a sequence of 10 nt completely at random, what is the chance that exactly 3 bases will be an \(A\) (or any other homopolymer)?
dbinom(3,10,.25)
## [1] 0.2502823
What is the chance that you will find less than 5 \(G\)s (i.e. 4 or fewer)?
pbinom(4,10,.25)
## [1] 0.9218731
What is the chance that you will find between 2 and 4 \(T\)s?
\[ P(2 \le X \le 4) = \sum_{k=2}^4 {10 \choose k}(0.25)^k(0.75)^{10-k} = \sum_{k=0}^4 {10 \choose k}(0.25)^k(0.75)^{10-k} - \sum_{k=0}^1 {10 \choose k}(0.25)^k(0.75)^{10-k}\]
# these are equivalent: a) sum up the probabilities for all values
sum(dbinom(2:4,10,.25))
## [1] 0.6778479
# b) get the cumulative mass up to and including 4, then
# subtract the cumulative up to 1 (to get 2 inclusive)
pbinom(4,10,.25) - pbinom(1,10,.25)
## [1] 0.6778479
The quantile function qbinom(p, size, prob)
returns the smallest value of \(q\) such that \(Pr(X \le q) \ge p\). The quantile is defined as the smallest value \(x\) such that \(F(x) \ge p\). For example, \(F(x) \ge 0.75\) means that 75% of the distribution is less than \(x\).
What is the maximum number of any single base you would expect to find in 75% of 10-mers picked at random from the genome?
qbinom(.75,10,.25)
## [1] 3
What is the maximum number of any single base you would expect to find in 99% of random 10-mers?
qbinom(.99,10,.25)
## [1] 6
Say you are analyzing the GC content across different regions of the genome and you find a region of 100 bases upstream of your favorite gene that has 63% CG content. How likely is it that this region has no significant CG bias?
First use the pbinom()
function to get a p-value, and then perform a two-sided test using binom.test()
.
## pbinom
# use P(X < or = x) for lower tail and P(X > x-1) for upper tail
# two-sided test is looking for values more extreme than the one observed
# at either end of the distribution
# lower.tail = 100-63 = 37
pbinom(37,100, 0.5) + pbinom(62,100, 0.5, lower.tail=FALSE) # lt + ut
## [1] 0.01203298
pbinom(62,100, 0.5, lower.tail=FALSE) * 2 # 2*upper tail
## [1] 0.01203298
## binom.test
# gives P(X <= n-x) + P(X >= x) for a two-sided test,
# so use the actual value you are looking for and select the type of test
binom.test(63,100, 0.5, alternative = c("two.sided"))
##
## Exact binomial test
##
## data: 63 and 100
## number of successes = 63, number of trials = 100, p-value = 0.01203
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5276484 0.7244334
## sample estimates:
## probability of success
## 0.63
Are the results the same?
## Yes they are exactly the same. The `binom.test()` function performs an exact test and should give the same results as `pbinom()`.
# It is also easier to use because you don't have to remember to subtract one for an upper-tailed test.