When we perform experiments, we are doing “random trials” with two or more possible “outcomes” that cannot be known with certainty in advance of measurement or observation.
Different possible outcomes are called “events”. An event might be something like “an individual carries allele X” or “brood size is between 20% and 50% of WT”.
When we want to evaluate our results, for example in hypothesis testing, we want to know the frequency with which different outcomes are obtained. This is a probability.
In order to compute probabilities we need to be able to enumerate collections of all possible outcomes and have a way to determine what proportion of the time we expect to observe different potential outcomes. This is the realm of set theory.
The set of possible outcomes, or the sample space, is the universal set \(S\).
For discrete data, we can enumerate the elements of the set in curly brackets:
\[S = \{x_1, x_2, \ldots, x_n\}, \ \ x_i \in S\]
where \(n\) is the total number of possible distinct outcomes, and each \(x_i\) is an element of \(S\).
An event is any subset of the possible outcomes. For example, \(A = \{x_1\}\), \(A = \{x_1, x_2\}\), and \(A = \{x_1, x_2, x_3\}\) are different events, and all of them are subsets of \(S\).
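Sample spaces and events map directly onto sets in code. A minimal Python sketch, using four hypothetical outcomes:

```python
# Sample space with four hypothetical outcomes
S = {"x1", "x2", "x3", "x4"}

# Three events, each a subset of S
A = {"x1"}
B = {"x1", "x2"}
C = {"x1", "x2", "x3"}

# Python's <= on sets tests the subset relation
assert A <= S and B <= S and C <= S
assert A <= B <= C  # these particular events happen to be nested
```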
To illustrate, say you have made a genetic cross to get a 3xFLAG-tagged transgene into a mutant background. You cross a heterozygous 3xFLAG strain to your homozygous mutant line and test 8 progeny by PCR to see which ones have the 3xFLAG tag in them.
The probability is the proportion of random trials with a particular outcome. The probability of event \(A\) is its frequency of occurrence, relative to all other outcomes:
\[Pr[A] = \frac{N[A]}{N}\]
where \(N[A]\) is the number of times \(A\) was observed, and \(N\) is the total number of observations.
\(\Rightarrow\) Q: What is the probability of any single PCR reaction being 3xFLAG-positive among the progeny of this cross?
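One way to check your answer is by simulation. A minimal Python sketch, assuming Mendelian transmission (the heterozygous parent passes on the tagged allele with probability 1/2, and the homozygous mutant parent never contributes a tag):

```python
import random

random.seed(42)  # make the simulation reproducible

# Simulate many progeny from the cross and tally the PCR-positive
# ones to estimate the probability as N[A] / N.
n = 100_000
n_positive = sum(random.random() < 0.5 for _ in range(n))

pr_positive = n_positive / n
print(pr_positive)  # empirical estimate; should be close to 0.5
```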
When a process can give rise to two or more outcomes, some very simple rules govern the relationships between them. These can be visualized using Venn diagrams:
Two sets are disjoint if they have no shared elements. Similarly, two events are mutually exclusive if they cannot occur at the same time (for example, a particular allele cannot be both A1 and A2, and any individual can have only one blood type).
Co-occurrence represents the intersection of two sets. It is represented by the \(AND\) operation, and its mathematical symbol, \(\cap\), looks like an upside-down U.
For mutually exclusive events, the probability that both events occur at the same time is zero*:
\[Pr[A \ \ AND\ \ B] = Pr[A \cap B] = 0\]
Otherwise, for non-disjoint sets, the probability of co-occurrence must be greater than zero:
\[Pr[A \ \ AND\ \ B] = Pr[A \cap B] > 0\]
\(\Rightarrow\) Question: How does mutual exclusion relate to probability distributions?
If two events are unrelated, and knowing something about one gives no information about the other, then they are independent. In this case, the probability of both events occurring is the product of their individual probabilities:
\[Pr[A \ \ AND\ \ B] = Pr[A \cap B] = Pr[A] \cdot Pr[B]\]
Many combinations of genetic traits are independent (e.g. Mendel’s green and wrinkly peas).
Sometimes, however, two genes will show a genetic interaction, in which case the independence rule is violated. When this happens, a functional association between these genes will show up as a deviation from independence.
\(\Rightarrow\) Q: How does the law of independence relate to the distribution of allele combinations for 2 independently assorting genes with 5 alleles each, where each allele is equally likely to occur in the population?
In this case, the probability of seeing any particular combination of two alleles together is \(1/5 \times 1/5 = 1/25\).
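To make the counting explicit, here is a short Python sketch (the allele names a1–a5 and b1–b5 are hypothetical placeholders):

```python
from fractions import Fraction
from itertools import product

# Two independently assorting genes with 5 equally likely alleles each
gene_a = [f"a{i}" for i in range(1, 6)]
gene_b = [f"b{i}" for i in range(1, 6)]

pr_a = Fraction(1, len(gene_a))  # Pr of any one allele of gene A = 1/5
pr_b = Fraction(1, len(gene_b))  # Pr of any one allele of gene B = 1/5

# Independence: Pr[A AND B] = Pr[A] * Pr[B]
print(pr_a * pr_b)  # 1/25

# Equivalently, count the equally likely allele pairs directly
pairs = list(product(gene_a, gene_b))
print(Fraction(1, len(pairs)))  # 1/25
```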
\(\Rightarrow\) Q: Can two mutually exclusive events be independent?
\(\Rightarrow\) Q: How common is it for two overlapping sets to be independent?
The union of two sets represents the event that either one or both outcomes have occurred. The \(OR\) operation is represented by the union symbol, \(\cup\), so the union of A and B is \(A \cup B\).
To find the probability of \(A \cup B\), we need to subtract the probability that they are co-occurring from the total probability of each individual event (or else we would be counting the overlap twice).
\[Pr[A \ \ OR\ \ B] = Pr[A \cup B] = Pr[A] + Pr[B] - Pr[A\ \ AND \ \ B] = Pr[A] + Pr[B] - Pr[A \cap B]\] Visualizing this rule makes it more intuitive:
The general addition rule applies whether two sets are mutually exclusive or not. For two mutually exclusive events, since \(Pr[A \cap B] = 0\), the probability that either one occurs is simply the sum of their individual probabilities:
\[Pr[A \ \ OR\ \ B] = Pr[A \cup B] = Pr[A] + Pr[B]\]
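The general addition rule can be verified by brute force on a small sample space. A Python sketch using one fair die, with two overlapping events chosen only for illustration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
A = {2, 4, 6}            # event: roll is even
B = {4, 5, 6}            # event: roll is greater than 3

def pr(event):
    return Fraction(len(event), len(S))

# General addition rule: Pr[A OR B] = Pr[A] + Pr[B] - Pr[A AND B]
lhs = pr(A | B)                  # union computed directly
rhs = pr(A) + pr(B) - pr(A & B)  # computed via the addition rule
print(lhs, rhs)                  # both equal 2/3
assert lhs == rhs
```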
\(\Rightarrow\) Q: How does the addition rule relate to finding the total probability of different outcomes in probability distributions?
For example, for the discrete distribution of face values for 2 fair dice:
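Assuming the distribution in question is the total of the two face values, it can be enumerated directly. A Python sketch: each total is reached by several mutually exclusive (die1, die2) outcomes, so its probability is the sum over those outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes and count the
# mutually exclusive ways to obtain each total; by the addition rule,
# Pr[total = t] = (# outcomes summing to t) / 36.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for t in sorted(totals):
    print(t, Fraction(totals[t], 36))  # e.g. Pr[total = 7] = 6/36 = 1/6

assert sum(totals.values()) == 36  # the probabilities sum to 1
```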