A study raises groups of mice on two different diets:

- Regular chow (11% fat)
- High-fat chow (58% fat)
They then measure differences in weight gain, glucose intolerance, and insulin resistance.
Our scientific question is: Are the HF mice a good model for Type II diabetes?
We saw in a previous exercise that we could use the control mice to compare a sample mean with the population mean and see that there is variation between samples in estimating population parameters. We also looked at how sampling variation affects the 95% confidence intervals for the population mean.
Here we will look at the difference between the means of random samples from the control population and compare them to the mean difference between two samples, one drawn from the control population and the other drawn from the high-fat diet population.
We will use this data to determine what proportion of the time we find differences between the sample means that are at least as big as the difference observed between samples drawn from different populations.
These data come from the following paper:
The high-fat diet-fed mouse: a model for studying mechanisms and treatment of impaired glucose tolerance and Type II diabetes. MS Winzell and B Ahren, _Diabetes_ 2004; 53:S215-219
We previously found that mice in the high-fat diet group were on average around 10-12% heavier than control mice, depending on whether we were looking at males or females. Here we will just look at female mice.
# load the dataset and take a look at it
mice.pheno = read.csv("data/mice_pheno.csv", stringsAsFactors = TRUE)
# extract female bodyweights for control and high-fat mice
# Option 1: base R
#ctl = mice.pheno[mice.pheno$Diet == "chow" & mice.pheno$Sex == "F","Bodyweight"]
#trt = mice.pheno[mice.pheno$Diet == "hf" & mice.pheno$Sex == "F","Bodyweight"]
# Option 2: dplyr (use `pull` at the end to extract just an unnamed vector of weights)
library(dplyr)
ctl = filter(mice.pheno, Diet == "chow" & Sex == "F") %>% select(Bodyweight) %>% pull
trt = filter(mice.pheno, Diet == "hf" & Sex == "F") %>% select(Bodyweight) %>% pull
# take a look at these
head(ctl)
## [1] 27.03 24.80 27.02 28.07 23.55 22.72
head(trt)
## [1] 31.94 32.48 22.82 19.92 32.22 27.50
mean(ctl)
## [1] 23.89338
mean(trt)
## [1] 26.2689
sd(ctl)
## [1] 3.424056
sd(trt)
## [1] 5.082592
mean(trt) - mean(ctl)
## [1] 2.375517
Again, we see that the high-fat mice are around 10% heavier than the control mice.
Most of the time, we don’t have access to the full population for an experiment, but for this example, we have a large number (200+) of control and high-fat diet mice that we can use as a proxy for the parent populations.
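We can check exactly how many mice we have in each group; a quick sanity check, using the vectors and data frame loaded above:

# number of female mice in each "population"
length(ctl)
length(trt)
# counts of all mice by diet and sex
table(mice.pheno$Diet, mice.pheno$Sex)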
What if we only had enough money to get 24 mice from the Jackson labs to run this experiment?
How could we tell whether the observed difference between two samples drawn from these two populations is meaningful? Later we will use some standard statistical tests to answer this question, but here let’s see what we can do just by brute force.
# take a few samples of 12 mice from the control female mice population
# and look at the means
mean(sample(ctl, 12))
## [1] 25.5075
mean(sample(ctl, 12))
## [1] 26.31667
mean(sample(ctl, 12))
## [1] 22.8225
mean(sample(ctl, 12))
## [1] 22.89167
mean(sample(ctl, 12))
## [1] 22.52333
Naturally, each time we take a different sample, we get a different mean. Since we are taking random samples, the mean is a random variable.
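To see this variability more systematically, we can repeat the sampling many times and look at the spread of the resulting means; a minimal sketch using `replicate`:

# take 1,000 samples of 12 control mice and record each sample mean
sample.means = replicate(1000, mean(sample(ctl, 12)))
# the sample means cluster around the population mean, with some spread
mean(sample.means)
sd(sample.means)
hist(sample.means, breaks=20, col="cadetblue1")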
# set a seed
set.seed(3)
# compare the means of two random samples
ctl.sample = sample(ctl, 12)
trt.sample = sample(trt, 12)
obs.diff = mean(trt.sample) - mean(ctl.sample)
obs.diff
## [1] 2.360833
Is this sample size big enough to distinguish a true difference between the high-fat diet group and the control group? How often would we get as big a difference as this, just by chance?
This is where statistical inference can help us answer our scientific question. We need to keep in mind that these are just two random samples, and the difference in their means will vary each time we take a new sample.
This is why confidence intervals are helpful: they provide a quantitative (vs. qualitative) estimate of the true difference in means between the populations, based on the data from just a single trial.
We will compute confidence intervals for this example as a future exercise; here we will just focus on \(p\)-values.
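As a quick preview (the full treatment is left for that later exercise), base R’s `t.test` reports a 95% confidence interval for the difference in means between two samples:

# preview: 95% confidence interval for the difference in sample means
t.test(trt.sample, ctl.sample)$conf.int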
If the difference between the `ctl` and `trt` samples were just due to chance, then they could easily have come from the same population.
Since in this example we have access to the entire population, we can simulate many possible outcomes of taking 12 random samples from the control population and see what proportion of the time we see a difference as large as we did above.
What are the null and alternative hypotheses for this test?
Will we perform a one-sided or two-sided test?
One-sided: we expect that high-fat diet mice should be heavier on average than control mice.
What is the test statistic?
The observed difference between the high-fat and control samples, `obs.diff` = 2.360833, computed above.
Let’s test this idea by taking a bunch of random samples from the control population and pretending that some of them are `trt` samples. This would be like a blind placebo experiment, in which we expect to see no effect from the (supposed) “high-fat” diet.
If we do this many times, and take the difference between the means of two samples, this will give us an idea about how likely the observed difference is under the null hypothesis.
The null distribution is all possible realizations of the difference in group means under the null hypothesis. Knowing this distribution allows us to determine the proportion of the time we expect to see values of any particular magnitude.
# number of simulations
n = 10000
# initialize an empty vector to hold the n simulated differences
null.dist = vector("numeric", length=n)
# use a loop to run 10,000 simulations under the null
for (i in 1:n) { # repeat the following n times:
# take two samples of 12 mice from the control population
ctl.sample = sample(ctl, 12) # "control" sample
trt.sample = sample(ctl, 12) # "treatment" (placebo) sample
# record the mean difference between them
null.dist[i] = mean(trt.sample) - mean(ctl.sample)
}
# what's the range of the values we got?
c(min(null.dist), max(null.dist))
## [1] -4.882500 6.786667
# make a histogram
hist(null.dist, breaks=20, col="cadetblue1")
Now we can figure out how likely it is to see a value as big as the one we saw under the null hypothesis. This will provide statistical support for our observation.
# histogram of null distribution with the observed difference marked in red
hist(null.dist, breaks=20, col="cadetblue1")
abline(v=obs.diff, col="red", lwd=2)
# dividing the count by the number of simulations gives the proportion of null differences greater than the observed difference
sum(null.dist > obs.diff) / n
## [1] 0.035
# equivalently, the mean of the logical vector gives the same proportion
mean(null.dist > obs.diff)
## [1] 0.035
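Since our alternative hypothesis is one-sided, we only counted null differences larger than the one we observed. For comparison, a two-sided test would count differences at least as extreme in either direction; a minimal sketch:

# two-sided version: count null differences at least as extreme in either direction
mean(abs(null.dist) >= abs(obs.diff))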
This is a \(p\)-value! It is the probability, when the null hypothesis is true, of seeing a difference at least as big as the one we observed.
In this case, we were able to run the simulations above because we had data from the whole population. Usually this will not be the case, and we will need to make mathematical approximations using estimators based on the data at hand without knowing the true population parameters.
We will come back to this again later when we discuss \(t\)-tests, which follow a \(t\)-distribution. The \(t\)-distribution is very similar to the standard normal (\(z\)) distribution, but it has somewhat heavier tails for small sample sizes.
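As a preview, R’s built-in `t.test` function computes a p-value from the \(t\)-distribution instead of by simulation. A minimal sketch (note that the loop above overwrote `ctl.sample` and `trt.sample`, so we re-draw the original samples first):

# re-draw the original two samples with the same seed as before
set.seed(3)
ctl.sample = sample(ctl, 12)
trt.sample = sample(trt, 12)
# one-sided Welch t-test: is the high-fat sample mean greater than the control mean?
t.test(trt.sample, ctl.sample, alternative = "greater")

The resulting p-value should be in the same ballpark as the simulation-based estimate above.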