A study raises groups of mice on two different diets:

- Regular chow (11% fat)
- High-fat chow (58% fat)
They then measure differences in weight gain, glucose intolerance, and insulin resistance.
Our scientific question is: Are the HF mice a good model for Type II diabetes?
We saw in a previous exercise that we could use the control mice to compare a sample mean with the population mean and see that there is variation between samples in estimating population parameters. We also looked at how sampling variation affects the 95% confidence intervals for the population mean.
Here we will look at the difference between the means of random samples from the control population and compare them to the mean difference between two samples, one drawn from the control population and the other drawn from the high-fat diet population.
We will use this data to determine what proportion of the time we find differences between the sample means that are at least as big as the difference observed between samples drawn from different populations.
These data come from the following paper:
The high-fat diet-fed mouse: a model for studying mechanisms and treatment of impaired glucose tolerance and Type II diabetes. MS Winzell and B Ahren, _Diabetes_ 2004; 53:S215-219
We previously found that mice in the high-fat diet group were on average around 10-12% heavier than control mice, depending on whether we were looking at males or females. Here we will just look at female mice.
# load the dataset and take a look at it
mice.pheno = read.csv("data/mice_pheno.csv", stringsAsFactors = TRUE)
# extract female bodyweights for control and high-fat mice
# Option 1: base R
#ctl = mice.pheno[mice.pheno$Diet == "chow" & mice.pheno$Sex == "F","Bodyweight"]
#trt = mice.pheno[mice.pheno$Diet == "hf" & mice.pheno$Sex == "F","Bodyweight"]
# Option 2: dplyr (use `pull` at the end to extract just an unnamed vector of weights)
library(dplyr)
ctl = filter(mice.pheno, Diet == "chow" & Sex == "F") %>% select(Bodyweight) %>% pull
trt = filter(mice.pheno, Diet == "hf" & Sex == "F") %>% select(Bodyweight) %>% pull
# take a look at these
head(ctl)
## [1] 27.03 24.80 27.02 28.07 23.55 22.72
head(trt)
## [1] 31.94 32.48 22.82 19.92 32.22 27.50
mean(ctl)
## [1] 23.89338
mean(trt)
## [1] 26.2689
sd(ctl)
## [1] 3.424056
sd(trt)
## [1] 5.082592
mean(trt) - mean(ctl)
## [1] 2.375517
Again, we see that the high-fat mice are around 10% heavier than the control mice.
Most of the time, we don’t have access to the full population for an experiment, but for this example, we have a large number (200+) of control and high-fat diet mice that we can use as a proxy for the parent populations.
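We can check exactly how many mice we have in each group; a quick sanity check, using the vectors and data frame loaded above:

# number of female mice in each "population"
length(ctl)
length(trt)
# counts of all mice by diet and sex
table(mice.pheno$Diet, mice.pheno$Sex)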
What if we only had enough money to get 24 mice from the Jackson labs to run this experiment?
How could we tell whether the observed difference between two samples drawn from these two populations is meaningful? Later we will use some standard statistical tests to answer this question, but here let’s see what we can do just by brute force.
# take a few samples of 12 mice from the control female mice population
# and look at the means
mean(sample(ctl, 12))
## [1] 25.5075
mean(sample(ctl, 12))
## [1] 26.31667
mean(sample(ctl, 12))
## [1] 22.8225
mean(sample(ctl, 12))
## [1] 22.89167
mean(sample(ctl, 12))
## [1] 22.52333
Naturally, each time we take a different sample, we get a different mean. Since we are taking random samples, the mean is a random variable.
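To see this variability more systematically, we can repeat the sampling many times and look at the spread of the resulting means; a minimal sketch using `replicate`:

# take 1,000 samples of 12 control mice and record each sample mean
sample.means = replicate(1000, mean(sample(ctl, 12)))
# the sample means cluster around the population mean, with some spread
mean(sample.means)
sd(sample.means)
hist(sample.means, breaks=20, col="cadetblue1")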
# set a seed
set.seed(3)
# compare the means of two random samples
ctl.sample = sample(ctl, 12)
trt.sample = sample(trt, 12)
obs.diff = mean(trt.sample) - mean(ctl.sample)
obs.diff
## [1] 2.360833
Is this sample size big enough to distinguish a true difference between the high-fat diet group and the control group? How often would we get as big a difference as this, just by chance?
This is where statistical inference can help us answer our scientific question. We need to keep in mind that these are just two random samples, and the difference in their means will vary each time we take a new sample.
This is why confidence intervals are helpful: they provide a quantitative (vs. qualitative) estimate of the true difference in means between the populations, based on the data from just a single trial.
We will compute confidence intervals for this example as a future exercise; here we will just focus on \(p\)-values.
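As a quick preview (the full treatment is left for that later exercise), base R’s `t.test` reports a 95% confidence interval for the difference in means between two samples:

# preview: 95% confidence interval for the difference in sample means
t.test(trt.sample, ctl.sample)$conf.int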
If the difference between the `ctl` and `trt` samples were just due to chance, then they could easily have come from the same population.
Since in this example we have access to the entire population, we can simulate many possible outcomes of taking 12 random samples from the control population and see what proportion of the time we see a difference as large as we did above.
What are the null and alternative hypotheses for this test?
Will we perform a one-sided or two-sided test?
One-sided: we expect that high-fat diet mice should be heavier on average than control mice.
What is the test statistic?
The observed difference between the high-fat and control samples, `obs.diff` = 2.360833, computed above.
Let’s test this idea by taking a bunch of random samples from the control population and pretending that some of them are `trt` samples. This would be like a blind placebo experiment, in which we expect to see no effect from the (supposed) “high-fat” diet.
If we do this many times, and take the difference between the means of two samples, this will give us an idea about how likely the observed difference is under the null hypothesis.
The null distribution is all possible realizations of the difference in group means under the null hypothesis. Knowing this distribution allows us to determine the proportion of the time we expect to see values of any particular magnitude.
# number of simulations
n = 10000
# initialize an empty vector to hold the n simulated differences
null.dist = vector("numeric", length=n)
# use a loop to run 10,000 simulations under the null
for (i in 1:n) { # repeat the following n times:
# take two samples of 12 mice from the control population
ctl.sample = sample(ctl, 12) # "control" sample
trt.sample = sample(ctl, 12) # "treatment" (placebo) sample
# record the mean difference between them
null.dist[i] = mean(trt.sample) - mean(ctl.sample)
}
# what's the range of the values we got?
c(min(null.dist), max(null.dist))
## [1] -4.882500 6.786667
# make a histogram
hist(null.dist, breaks=20, col="cadetblue1")
Now we can figure out how likely it is to see a value as big as the one we saw under the null hypothesis. This will provide statistical support for our observation.
# histogram of null distribution with the observed difference marked in red
hist(null.dist, breaks=20, col="cadetblue1")
abline(v=obs.diff, col="red", lwd=2)
# dividing the count by the number of simulations gives the proportion of null differences greater than the observed difference
sum(null.dist > obs.diff) / n
## [1] 0.035
# equivalently, the mean of the logical vector gives the same proportion
mean(null.dist > obs.diff)
## [1] 0.035
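Since our alternative hypothesis is one-sided, we only counted null differences larger than the one we observed. For comparison, a two-sided test would count differences at least as extreme in either direction; a minimal sketch:

# two-sided version: count null differences at least as extreme in either direction
mean(abs(null.dist) >= abs(obs.diff))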
This is a \(p\)-value! It is the probability, when the null hypothesis is true, of seeing a difference at least as big as the one we observed.
In this case, we were able to run the simulations above because we had data from the whole population. Usually this will not be the case, and we will need to make mathematical approximations using estimators based on the data at hand without knowing the true population parameters.
We will come back to this again later when we discuss \(t\)-tests, which follow a \(t\)-distribution. The \(t\)-distribution is very similar to the standard normal (\(z\)) distribution, but it has somewhat heavier tails for small sample sizes.
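As a preview, R’s built-in `t.test` function computes a p-value from the \(t\)-distribution instead of by simulation. A minimal sketch (note that the loop above overwrote `ctl.sample` and `trt.sample`, so we re-draw the original samples first):

# re-draw the original two samples with the same seed as before
set.seed(3)
ctl.sample = sample(ctl, 12)
trt.sample = sample(trt, 12)
# one-sided Welch t-test: is the high-fat sample mean greater than the control mean?
t.test(trt.sample, ctl.sample, alternative = "greater")

The resulting p-value should be in the same ballpark as the simulation-based estimate above.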