Simulation-based inference: hypothesis testing

Song of the Day

God Only Knows -Beach Boys

Main ideas

Understand the hypothesis testing framework
Understand the statistical conclusions that can and cannot be made from a hypothesis test
Use functions from infer to carry-out a simulation-based hypothesis test for the population mean and population proportion

Coming Up

Lab #06 on Friday 3-12.
Readings for Tuesday 3-16.

DataFest

Consider applying to DataFest.

DataFest will take place April 9 - 11 virtually. It is a data analysis competition where teams of up to five students analyze a large, complex, and surprise dataset over a weekend. Your job is to represent your school by finding and communicating insights into these data. In addition to being a great experience for honing your data analysis skills, there will be get togethers, a guest speaker, and other fun events throughout the weekend.

Packages

library(tidyverse)
library(infer)

Data

We’ll continue to work with the sample of Zoom screen-time data we obtained. To make things easier with the infer functions, we’ll create a tibble with time as a single variable.

zoom <- tibble(
  time = c(299, 192, 196, 218, 194, 250, 183, 218, 207, 
           209, 191, 189, 244, 233, 208, 216, 178, 209, 
           201, 173, 186, 209, 188, 231, 195, 200, 190, 
           199, 226, 238)
)

zoom

## # A tibble: 30 x 1
##     time
##    <dbl>
##  1   299
##  2   192
##  3   196
##  4   218
##  5   194
##  6   250
##  7   183
##  8   218
##  9   207
## 10   209
## # … with 20 more rows

Set seed

To obtain reproducible results, set the seed for the random number generation.

set.seed(1421)

Notes

Recall our hypothesis testing framework:

Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis.
Choose a (representative) sample, collect data, and analyze the data.
Figure out how likely it is to see data like what we observed, assuming the null hypothesis is true.
If our data would have been extremely unlikely if the null claim were true, then we reject the null claim and deem the alternative claim worthy of further study. Otherwise, we cannot reject the null claim.

Example: testing population mean: \(\mu\)

We’ve already done items 1 and 2, where

\[H_0: \mu = 200\] \[H_1: \mu \neq 200\]

For this study, let \(\alpha = 0.05\).

To tackle items 3 and 4, we’ll use a simulation-based approach with functions from infer.

Simulate the null distribution

Recall that there is variability in the distribution of the sample mean. We need to account for this in our statistical study. Just as we did for confidence intervals, we’ll use a bootstrap procedure here.

specify() the variable of interest
set the null hypothesis with hypothesize()
generate() the bootstrap samples
calculate() the statistic of interest

null_dist <- zoom %>% 
  specify(response = time) %>% 
  hypothesize(null = "point", mu = 200) %>% 
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "mean")

Visualize the null distribution

visualize(null_dist) +
  labs(x = "Sample means", y = "Count", title = "Simulated null distribution")

Compute p-value

Next, we calculate the probability of getting sample data like ours, or something more extreme, assuming \(H_0\) is true.

Our observed sample mean is 209 minutes.

x_bar <- zoom %>% 
  summarize(mean_time = mean(time))

x_bar

## # A tibble: 1 x 1
##   mean_time
##       <dbl>
## 1       209

visualize(null_dist) +
  shade_p_value(obs_stat = x_bar, direction = "two-sided") +
  labs(x = "Sample mean", y = "Count")

In the context of this simulation-based approach, the p-value is the proportion of observations shaded light-red. To compute this, infer provides a convenient function – get_p_value().

null_dist %>% 
  get_p_value(obs_stat = x_bar, direction = "two-sided")

## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1  0.0718

Conclusion

Given the calculated p-value and the specified \(\alpha\), what conclusion do you make?

We obtain a \(p\)-value of 0.0718, so we fail to reject the null hypothesis at \(\alpha = 0.05\). We do not have sufficient evidence to suggest that the true mean time spent on Zoom for North Carolina eighth graders is different than 200 minutes.

Practice 1

Recall our original example: The state of North Carolina claims that students in 8th grade are spending, on average, 200 minutes on Zoom each day. Suppose in reporting this the incorrect metric was specified, it should have been the median time. Use your sample data to investigate if the median Zoom screen-time is more than 200 minutes.

Write out the hypotheses for this statistical test. Let \(M\) represent the population median. Let \(\alpha = 0.05\).

\[H_0: M = 200\] \[H_1: M > 200\]

Generate the null distribution.

null_dist_med <- zoom %>% 
  specify(response = time) %>% 
  hypothesize(null = "point", med = 200) %>% 
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "median")

Visualize the null distribution, observed statistic, and shaded region corresponding to the p-value.

med <- zoom %>% summarize(med_time = median(time))

visualize(null_dist_med) +
  shade_p_value(obs_stat = med, direction = "greater") +
  labs(x = "Medians", y = "Count")

Interpret the results of your test in the context of the data.

null_dist_med %>% 
  get_p_value(obs_stat = med, direction = "greater")

## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.313

We obtain a \(p\)-value of 0.313, so we fail to reject the null hypothesis at \(\alpha = 0.05\). We do not have sufficient evidence to suggest that the true median time spent on Zoom for North Carolina eighth graders is greater than 200 minutes.

Example: testing population proportion - \(p\)

People providing an organ for donation sometimes seek the help of a special medical consultant. These consultants assist the patient in all aspects of the surgery, with the goal of reducing the possibility of complications during the medical procedure and recovery. Patients might choose a consultant based in part on the historical complication rate of the consultant’s clients.

One consultant tried to attract patients by noting that the average complication rate for liver donor surgeries in the US is about 10%, but her clients have only had 3 complications in the 62 liver donor surgeries she has facilitated. She claims this is strong evidence that her work meaningfully contributes to reducing complications (and therefore she should be hired!).

Write out the hypotheses for this statistical test. Let \(p\) represent the population proportion of complications from liver donor surgeries. State your significance level.

\[H_0: p = 0.10\] \[H_A: p < 0.10\]

Let \(\alpha = 0.01\).
Generate the null distribution.

liver <- tibble(
  surgery_result = rep(c("complication", "no complication"), times = c(3, 59))
)

liver

## # A tibble: 62 x 1
##    surgery_result 
##    <chr>          
##  1 complication   
##  2 complication   
##  3 complication   
##  4 no complication
##  5 no complication
##  6 no complication
##  7 no complication
##  8 no complication
##  9 no complication
## 10 no complication
## # … with 52 more rows

null_dist_phat <- liver %>% 
  specify(response = surgery_result, success = "complication") %>% 
  hypothesise(null = "point", p = 0.10) %>% 
  generate(reps = 1000, type = "simulate") %>% 
  calculate(stat = "prop")

Visualize the null distribution, observed statistic, and shaded region corresponding to the p-value.

p_hat <- liver %>% 
  count(surgery_result) %>% 
  mutate(prop = n / sum(n)) %>% 
  filter(surgery_result == "complication") %>% 
  select(prop)

visualise(null_dist_phat) +
  shade_p_value(obs_stat = p_hat, direction = "less") +
  labs(x = "Sample proportion", y = "Count")

Interpret the results of your test in the context of the data.

null_dist_phat %>% 
  get_p_value(obs_stat = p_hat, direction = "less")

## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.117

We obtain a \(p\)-value of 0.117, so we fail to reject the null hypothesis at \(\alpha = 0.05\). We do not have sufficient evidence to suggest that the true proportion of complications is less than 0.10.

Practice 2

Consider the mice data from the previous notes.

mice <- read_table("http://users.stat.ufl.edu/~winner/data/micerad.dat",
                   col_names = FALSE) %>% 
  rename(dose = X1, treat = X2, died = X3)

Previous studies have shown that 50% of mice die when subject to radiation despite being on a treatment. Does the Streptomycin Therapy (treatment in this study) produce a survival rate better than 50%? Perform a statistical hypothesis test to investigate. State the hypotheses, significance level, p-value, and conclusion. What is a hidden variable we are not considering when conducting this test?

\[H_0: p = 0.50\] \[H_A: p < 0.50\]

Let \(\alpha = 0.01\).

First, we’ll compute the observed sample proportion of mice that died while on the treatment.

p_hat <- mice %>% 
  filter(treat == 1) %>% 
  summarize(mean_prop = mean(died))

p_hat

## # A tibble: 1 x 1
##   mean_prop
##       <dbl>
## 1     0.398

mice %>% 
  filter(treat == 1) %>% 
  mutate(outcome = ifelse(died == 1, "died", "survived")) %>% 
  specify(response = outcome, success = "died") %>% 
  hypothesise(null = "point", p = 0.50) %>% 
  generate(reps = 10000, type = "simulate") %>% 
  calculate(stat = "prop") %>% 
  get_p_value(obs_stat = p_hat, direction = "less")

## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1  0.0009

Given that the p-value is less than \(\alpha\) we reject the null hypothesis. That is, we reject the claim that the survival rate of mice exposed to radiation while on the Streptomycin Therapy is 50%.

A hidden variable we haven’t accounted for and assumed to be constant was the radiation dose. In our example, this is okay because our main focus is understanding the testing framework. However, in practice, it would be bad to not consider this in an analysis.

References

C.W. Hammond, et al. (1955). “The Effect of Streptomycin Therapy on Mice Irradiated with Fast Neutrons”, Radiation Research, Vol2,#4, pp.354-360
“Infer - Tidy Statistical Inference”. Infer.Netlify.App, 2021, https://infer.netlify.app/index.html.