Notes

library(tidyverse)
library(infer)

In the examples and practice sections, we’ll work with a subset of data from the General Social Survey.

gss_2010 <- read_csv("data/gss_2010.csv")

Recall the hypothesis testing framework:

  1. Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis.

  2. Choose a (representative) sample, collect data, and analyze the data.

  3. Figure out how likely it is to see data like what we observed or something more extreme, assuming the null hypothesis is true.

  4. If our data is extremely unlikely assuming the null hypothesis is true, reject the null and deem the alternative worthy of further study. Otherwise, fail to reject the null.

To do step 3, we’ll need to compute probabilities from the t-distribution or the normal distribution. On Tuesday, we computed quantiles with qt() and qnorm(). To compute probabilities we’ll use pt() and pnorm().

Let’s first see if we can understand how these functions work. To understand pnorm(), we’ll try a few different inputs and examine the results.

pnorm(q = 1.645)
## [1] 0.9500151

pnorm(q = 2.5)
## [1] 0.9937903

pnorm(q = -1.5)
## [1] 0.0668072

The function pnorm() gives the cumulative distribution function, that is, the lower-tail probability for a specified value on the x-axis. For a given \(q\), it returns \(P(Z \leq q)\), where \(Z \sim N(0, 1)\).
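
To get an upper-tail probability instead, take the complement or pass lower.tail = FALSE; pt() works the same way, with an additional df argument for the degrees of freedom.

# Upper-tail probability P(Z > 1.645)
1 - pnorm(q = 1.645)
## [1] 0.0499849

# Equivalently, ask for the upper tail directly
pnorm(q = 1.645, lower.tail = FALSE)
## [1] 0.0499849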

Example: hypothesis test for \(\mu\)

We’ll work with the same data as last time.

The GSS asks “After an average work day, about how many hours do you have to relax or pursue activities that you enjoy?” A past census study found that the mean was 3.6 hours. Perform a hypothesis test to see if this number has increased.

\[H_0: \mu = 3.6\]

\[H_a: \mu > 3.6\]

First, we’ll check out our sample data and compute some summary statistics.

hrs_relax_stats <- gss_2010 %>% 
  filter(!is.na(hrsrelax)) %>%
  summarise(x_bar = mean(hrsrelax), 
            s     = sd(hrsrelax), 
            n     = n())

hrs_relax_stats
## # A tibble: 1 x 3
##   x_bar     s     n
##   <dbl> <dbl> <int>
## 1  3.68  2.63  1154

Direct calculation via formula

Let’s extract these three statistics into their own variables, along with the null value \(\mu_0\).

n <- hrs_relax_stats$n
x_bar <- hrs_relax_stats$x_bar
s <- hrs_relax_stats$s
mu_0 <- 3.6

Next, we need to compute our test statistic and the corresponding p-value.
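
Under \(H_0\), the standardized sample mean follows a \(t\)-distribution with \(n - 1\) degrees of freedom:

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]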

test_stat <- (x_bar - mu_0) / (s / sqrt(n))
test_stat
## [1] 1.036601

The p-value is the probability of observing a test statistic as or more extreme than test_stat, assuming the null hypothesis is true.

p_value <- 1 - pt(test_stat, df = n - 1)
p_value
## [1] 0.1500696
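
Equivalently, we could ask pt() for the upper-tail probability directly:

p_value <- pt(test_stat, df = n - 1, lower.tail = FALSE)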

We obtain a \(p\)-value of 0.15, so we fail to reject the null hypothesis at \(\alpha = 0.05\). We do not have sufficient evidence to suggest that the true mean number of hours American adults spend relaxing after work is greater than 3.6.

infer

The infer package has a function to do these calculations in one step: t_test(), a tidier version of the built-in R function t.test().

t_test(gss_2010, response = hrsrelax, mu = 3.6, alternative = "greater",
       conf_int = FALSE)
## # A tibble: 1 x 4
##   statistic  t_df p_value alternative
##       <dbl> <dbl>   <dbl> <chr>      
## 1      1.04  1153   0.150 greater
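
For comparison, the built-in t.test() gives the same result; here we drop the missing values first, as we did above:

t.test(na.omit(gss_2010$hrsrelax), mu = 3.6, alternative = "greater")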

Practice

Redo the above analysis, but perform the test to see if this number has changed. Conduct your test at the \(\alpha = 0.10\) significance level. Also, compute a 90% confidence interval. What do you notice?

\[H_0: \mu = 3.6\]

\[H_a: \mu \neq 3.6\]

t_test(gss_2010, response = hrsrelax, mu = 3.6, conf_level = 0.90)
## # A tibble: 1 x 6
##   statistic  t_df p_value alternative lower_ci upper_ci
##       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
## 1      1.04  1153   0.300 two.sided       3.55     3.81

We obtain a \(p\)-value of 0.300, so we fail to reject the null hypothesis at \(\alpha = 0.10\). We do not have sufficient evidence to suggest that the true mean time Americans spend relaxing after work differs from 3.6 hours.

Our 90% confidence interval is (3.55, 3.81). We are 90% confident that the true mean time Americans spend relaxing after work is between 3.55 and 3.81 hours.

Confidence intervals and hypothesis tests

A confidence interval provides the same information as a two-sided hypothesis test based on the same data. If the hypothesized parameter value is inside the confidence interval, we fail to reject the null hypothesis. If it is outside the confidence interval, we reject the null hypothesis.

More generally, for a test with hypotheses \(H_0: \mu = \mu_0\) and \(H_a: \mu \neq \mu_0\), there is a one-to-one correspondence between

  1. rejecting the null hypothesis at significance level \(\alpha\), and
  2. the null hypothesized value \(\mu_0\) falling outside of our \(100(1 - \alpha)\%\) confidence interval.

So a confidence interval tells us everything a hypothesis test does, and a little more: in addition to knowing which parameter values we reject or fail to reject, we also get a range of plausible values for the population parameter.
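
As a quick check, we can verify this correspondence with the two-sided test from the practice above:

res <- t_test(gss_2010, response = hrsrelax, mu = 3.6, conf_level = 0.90)

# p-value above alpha = 0.10, so we fail to reject ...
res$p_value > 0.10
## [1] TRUE

# ... and, equivalently, mu_0 = 3.6 falls inside the 90% CI
res$lower_ci <= 3.6 & 3.6 <= res$upper_ci
## [1] TRUE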

Example: hypothesis test for \(p\)

The GSS asks “Are you better off today than you were four years ago?” Use a CLT-based approach to test whether the proportion of Americans who say they are better off has decreased from 0.33, its level four years ago.

\[H_0: p = 0.33\]

\[H_a: p < 0.33\]

Let’s compute the sample proportion of respondents who said they are better off:

gss_2010 %>% 
  count(better) %>% 
  mutate(p_hat = n / sum(n)) %>% 
  filter(better == 1) %>% 
  pull(p_hat)
## [1] 0.3175147

Next, we’ll check the success-failure condition. For a hypothesis test, this condition uses the null value \(p_0\): we need both \(np_0 \geq 10\) and \(n(1 - p_0) \geq 10\).

gss_2010 %>% 
  count(better)
## # A tibble: 2 x 2
##   better     n
## *  <dbl> <int>
## 1      0  1395
## 2      1   649

Both expected counts are well above 10 (see the quick check below), so the condition is met. We’re also assuming these observations are independent.
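
A minimal check of the expected counts under \(H_0\), using \(p_0 = 0.33\):

gss_2010 %>% 
  summarise(n           = n(),             # 2044 respondents
            exp_success = n * 0.33,        # ~675
            exp_failure = n * (1 - 0.33))  # ~1369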

Now we’ll use infer to run the test. We first recode better into labels so that prop_test() knows which level to treat as a success.

gss_2010 %>% 
  mutate(better = ifelse(better == 1, "better", "worse")) %>% 
  prop_test(response = better, success = "better", conf_int = FALSE,
            alternative = "less", z = TRUE, p = 0.33)
## # A tibble: 1 x 3
##   statistic p_value alternative
##       <dbl>   <dbl> <chr>      
## 1     -1.20   0.115 less

We obtain a \(p\)-value of 0.115, so we fail to reject the null hypothesis at \(\alpha = 0.05\). We do not have sufficient evidence to suggest that the true proportion of Americans who reported they are better off than four years ago is less than 0.33.

Practice

Redo the above analysis using base R functions and pnorm().

# Sample proportion of respondents who said they are better off
p_hat <- gss_2010 %>% 
  count(better) %>% 
  mutate(p_hat = n / sum(n)) %>% 
  filter(better == 1) %>% 
  pull(p_hat)

# Sample size and null value
n <- nrow(gss_2010)
p_0 <- 0.33

# z test statistic: standardize p_hat using the null standard error
test_stat <- (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
test_stat
## [1] -1.200455

# Lower-tail probability, matching the "less" alternative
p_value <- pnorm(test_stat)
p_value
## [1] 0.1149814

While we weren’t able to cover inference for every parameter, you now have the tools to conduct inference for other parameters, such as a difference in means, a difference in proportions, or whether two variables are independent. Although the test statistics will differ, the general framework and concepts remain the same.
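
For instance, a difference-in-means test has the same shape in infer. The sketch below uses a hypothetical two-level variable group, which is not in our subset; substitute a real binary variable from your data.

# Sketch only: `group` is hypothetical. Swap in a real two-level
# variable from your data set (check the codebook).
t_test(gss_2010,
       formula = hrsrelax ~ group,
       order = c("level1", "level2"),   # subtraction order: level1 - level2
       alternative = "two-sided")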

When doing inference for parameters beyond what we covered, take a look at the infer examples: https://infer.netlify.app/articles/observed_stat_examples.html. A simulation-based approach is a good strategy if you don’t know the underlying theoretical distribution. Keep this in mind as you think about research questions and explore data for your project.

References

  1. infer: Tidy Statistical Inference. 2021, https://infer.netlify.app/index.html.