Every team member should go to the course GitHub organization and locate their team's repository for this lab, whose name begins with the prefix lab06. Copy the repository URL and clone the remote repository in RStudio.
Merge conflicts may arise as you work on this lab. Refer back to Lab #05 for how to resolve them. Your team may divide up the work however you think is best; however, every member should understand all of the code in the final submission.
library(tidyverse)
library(infer)
In this lab, you'll work with two datasets.
The ToothGrowth dataset can be loaded into R with data("ToothGrowth"). It contains data on the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC), a form of vitamin C. For the purposes of this lab, we will ignore the dose variable.
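Before starting the exercises, it can help to confirm the variable names you will be working with. The sketch below assumes tidyverse is installed; it shows that len holds odontoblast length, supp holds the delivery method (OJ or VC), and dose holds the dose in mg/day.

```r
library(tidyverse)

# ToothGrowth ships with base R (datasets package); data() places a copy
# in the global environment
data("ToothGrowth")

# Columns: len (odontoblast length), supp (OJ or VC), dose (mg/day)
glimpse(ToothGrowth)

# Animals per delivery method; dose is ignored in this lab
ToothGrowth %>%
  count(supp)
```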
The second dataset is a subset of gss_cat from the forcats package. It contains categorical variables from the 2014 General Social Survey.
gss_2014 <- gss_cat %>%
  filter(year == 2014)
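A quick way to check the subset is to count the categories of marital, the variable that matters for the hypothesis test later in the lab. This is a minimal sketch; it only assumes that library(tidyverse) attaches forcats, which is where gss_cat lives.

```r
library(tidyverse)

# gss_cat comes with forcats, which library(tidyverse) attaches
gss_2014 <- gss_cat %>%
  filter(year == 2014)

# Respondents in each marital-status category in 2014
gss_2014 %>%
  count(marital)
```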
Write all R code according to the style guidelines discussed in class. Be especially careful to keep every line within the 80-character limit.
Use tidyverse and infer functions to perform simulation-based inference for the exercises. Unless an exercise specifies otherwise, generate 10,000 replicates.
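To see what a "replicate" means in practice, here is a minimal sketch using ToothGrowth and the infer verbs specify(), generate(), and calculate(). The object names boot_samples and boot_means are illustrative choices, not required names. Without the seed from the starter file, the resampled values will differ from run to run, but the row counts will not.

```r
library(tidyverse)
library(infer)

# Resample the 60 rows of ToothGrowth with replacement, 10,000 times
boot_samples <- ToothGrowth %>%
  specify(response = len) %>%
  generate(reps = 10000, type = "bootstrap")

# One row per resampled observation: 10,000 replicates x 60 rows = 600,000
nrow(boot_samples)

# calculate() reduces each replicate to a single statistic: 10,000 rows
boot_means <- boot_samples %>%
  calculate(stat = "mean")

nrow(boot_means)
```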
The seed has been set in the set_up chunk of the starter Rmd file. Don't modify it.
Use ToothGrowth for Exercises 1-4.
Exercise 1. Suppose you are interested in constructing a confidence interval for the mean length of odontoblasts in guinea pigs that received vitamin C. Given this description and the ToothGrowth data, identify the population, the parameter of interest, the sample, the sample size, and the observed sample mean.
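The sample size and the observed sample mean can be computed directly. This is a minimal sketch, assuming len is the variable of interest. It shows the dplyr route and the equivalent infer route; both give the same mean.

```r
library(tidyverse)
library(infer)

# Sample size and observed sample mean with dplyr
ToothGrowth %>%
  summarise(n = n(), mean_len = mean(len))

# The same observed mean via infer (returned in a column named stat)
obs_mean <- ToothGrowth %>%
  specify(response = len) %>%
  calculate(stat = "mean")

obs_mean
```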
Exercise 2. Create a 99% confidence interval for the mean length of odontoblasts in guinea pigs that received vitamin C. Interpret your interval in the context of the problem.
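One way to build a bootstrap interval with infer is sketched below. It assumes the percentile method, which takes the middle 99% of the bootstrap means. Other interval types (for example, type = "se") are also available in get_confidence_interval(). The names boot_means and ci_99 are illustrative.

```r
library(tidyverse)
library(infer)

# Bootstrap distribution of the sample mean of len (10,000 replicates)
boot_means <- ToothGrowth %>%
  specify(response = len) %>%
  generate(reps = 10000, type = "bootstrap") %>%
  calculate(stat = "mean")

# Percentile method: the middle 99% of the bootstrap means
ci_99 <- boot_means %>%
  get_confidence_interval(level = 0.99, type = "percentile")

ci_99

# Histogram of the bootstrap means with the interval shaded
visualize(boot_means) +
  shade_confidence_interval(endpoints = ci_99)
```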
Exercise 3. Look at the example in the infer documentation for creating a confidence interval when you have one numeric variable and one categorical variable with two levels. Create a 95% confidence interval for the difference in mean odontoblast length between guinea pigs that received vitamin C via OJ and those that received it via VC. Define the difference as OJ - VC. Your answer should include the observed sample statistic, a histogram of the bootstrap distribution of the difference in means, the 95% confidence interval, and an interpretation of that interval.
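The sketch below illustrates the two-group pattern in infer, using the formula interface len ~ supp. The argument order = c("OJ", "VC") makes every statistic OJ minus VC. The percentile method and the object names are assumptions for illustration.

```r
library(tidyverse)
library(infer)

# Observed statistic: mean len for OJ minus mean len for VC
obs_diff <- ToothGrowth %>%
  specify(len ~ supp) %>%
  calculate(stat = "diff in means", order = c("OJ", "VC"))

# Bootstrap distribution of the difference in means (10,000 replicates)
boot_diffs <- ToothGrowth %>%
  specify(len ~ supp) %>%
  generate(reps = 10000, type = "bootstrap") %>%
  calculate(stat = "diff in means", order = c("OJ", "VC"))

# 95% percentile interval for OJ - VC
ci_95 <- boot_diffs %>%
  get_confidence_interval(level = 0.95, type = "percentile")

ci_95

# Histogram of the bootstrap differences with the interval shaded
visualize(boot_diffs) +
  shade_confidence_interval(endpoints = ci_95)
```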
Exercise 4. Based on your results in Exercise 3, can you conclude that orange juice is a better delivery method for vitamin C than ascorbic acid with respect to tooth growth in guinea pigs? Why or why not?
Use gss_2014 for Exercises 5-7.
Exercise 5. According to the 2010 census, the proportion of U.S. adults who were married was 0.48. Using the 2014 sample data, perform a hypothesis test at the \(\alpha = 0.01\) significance level to determine whether this proportion has changed. Write your hypotheses using the notation introduced in the course. Your answer should include a simulated null distribution, the p-value, and a written conclusion. Use 1,000 simulated samples.
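A sketch of the point-null workflow for one proportion is below. Two assumptions are worth flagging. First, the helper column married is hypothetical: it collapses marital into two categories so that "married" is a clean success level. Second, type = "draw" is the generate() option name in infer 1.0.0 and later; older releases called it "simulate". The test is two-sided because the question asks whether the proportion has changed, not whether it went up or down.

```r
library(tidyverse)
library(infer)

# Hypothetical helper column: collapse marital into two categories
gss_2014 <- gss_cat %>%
  filter(year == 2014) %>%
  mutate(married = if_else(marital == "Married", "married", "not married"))

# Observed sample proportion of married respondents
p_hat <- gss_2014 %>%
  specify(response = married, success = "married") %>%
  calculate(stat = "prop")

# Null distribution under H0: p = 0.48, using 1,000 draws
null_dist <- gss_2014 %>%
  specify(response = married, success = "married") %>%
  hypothesize(null = "point", p = 0.48) %>%
  generate(reps = 1000, type = "draw") %>%
  calculate(stat = "prop")

# Null distribution with both tails beyond the observed value shaded
visualize(null_dist) +
  shade_p_value(obs_stat = p_hat, direction = "two-sided")

p_value <- null_dist %>%
  get_p_value(obs_stat = p_hat, direction = "two-sided")

p_value

# Reject H0 at alpha = 0.01 only if the p-value is below 0.01
p_value$p_value < 0.01
```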
Exercise 6. Given your conclusion in Exercise 5, which type of error (Type I or Type II) could you have made? Refer to Thursday's slides for help.
Exercise 7. Suppose the significance level in Exercise 5 were 0.10. Would your conclusion change? If so, how?
Upload your team's PDF to Gradescope. Include every team member's name in the Gradescope submission, and mark which page(s) each problem appears on. Associate the "Overall" section with the first page of your PDF.
Include all team members’ names with the team name in the author portion of the YAML header.
There should be only one submission per team on Gradescope.
"Infer - Tidy Statistical Inference." infer.netlify.app, 2021, https://infer.netlify.app/index.html.