Every team member should go to the course GitHub organization and locate their repository for this lab, whose name should begin with the prefix lab07. Copy the repository’s URL and use it to clone the remote repo in RStudio.
Merge conflicts may arise as you work on this lab; refer back to Lab #05 for how to resolve them. You and your team are free to divide up the work however you think best. However, everyone should understand all of the code in the lab’s final submission.
library(tidyverse)
library(infer)
“This data set represents thousands of loans made through the Lending Club platform, which is a platform that allows individuals to lend to other individuals. Of course, not all loans are created equal. Someone who is essentially a sure bet to pay back a loan will have an easier time getting a loan with a low interest rate than someone who appears to be riskier. And for people who are very risky? They may not even get a loan offer, or they may not have accepted the loan offer due to a high interest rate. It is important to keep that last part in mind, since this data set only represents loans actually made, i.e. do not mistake this data for loan applications!”
More information about the data can be found at https://www.openintro.org/data/index.php?data=loans_full_schema
loans <- read_rds("data/lending_club_loans.rds")
Most of the variables are self-explanatory, and we’ll only need to work with a subset of them. When doing inference procedures, select() the variables of interest to make the data more manageable.
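A minimal sketch of narrowing a data frame with select(). To stay self-contained it builds a small hypothetical stand-in, loans_toy, instead of reading the real file; the column names (state, homeownership, grade, interest_rate, total_credit_utilized, total_credit_limit, annual_income) are assumptions based on the OpenIntro loans_full_schema codebook and may not match the lab’s .rds file exactly.

```r
library(dplyr)

# Hypothetical stand-in for the real loans data (values are made up);
# column names are assumed to follow the OpenIntro loans_full_schema codebook.
loans_toy <- data.frame(
  state                 = c("AZ", "CA", "AZ"),
  homeownership         = c("RENT", "MORTGAGE", "OWN"),
  grade                 = c("A", "B", "C"),
  interest_rate         = c(7.3, 11.0, 15.5),
  total_credit_utilized = c(5000, 12000, 800),
  total_credit_limit    = c(20000, 30000, 4000),
  annual_income         = c(60000, 85000, 42000)
)

# Keep only the variables a particular inference procedure needs
loans_small <- loans_toy %>%
  select(state, total_credit_utilized, total_credit_limit)

loans_small
```

Selecting early keeps printed output short and makes it obvious which variables each exercise depends on.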
Use tidyverse and/or infer functions when applicable. However, you may use the formulas directly rather than t_test() and prop_test().
Data-focused exercises
Before doing any inference, what assumptions must you make about the data? What specifically must you check prior to doing inference for the population proportion?
Create a 95% confidence interval for the average percentage of credit used by Arizona residents with loans through Lending Club. Give an interpretation of your result within the context of the data. Note that the percentage of credit used is credit utilized divided by credit limit.
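As a reminder of the formula route, here is a minimal sketch of a one-sample t interval, \(\bar{x} \pm t^* \, s/\sqrt{n}\), computed by hand in base R. The vector credit_used is invented for illustration and is not drawn from the Lending Club data; base R’s t.test() is used only to cross-check the arithmetic.

```r
# Hypothetical credit-use ratios (credit utilized / credit limit) for a
# made-up sample; these are NOT values from the Lending Club data.
credit_used <- c(0.25, 0.40, 0.10, 0.55, 0.32, 0.18, 0.61, 0.47)

n     <- length(credit_used)
x_bar <- mean(credit_used)
s     <- sd(credit_used)

# Critical value t* for a 95% two-sided interval with n - 1 degrees of freedom
t_star <- qt(0.975, df = n - 1)

# Interval: x_bar +/- t* * s / sqrt(n)
margin <- t_star * s / sqrt(n)
ci_95  <- c(lower = x_bar - margin, upper = x_bar + margin)
ci_95
```

With the real data you would first build the ratio (for example with filter() and mutate()) and then apply the same arithmetic; the column names needed for that step are assumptions you should confirm against the data.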
It is known that 35% of individuals who have a loan through a traditional bank rent their residence. Does Lending Club have a higher proportion of renters in its loan pool? Perform an appropriate statistical hypothesis test. Write the hypotheses, report the test statistic and p-value, and provide an answer to the research question in the context of the data. Don’t forget to specify your significance level prior to starting the analysis.
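A minimal sketch of a one-sided, one-sample z test for a proportion using the normal-approximation formula. The counts (n_loans, n_renters) and the 0.05 significance level are hypothetical placeholders, not results from the Lending Club data. Note that under \(H_0\) the standard error is built from the null value \(p_0\), not from \(\hat{p}\).

```r
# Hypothetical counts, NOT from the Lending Club data
n_loans   <- 500   # sample size
n_renters <- 200   # number of renters in the sample
p0        <- 0.35  # null value (traditional-bank benchmark)
alpha     <- 0.05  # significance level, fixed before looking at the data

p_hat <- n_renters / n_loans

# Standard error under H0: p = p0 uses p0 rather than p_hat
se_null <- sqrt(p0 * (1 - p0) / n_loans)
z       <- (p_hat - p0) / se_null

# H_A: p > p0, so the p-value is the upper-tail area beyond z
p_value <- pnorm(z, lower.tail = FALSE)

list(z = z, p_value = p_value, reject_h0 = p_value < alpha)
```

Base R’s prop.test() with correct = FALSE and alternative = "greater" reports the square of this z as its X-squared statistic and gives the same p-value, which makes it a convenient check on hand calculations.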
For each loan grade A-D, create a 99% confidence interval for the mean interest rate. Comment on what you observe.
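A minimal sketch of computing one interval per group with dplyr’s group_by() and summarise(). The data frame rates and its interest_rate values are simulated with rnorm() purely for illustration; the grade labels mirror the exercise, but nothing here comes from the real data. The 99% two-sided critical value uses qt(0.995, df = n - 1).

```r
library(dplyr)

# Simulated interest rates for four grades (hypothetical, NOT real data)
set.seed(199)
rates <- data.frame(
  grade         = rep(c("A", "B", "C", "D"), each = 30),
  interest_rate = c(rnorm(30, 7, 1), rnorm(30, 11, 1.5),
                    rnorm(30, 15, 2), rnorm(30, 20, 2.5))
)

# One 99% t interval per grade
ci_by_grade <- rates %>%
  group_by(grade) %>%
  summarise(
    n      = n(),
    x_bar  = mean(interest_rate),
    se     = sd(interest_rate) / sqrt(n),
    t_star = qt(0.995, df = n - 1),  # 99% two-sided critical value
    lower  = x_bar - t_star * se,
    upper  = x_bar + t_star * se,
    .groups = "drop"
  )

ci_by_grade
```

Keeping n, se and t_star as columns makes it easy to see why intervals for different grades can have different widths even at the same confidence level.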
Conceptual exercises
Suppose that as \(n\) increases, \(\hat{p}\) remains constant. If you re-run your analysis and use the same confidence level but have a sample that is four times as large, how does this affect the width of your interval? Give a numeric answer relative to the width of the original interval.
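For reference, the standard large-sample (normal-approximation) confidence interval for a single proportion, and its width, are shown below, where \(z^*\) is the critical value for the chosen confidence level. This is offered as a reminder of the usual formula, not as the only way to build the interval.

```latex
\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},
\qquad
\text{width} = 2\,z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
```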
Assume all population parameters are unknown. In a general hypothesis testing framework, what is the largest observed value of the test statistic \(\frac{\bar{x} - \mu_0}{s / \sqrt{n}}\) that we could obtain and still fail to reject the null hypothesis at the \(\alpha = 0.01\) significance level when \(H_A: \mu > \mu_0\) and \(n = 39\)? Here \(\mu_0\) represents the value of \(\mu\) under the null hypothesis.
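Because \(\sigma\) is unknown, the standardized statistic is compared with a t distribution with \(n - 1\) degrees of freedom. A minimal sketch of finding an upper-tail critical value with qt(), using hypothetical settings (alpha = 0.05, n = 25) that are deliberately different from the exercise:

```r
# Hypothetical settings, deliberately different from the exercise
alpha <- 0.05
n     <- 25

# For H_A: mu > mu_0 the rejection region is the upper tail, so the
# critical value t* satisfies P(T > t*) = alpha with n - 1 degrees of freedom
t_crit <- qt(1 - alpha, df = n - 1)
t_crit
```

Observed statistics that land beyond t_crit in the upper tail lead to rejecting \(H_0\) at that level.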
If you reject the null hypothesis at the \(\alpha = 0.02\) significance level, then you will also reject the null hypothesis at the \(\alpha = 0.01\) significance level. Explain in detail, or with an example, why this claim is true or false.
Upload your team’s PDF to Gradescope. Include every team member’s name in the Gradescope submission and identify which problems are on each page in Gradescope. Associate the “Overall” section with the first page of your PDF.
Include all team members’ names with the team name in the author portion of the YAML header.
You must have at least three meaningful commits.
There should only be one submission per team on Gradescope.
“Data Sets.” OpenIntro.org, 2021, https://www.openintro.org/data/index.php?data=loans_full_schema. Accessed 13 Mar 2021.
“infer: Tidy Statistical Inference.” infer.netlify.app, 2021, https://infer.netlify.app/index.html.