Exercises

Packages

library(tidyverse)
library(infer)

Data

“This data set represents thousands of loans made through the Lending Club platform, which is a platform that allows individuals to lend to other individuals. Of course, not all loans are created equal. Someone who is essentially a sure bet to pay back a loan will have an easier time getting a loan with a low interest rate than someone who appears to be riskier. And for people who are very risky? They may not even get a loan offer, or they may not have accepted the loan offer due to a high interest rate. It is important to keep that last part in mind, since this data set only represents loans actually made, i.e. do not mistake this data for loan applications!”

More information about the data can be found at https://www.openintro.org/data/index.php?data=loans_full_schema

loans <- read_rds("data/lending_club_loans.rds")

Most of the variables are self-explanatory, and we’ll only need to work with a subset of them. When carrying out an inference procedure, use select() to keep only the variables of interest; this makes the data more manageable.

General instructions

Write all R code according to the style guidelines discussed in class. Be especially careful to stay within the 80-character line limit.
Use tidyverse and/or infer functions when applicable. However, you may use the formulas directly rather than t_test() and prop_test().
All inference should be done using the CLT-based approach.

Data-focused exercises

Before doing any inference, what assumptions must you make about the data? What specifically must you check prior to doing inference for the population proportion?
Create a 95% confidence interval for the average percentage of credit used by Arizona residents with loans through Lending Club. Interpret your result in the context of the data. Note that the percentage of credit used is the credit utilized divided by the credit limit.
It is known that 35% of individuals who have a loan through a traditional bank rent their residence. Does Lending Club have a higher proportion of renters in its loan pool? Perform an appropriate statistical hypothesis test. Write the hypotheses, report the test statistic and p-value, and answer the research question in the context of the data. Don’t forget to specify your significance level before starting the analysis.
For each loan grade A-D, create a 99% confidence interval for the mean interest rate. Comment on what you observe.
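The exercises above allow the CLT-based formulas to be used directly instead of t_test() and prop_test(). A minimal base R sketch of those formulas follows, assuming a t critical value for intervals on a mean and a standard error computed under the null for the one-proportion test. The helper names t_interval() and prop_z_test() are hypothetical, not functions from infer or the tidyverse, and the built-in mtcars data stands in for the loans data only so the block runs on its own. In the lab, t_interval() would be applied to the Arizona credit-use percentages, prop_z_test() to the number of renters with p_0 = 0.35, and the split() pattern to interest rates by grade after filtering to grades A through D.

```r
# Sketch of CLT-based formulas in base R. The helper names t_interval()
# and prop_z_test() are hypothetical, not functions from infer.

# t-based confidence interval for a population mean
t_interval <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                          # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                      # estimated standard error
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical value
  c(lower = mean(x) - t_star * se, upper = mean(x) + t_star * se)
}

# One-sample z test for a proportion with H_A: p > p_0
prop_z_test <- function(successes, n, p_0) {
  # success-failure condition, checked with the null value p_0
  if (n * p_0 < 10 || n * (1 - p_0) < 10) {
    warning("success-failure condition not met under the null")
  }
  p_hat <- successes / n
  se_0 <- sqrt(p_0 * (1 - p_0) / n)          # SE computed under H_0
  z <- (p_hat - p_0) / se_0
  c(p_hat = p_hat, z = z, p_value = pnorm(z, lower.tail = FALSE))
}

# Usage on the built-in mtcars data (stand-in for the loans data)
t_interval(mtcars$mpg, conf_level = 0.95)
prop_z_test(successes = sum(mtcars$am == 1), n = nrow(mtcars), p_0 = 0.35)

# Separate 99% intervals per group, one row per group
t(sapply(split(mtcars$mpg, mtcars$cyl), t_interval, conf_level = 0.99))
```

The p_value returned by prop_z_test() is an upper-tail area, matching the one-sided alternative; it is then compared with the significance level fixed before the analysis.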

Conceptual exercises

Suppose that as \(n\) increases, \(\hat{p}\) remains constant. If you re-run your analysis with the same confidence level but with a sample that is four times as large, how does this affect the width of your interval? Give a numeric answer relative to the width of the original interval.
Assume all population parameters are unknown. In a general hypothesis-testing framework, what is the largest observed value of the test statistic \(\frac{\bar{x} - \mu_0}{s / \sqrt{n}}\) that we could obtain and still fail to reject the null hypothesis at the \(\alpha = 0.01\) significance level when \(H_A: \mu > \mu_0\) and \(n = 39\)? Here \(\mu_0\) is the value of \(\mu\) under the null hypothesis.
If you reject the null hypothesis at the \(\alpha = 0.02\) significance level, then you will also reject the null hypothesis at the \(\alpha = 0.01\) significance level. Explain in detail, or with an example, why this claim is true or false.
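For reference, the standard CLT-based interval for a population proportion and its width can be written as below, where \(z^\star\) is the normal critical value for the chosen confidence level. The second line is the usual one-sided rejection rule for the t statistic above, where \(t^\star_{n-1,\alpha}\) denotes the upper-\(\alpha\) quantile of the t distribution with \(n - 1\) degrees of freedom.

```latex
\hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},
\qquad
\text{width} = 2 z^\star \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}

\text{reject } H_0 \text{ when }
\frac{\bar{x} - \mu_0}{s / \sqrt{n}} > t^\star_{n-1,\alpha}
```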

Submission

Upload your team’s PDF to Gradescope. Include every team member’s name in the Gradescope submission, and in Gradescope mark the page(s) on which each problem appears. Associate the “Overall” section with the first page of your PDF.

Include all team members’ names with the team name in the author portion of the YAML header.

You must have at least three meaningful commits.

There should only be one submission per team on Gradescope.

References

“Data Sets.” OpenIntro.org, 2021, https://www.openintro.org/data/index.php?data=loans_full_schema. Accessed 13 Mar 2021.

“Infer - Tidy Statistical Inference.” infer.netlify.app, 2021, https://infer.netlify.app/index.html.

Lab #07: CLT-based inference

due Sun, Mar 21 11:59 PM