class: center, middle, inverse, title-slide

# Simulation-based inference: hypothesis testing

---

class: inverse, center, middle

# Recall

---

## Terminology

.vocab[Population]: a group of individuals or objects we are interested in studying

.vocab[Parameter]: a numerical quantity derived from the population (almost always unknown)

.vocab[Sample]: a subset of our population of interest

.vocab[Statistic]: a numerical quantity derived from a sample

.tiny[
| Quantity           | Parameter      | Statistic       |
|--------------------|----------------|-----------------|
| Mean               | `\(\mu\)`      | `\(\bar{x}\)`   |
| Variance           | `\(\sigma^2\)` | `\(s^2\)`       |
| Standard deviation | `\(\sigma\)`   | `\(s\)`         |
| Median             | `\(M\)`        | `\(\tilde{x}\)` |
| Proportion         | `\(p\)`        | `\(\hat{p}\)`   |
]

---

## Statistical inference

.vocab[Statistical inference] is the process of using sample data to make conclusions about the underlying population the sample came from.

- .vocab[Estimation]: estimating an unknown parameter based on values from the sample at hand
- .vocab[Testing]: evaluating whether our observed sample provides evidence for or against some claim about the population

<br/>

We will now move to testing hypotheses.

---

class: inverse, center, middle

# Testing

---

## How can we answer research questions using statistics?

.question[
**Statistical hypothesis testing** is the procedure that assesses the evidence provided by the data in favor of or against some claim about the population (often about a population parameter or potential associations).
]

--

<br/>

Example: The state of North Carolina claims that students in 8th grade are spending, on average, 200 minutes on Zoom each day.

**What do you make of this statement?**

**How would you evaluate the veracity of the claim?**

---

## The hypothesis testing framework

1. Start with two hypotheses about the population: the null hypothesis and the alternative hypothesis.

2. Choose a (representative) sample, collect data, and analyze the data.

3. Figure out how likely it is to see data like what we observed, or something more extreme, **assuming** the null hypothesis is true.

4. If our data would have been extremely unlikely if the null claim were true, then we reject it and deem the alternative claim worthy of further study. Otherwise, we cannot reject the null claim.

---

## Two competing hypotheses

The .vocab[null hypothesis] (often denoted `\(H_0\)`) states that "nothing unusual is happening" or "there is no relationship," etc.

The .vocab[alternative hypothesis] (often denoted `\(H_1\)` or `\(H_A\)`) states the opposite: that there is some sort of relationship (usually this is what we want to check or really think is happening).

.question[
In statistical hypothesis testing, we first assume that the null hypothesis is true and then see whether we reject or fail to reject the null hypothesis.
]

---

## 1. Defining the hypotheses

The null and alternative hypotheses are defined for **parameters**, not statistics.

What will our null and alternative hypotheses be for this example?

--

- `\(H_0\)`: the true mean time spent on Zoom per day for 8th grade students is 200 minutes
- `\(H_1\)`: the true mean time spent on Zoom per day for 8th grade students is not 200 minutes

Expressed in symbols:

- `\(H_0: \mu = 200\)`
- `\(H_1: \mu \neq 200\)`,

where `\(\mu\)` is the true population mean time spent on Zoom per day by 8th grade North Carolina students.

---

## 2. Collecting and summarizing data

With these two hypotheses, we now take our sample and summarize the data.
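If you are following along in R, note that the resampling sketch later in this deck involves random draws, so it helps to set a seed up front to make those results reproducible (the value used here is arbitrary):

```r
set.seed(1234)  # arbitrary seed; only matters for the resampling sketch later
```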
```r
# Daily Zoom time (in minutes) for a sample of 30 eighth graders
zoom_time <- c(299, 192, 196, 218, 194, 250, 183, 218, 207, 209,
               191, 189, 244, 233, 208, 216, 178, 209, 201, 173,
               186, 209, 188, 231, 195, 200, 190, 199, 226, 238)
```

```r
mean(zoom_time)
```

```
#> [1] 209
```

The summary statistic we calculate depends on the type of data. In our example, we use the sample mean: `\(\bar{x} = 209\)`.

--

Do you think this is enough evidence to conclude that the mean time is not 200 minutes?

---

## 3. Assessing the evidence observed

Next, we calculate the probability of getting data like ours, *or more extreme*, if `\(H_0\)` were actually true.

This is a conditional probability:

> Given that `\(H_0\)` is true (i.e., if `\(\mu\)` were *actually* 200), what would
> be the probability of observing `\(\bar{x} = 209\)` or something more extreme?

.question[
This probability is known as the **p-value**.
]

---

## 4. Making a conclusion

We reject the null hypothesis if this conditional probability is small enough: if it would be very unlikely to observe our data (or something more extreme) when `\(H_0\)` is true, then that gives us enough evidence to reject `\(H_0\)`.

--

What is "small enough"?

- We often consider a numeric cutpoint (the .vocab[significance level]) defined *prior* to conducting the analysis.
- Many analyses use `\(\alpha = 0.05\)`. This means that if `\(H_0\)` were in fact true, we would expect to make the wrong decision only 5% of the time.

---

## What can we conclude?

Case 1: `\(\mbox{p-value} \ge \alpha\)`

If the p-value is `\(\alpha\)` or greater, we say the results are not statistically significant and we .vocab[fail to reject] `\(H_0\)`.

Importantly, **we never "accept" the null hypothesis**: we performed the analysis assuming that `\(H_0\)` was true to begin with, and assessed the probability of seeing our observed data (or something more extreme) under this assumption.

--

Case 2: `\(\mbox{p-value} < \alpha\)`

If the p-value is less than `\(\alpha\)`, we say the results are .vocab[statistically significant]. In this case, we would make the decision to .vocab[reject the null hypothesis].

Similarly, **we never "accept" the alternative hypothesis**.

---

## Ok, so what **isn't** a p-value?

> *"A p-value of 0.05 means the null hypothesis has a probability of only 5% of*
> *being true"*

> *"A p-value of 0.05 means there is a 95% chance or greater that the null*
> *hypothesis is incorrect"*

--

# <center><span style="color:red">NO</span></center>

p-values do **not** provide information on the probability that the null hypothesis is true given our observed data.

---

## Ok, so what **isn't** a p-value?

Again, a p-value is calculated *assuming* that `\(H_0\)` is true. It cannot be used to tell us how likely that assumption is to be correct.

When we fail to reject the null hypothesis, we are stating that there is **insufficient evidence** to assert that it is false. This could be because...

- ... `\(H_0\)` actually *is* true!
- ... `\(H_0\)` is false, but we got unlucky and happened to get a sample that didn't give us enough reason to say that `\(H_0\)` was false.

Even more bad news: hypothesis testing does NOT give us the tools to determine which of the two scenarios occurred.
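---

## Aside: computing the p-value by simulation

Below is a minimal sketch of one common simulation approach (a bootstrap null distribution, using base R only): shift the sample so its mean equals 200, as `\(H_0\)` claims, resample it many times, and count how often a resampled mean lands at least as far from 200 as our observed `\(\bar{x} = 209\)`. With the seed set earlier, the result is reproducible.

```r
# Shift the data so the null hypothesis (mu = 200) holds exactly
shifted <- zoom_time - mean(zoom_time) + 200

# Resample 10,000 times and record each simulated sample mean
null_means <- replicate(
  10000,
  mean(sample(shifted, size = length(shifted), replace = TRUE))
)

# Two-sided p-value: proportion of simulated means at least as far
# from 200 as the observed mean of 209
mean(abs(null_means - 200) >= abs(mean(zoom_time) - 200))
```

Whether this p-value falls below `\(\alpha = 0.05\)` is exactly the decision described in step 4 of the framework.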
---

## What can go wrong?

Suppose we test a certain null hypothesis, which can be either true or false (we never know for sure!). We make one of two decisions given our data: either reject or fail to reject `\(H_0\)`.

--

We have the following four scenarios:

| Decision                 | `\(H_0\)` is true | `\(H_0\)` is false |
|--------------------------|-------------------|--------------------|
| Fail to reject `\(H_0\)` | Correct decision  | *Type II Error*    |
| Reject `\(H_0\)`         | *Type I Error*    | Correct decision   |

It is important to weigh the consequences of making each type of error. In fact, `\(\alpha\)` is precisely the probability of making a Type I error. We will talk about this (and the associated probability of making a Type II error) in future lectures.

---

## Let's conduct some hypothesis tests

Click the link below to create the repository for lecture notes #14.

- [https://classroom.github.com/a/xgpuf5vR](https://classroom.github.com/a/xgpuf5vR)
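---

## Appendix: `\(\alpha\)` as the Type I error rate

To build intuition for why `\(\alpha\)` equals the probability of a Type I error, we can simulate many samples from a world where `\(H_0\)` is true and count how often a test rejects at `\(\alpha = 0.05\)`. The sketch below uses a two-sided t-test and a made-up population standard deviation of 25 minutes; both are assumptions for illustration only.

```r
# Simulate 10,000 samples from a population where H0 is true (mu = 200),
# run a two-sided t-test on each, and record how often we reject
rejections <- replicate(10000, {
  x <- rnorm(30, mean = 200, sd = 25)  # sd = 25 is a hypothetical value
  t.test(x, mu = 200)$p.value < 0.05
})
mean(rejections)  # close to 0.05: rejecting a true H0 happens ~5% of the time
```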