Main ideas
- Use formulas to compute conditional probabilities from tabular data
- Compute empirical probabilities in R via simulation
library(tidyverse)
library(vcd) # used for Arthritis data
data(Arthritis)
glimpse(Arthritis)
## Rows: 84
## Columns: 5
## $ ID <int> 57, 46, 77, 17, 36, 23, 75, 39, 33, 55, 30, 5, 63, 83, 66, 4…
## $ Treatment <fct> Treated, Treated, Treated, Treated, Treated, Treated, Treate…
## $ Sex <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, Male, …
## $ Age <int> 27, 29, 30, 32, 46, 58, 59, 59, 63, 63, 64, 64, 69, 70, 23, …
## $ Improved <ord> Some, None, None, Marked, Marked, Marked, None, Marked, None…
We’ll again work with the Arthritis data, as we did last lecture.
Let’s look at the data in a tabular view. Don’t worry about understanding these functions; we’re only using them to display the data as a table.
xtabs(~ Treatment + Improved, data = Arthritis) %>%
  addmargins()
## Improved
## Treatment None Some Marked Sum
## Placebo 29 7 7 43
## Treated 13 7 21 41
## Sum 42 14 28 84
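In the questions below, let \(L\) and \(T\) denote the events that a patient received the placebo or the treatment, and let \(N\), \(S\), and \(M\) denote no, some, and marked improvement.
What is the probability a randomly selected patient received the placebo and had a marked improvement?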
\(P(L \cap M) = 7 / 84\)
What is the probability a randomly selected patient had a marked improvement given they received the placebo?
\(P(M | L) = P(M \cap L) / P(L) = (7 / 84) / (43 / 84) = 7 / 43\)
What is the probability a randomly selected patient had no improvement given they received the treatment?
\(P(N | T) = P(N \cap T) / P(T) = 13 / 41\)
What is the probability a randomly selected patient was on the placebo given they had a marked improvement?
\(P(L | M) = P(L \cap M) / P(M) = 7/28\)
What is the probability a randomly selected patient was on the treatment given they had some or marked improvement?
\(P(T | S \cup M) = P(T \cap (S \cup M)) / P(S \cup M) = 28 / 42\)
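The five answers above can be checked numerically. The following is a minimal sketch in base R that re-enters the counts from the table by hand rather than reading them from `Arthritis`; the object names (`counts`, `p_marked_given_placebo`, and so on) are made up for illustration.

```r
# Counts copied from the Treatment-by-Improved table above (margins omitted)
counts <- matrix(
  c(29, 7,  7,
    13, 7, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
)
n <- sum(counts)  # 84 patients in total

# Joint probability P(L and M): divide the cell count by the grand total
p_placebo_and_marked <- counts["Placebo", "Marked"] / n

# Conditioning on the treatment group: divide by the row total
p_marked_given_placebo <- counts["Placebo", "Marked"] / sum(counts["Placebo", ])
p_none_given_treated   <- counts["Treated", "None"] / sum(counts["Treated", ])

# Conditioning on the outcome: divide by the column total(s)
p_placebo_given_marked <- counts["Placebo", "Marked"] / sum(counts[, "Marked"])
p_treated_given_some_or_marked <-
  sum(counts["Treated", c("Some", "Marked")]) / sum(counts[, c("Some", "Marked")])
```

In each case the conditioning event decides the denominator: a row total when we condition on the treatment group, a column total when we condition on the outcome.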
Watch the first 38 seconds of the first video on the site here: https://brilliant.org/wiki/monty-hall-problem/.
“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, ‘Do you want to pick door No. 2?’ Is it to your advantage to switch your choice?”
We will investigate the above decision of whether to switch or not to switch.
Assumptions:
- The host will always open a door not picked by the contestant.
- The host will always open a door which reveals a goat (i.e. not a car).
- The host will always offer the contestant the chance to switch to another door.
- The door behind which the car is placed is chosen at random.
- The door initially chosen by the contestant is chosen at random.
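doors <- c(1, 2, 3)
# Simulate 10,000 games: the car's door and the contestant's first pick
# are drawn independently and uniformly at random
monty_hall <- tibble(
  car_door = sample(doors, size = 10000, replace = TRUE),
  my_door = sample(doors, size = 10000, replace = TRUE)
)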
monty_hall
## # A tibble: 10,000 x 2
## car_door my_door
## <dbl> <dbl>
## 1 3 3
## 2 3 1
## 3 1 3
## 4 3 1
## 5 3 3
## 6 1 3
## 7 1 2
## 8 1 3
## 9 2 1
## 10 3 1
## # … with 9,990 more rows
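# Monty opens a door that is neither the contestant's door nor the car's door
monty_hall <- monty_hall %>%
  rowwise() %>%  # evaluate sample() separately for each game
  mutate(monty_door = if_else(car_door == my_door,
                              # first pick is the car: open one of the two goat doors at random
                              sample(doors[-my_door], size = 1),
                              # otherwise one door is left; door numbers sum to 6
                              6 - (car_door + my_door))) %>%
  ungroup()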
monty_hall
## # A tibble: 10,000 x 3
## car_door my_door monty_door
## <dbl> <dbl> <dbl>
## 1 3 3 2
## 2 3 1 2
## 3 1 3 2
## 4 3 1 2
## 5 3 3 2
## 6 1 3 2
## 7 1 2 3
## 8 1 3 2
## 9 2 1 3
## 10 3 1 2
## # … with 9,990 more rows
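# Staying wins when the first pick was the car; switching wins otherwise,
# because Monty has already opened the only other goat door
monty_hall <- monty_hall %>%
  mutate(switch_win = car_door != my_door,
         stay_win = car_door == my_door)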
monty_hall
## # A tibble: 10,000 x 5
## car_door my_door monty_door switch_win stay_win
## <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 3 3 2 FALSE TRUE
## 2 3 1 2 TRUE FALSE
## 3 1 3 2 TRUE FALSE
## 4 3 1 2 TRUE FALSE
## 5 3 3 2 FALSE TRUE
## 6 1 3 2 TRUE FALSE
## 7 1 2 3 TRUE FALSE
## 8 1 3 2 TRUE FALSE
## 9 2 1 3 TRUE FALSE
## 10 3 1 2 TRUE FALSE
## # … with 9,990 more rows
monty_hall %>%
  summarise(switch_win_prob = mean(switch_win),
            stay_win_prob = mean(stay_win))
## # A tibble: 1 x 2
## switch_win_prob stay_win_prob
## <dbl> <dbl>
## 1 0.668 0.332
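The `switch_win` column above never uses `monty_door` directly; it relies on the fact that switching wins exactly when the first pick was a goat. As a cross-check, here is a minimal base R sketch that works out the door you would switch to and compares it with the car's door. The seed and the names `n_games`, `allowed`, and `switch_door` are assumptions made for this illustration, so its numbers will not match the tibble output above exactly.

```r
set.seed(1)  # assumed seed, used only so this sketch is reproducible
n_games <- 10000
doors <- c(1, 2, 3)

car_door <- sample(doors, size = n_games, replace = TRUE)
my_door  <- sample(doors, size = n_games, replace = TRUE)

# Monty opens a door that is neither the contestant's door nor the car's door
monty_door <- vapply(seq_len(n_games), function(i) {
  allowed <- setdiff(doors, c(car_door[i], my_door[i]))
  # Sample an index: sample(x, 1) on a length-one numeric x would draw from 1:x
  allowed[sample.int(length(allowed), size = 1)]
}, numeric(1))

# The door offered in a switch is the one remaining closed door
switch_door <- 6 - (my_door + monty_door)

switch_win <- switch_door == car_door
stay_win   <- my_door == car_door

c(switch = mean(switch_win), stay = mean(stay_win))
```

Both proportions should land close to \(2/3\) and \(1/3\), in line with the tidyverse simulation.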
Marilyn vos Savant received over ten thousand letters claiming her assertion (that you should switch) was incorrect. What was her response? “Yes; you should switch,” she replied. “The first door has a 1/3 chance of winning, but the second door has a 2/3 chance.”
Read more about this at https://priceonomics.com/the-time-everyone-corrected-the-worlds-smartest/
Explanation #1: Suppose you pick door #1. The three possible arrangements of goats and car are provided in the table below. The goat Monty reveals is marked "(revealed)"; in the final row he reveals one of the two marked goats, behind door #2 or door #3.
Note that if you stay with door #1 you win in one of the three possible arrangements, while if you switch you win in two of the three.
Door #1 | Door #2 | Door #3 | Switch | Stay |
---|---|---|---|---|
Goat | Goat (revealed) | Car | Win | Lose |
Goat | Car | Goat (revealed) | Win | Lose |
Car | Goat (revealed) | Goat (revealed) | Lose | Win |
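The table can also be enumerated directly. The sketch below assumes, as in the table, that you picked door #1 and that the three car positions are equally likely; the data frame `arrangements` and the other names are made up for illustration.

```r
doors <- 1:3
my_door <- 1  # the contestant's pick, as in the table

# One row per equally likely arrangement (the position of the car)
arrangements <- data.frame(car_door = doors)
arrangements$stay_win   <- arrangements$car_door == my_door
arrangements$switch_win <- arrangements$car_door != my_door

# Each arrangement has probability 1/3, so the mean gives the exact probability
stay_prob   <- mean(arrangements$stay_win)
switch_prob <- mean(arrangements$switch_win)
```

Because this is an exact count over three cases rather than a simulation, `stay_prob` and `switch_prob` come out as exactly \(1/3\) and \(2/3\).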
Explanation #2: If your first pick is a goat (probability \(2/3\)), you are certain to win by switching doors; if your first pick is the car (probability \(1/3\)), you are certain to lose by switching doors.
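Explanation #2 can be written out with the law of total probability. As a sketch, let \(W\) be the event that switching wins, and let \(G\) and \(C\) be the events that your first pick is a goat or the car (these labels are introduced here for illustration only):

\[
P(W) = P(W | G) \, P(G) + P(W | C) \, P(C) = 1 \cdot \frac{2}{3} + 0 \cdot \frac{1}{3} = \frac{2}{3}
\]

This agrees with the simulated value of 0.668 for switching.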
Explanation #3: Note that before Monty opens a door, there is a \(1/3\) chance the door you picked contains the car and a \(2/3\) chance one of the other two doors contains the car. When Monty opens a door you didn’t pick and reveals a goat, there is still a \(2/3\) chance one of the doors you didn’t pick contains the car. Since the opened door holds a goat, all of that \(2/3\) now rests on the other unopened door. Switch to that door for a \(2/3\) probability of winning the car.
Here’s Keith Devlin: “By opening his door, Monty is saying to the contestant ‘There are two doors you did not choose, and the probability that the prize is behind one of them is 2/3. I’ll help you by using my knowledge of where the prize is to open one of those two doors to show you that it does not hide the prize. You can now take advantage of this additional information. Your choice of door A has a chance of 1 in 3 of being the winner. I have not changed that. But by eliminating door C, I have shown you that the probability that door B hides the prize is 2 in 3.’”