updated to match content of readings
@@ -61,13 +61,9 @@ chisq.test(table(iris$Species, iris$Sepal.Width > 3))
The incredibly low p-value means that it is very unlikely that these came from the same distribution, so we conclude that sepal width differs by species.
## BONUS: Using simulation to test hypotheses and calculate "exact" p-values
When the assumptions of $\chi^2$ tests aren't met, we can use simulation to approximate how likely a given result is. The material here comes from the final two sections of Chapter 6 of the *OpenIntro* textbook. The book uses the example of a medical practitioner who has 3 complications out of 62 procedures, while the typical rate is 10%.
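One way to see whether the usual $\chi^2$ approximation is reasonable is to inspect the expected cell counts; a common rule of thumb asks for at least 5 in every cell. The sketch below does this for the iris table used earlier, then asks `chisq.test` for a simulation-based p-value through its `simulate.p.value` and `B` arguments. The names `tbl`, `fit`, and `sim_fit` and the seed are illustrative assumptions, not part of the original lab.

```r
# Illustrative sketch; `tbl`, `fit`, and `sim_fit` are names chosen for this example.
tbl <- table(iris$Species, iris$Sepal.Width > 3)

# Expected counts under independence; a common rule of thumb wants each >= 5.
fit <- chisq.test(tbl)
fit$expected
all(fit$expected >= 5)

# Monte Carlo p-value: chisq.test() simulates B tables with the same margins.
set.seed(1)  # arbitrary seed so the result is reproducible
sim_fit <- chisq.test(tbl, simulate.p.value = TRUE, B = 5000)
sim_fit$p.value
```

When the expected counts are small, the simulated p-value is usually the safer of the two to report.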
## Using Simulation
When the assumptions of Chi-squared tests aren't met, we can use simulation to approximate how likely a given result is.
The book uses the example of a medical practitioner who has 3 complications out of 62 procedures, while the typical rate is 10%.
The null hypothesis is that this practitioner's true rate is also 10%, so we're trying to figure out how rare it would be to have 3 or fewer complications, if the true rate is 10%.
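Because the number of complications under the null hypothesis follows a binomial distribution, the same tail probability can also be computed directly, which gives a useful check on the simulation. This is a sketch using base R's `pbinom` and `binom.test`; the variable names are assumptions made for illustration.

```r
# Illustrative names; the values come from the example in the text.
n_procedures <- 62
observed <- 3
null_rate <- 0.1

# P(X <= 3) when X ~ Binomial(62, 0.1): the one-sided tail probability.
exact_p <- pbinom(observed, size = n_procedures, prob = null_rate)
exact_p

# binom.test() with alternative = "less" reports the same tail probability.
binom_p <- binom.test(observed, n_procedures, p = null_rate,
                      alternative = "less")$p.value
binom_p
```

A simulated proportion based on a few thousand replicates should land close to this value, though it will vary from run to run.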
@@ -84,7 +80,6 @@ simulation <- function(rate = .1, n = 62){
}
# The replicate function runs a function many times
simulated_complications <- replicate(5000, simulation())
```
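The body of `simulation()` is not fully visible in this hunk, so the following is only a guess at one way such a function could be written; the name `simulate_complications` and its internals are assumptions, not the author's code. It also shows how `replicate` evaluates an expression repeatedly and collects the results in a vector.

```r
# Hypothetical stand-in for the lab's simulation(); not the original code.
simulate_complications <- function(rate = 0.1, n = 62) {
  # Draw n procedures, each a complication (TRUE) with probability `rate`.
  outcomes <- runif(n) < rate
  sum(outcomes)  # number of complications in this simulated set
}

set.seed(42)  # arbitrary seed for reproducibility
# replicate() calls the function 5000 times and collects the counts in a vector.
sims <- replicate(5000, simulate_complications())
length(sims)
mean(sims <= 3)  # share of simulations with 3 or fewer complications
```

Using `mean()` on a logical vector gives the same proportion as dividing `sum()` by `length()`.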
@@ -92,12 +87,10 @@ simulated_complications <- replicate(5000, simulation())
We can look at our simulated complications:
```{r}
hist(simulated_complications)
```
And determine how many of them are as extreme or more extreme than the value we saw. This is the p-value.
And determine how many of them are as extreme or more extreme than the value we saw. This is the "exact" p-value.
```{r}
sum(simulated_complications <= 3)/length(simulated_complications)