# 23 Lab 7: The Central Limit Theorem

### 23.0.1 Our data

The `dslabs`

package contains a dataset called `murders`

which contains data from the Federal Bureau of Investigation (FBI) about firearm murders in the 50 US states and Washington, D.C for the year 2010. A preview of the data is shown below.

```
## state abb region population total
## 1 Alabama AL South 4779736 135
## 2 Alaska AK West 710231 19
## 3 Arizona AZ West 6392017 232
## 4 Arkansas AR South 2915918 93
## 5 California CA West 37253956 1257
## 6 Colorado CO West 5029196 65
```

### 23.0.2 Your tasks

- Download and install the
`dslabs`

package and then load the`murders`

dataset. Create a new variable,`murders_per_100k`

, which gives the number of murders per 100,000 people in each state. If you have done this correctly, the first 6 rows of`murders`

should now look identical to the output below. If you donâ€™t already know how to calculate this value, a quick Google search will show you how.

```
## state abb region population total murders_per_100k
## 1 Alabama AL South 4779736 135 2.824424
## 2 Alaska AK West 710231 19 2.675186
## 3 Arizona AZ West 6392017 232 3.629527
## 4 Arkansas AR South 2915918 93 3.189390
## 5 California CA West 37253956 1257 3.374138
## 6 Colorado CO West 5029196 65 1.292453
```

The national firearm murder rate is the mean of

`murders_per_100k`

and it is reasonable to assert that this type of variable follows a Poisson distribution. Calculate the national murder rate, save this as a variable called`lambda`

and print`lambda`

as your answer to this question.Write a function called

`murder_sim()`

which produces a dataframe with`n`

rows that consists of simulated national firearm murder rates that follow the appropriate distribution.

- Hint: This function is similar to
`jury_sim()`

from the tutorial

- Using a
`seed`

integer of 10, use`murder_sim()`

to create 10,000 simulated national firearm murder rates. Save the resulting data as an object and then use it to create a histogram just like the one below.**But make sure you put the correct value for**`lambda`

in the subtitle.

- Suppose that the year is now 2011 and you read in the newspaper that the FBI predicts that for this year, the firearm murder rate will be 2.2% higher than in 2010. Using this predicted value for the 2011 firearm murder rate as a test statistic together with your simulated data, calculate the proportion of simulated murder rates in your data that are greater than or equal to this value and explain whether or not this predicted murder rate is abnormally high compared to 2010. Illustrate this by modifying the histogram you made to answer the last question.