Supplemental Material

Everything in here is meant to help you succeed at the exams in the course. There’s no due date attached to any of it because you aren’t meant to turn any of it in.

If you do want feedback on the work you’ve done, feel free to send it to me as a picture or document; just don’t expect the usual $\approx$ 72-hour response time.

The format of this material will mostly be long-form word problems with real or simulated data. If this is your last/only math class during your college career, this next concept isn’t super helpful to understand, but anyone with more than one math class in their future should pay attention.

Mathematics/computational courses operate on a system of exercises and problems.

The definitions below are gross oversimplifications; many proper mathematicians would argue they lack rigor.

Axioms are the foundational statements that concepts, operations, and theorems are built on; they are accepted as true without proof. They are meant to leverage common sense and intuition, meaning they must have a core that is understandable in any context.
Algorithms are a system of steps for solving a problem/yielding a result. Think of them as mathematical/computational recipes. If you follow them exactly you should yield the appropriate result.
Exercises are simplified, in-class examples led by the instructor. Their purpose is to introduce the axioms and algorithms that make up the foundation of the course topics.
Problems are higher-complexity, generally difficult, student-directed puzzles. The axioms will still apply; the algorithms may not. The student is expected to flex their critical thinking and logic skills to either piece together which axioms and algorithms resolve the puzzle, or construct a new algorithm out of pieces of previously learned ones. They are never unsolvable, but you should never have seen them before.

Household Income

The Census Bureau keeps track of a large variety of interesting U.S. household/citizen metrics and an even larger variety of uninteresting metrics.

To calculate metrics from Census data, it’s common practice to split states into counties, counties into districts, and residents of the districts into groups based on similarities, then randomly select from those groups in even batches. The result is a sample that is ideally homogeneous.

Below is a table with a sample of $n=50$ from the original sample size of 320. The sample was taken by evenly dividing the data by year and selecting 5 rows of data at random from each year:

Income (in 1000s of USD)	Year	Share Proportion	State
53	2013	0.0476919	Kansas
6232	2013	0.0539027	Kansas
5410	2013	0.0467958	Kansas
8380	2013	0.0724881	Kansas
6215	2013	0.0537543	Kansas
89	2014	0.0796244	Kansas
57	2014	0.0509154	Kansas
11540	2014	0.0993029	Kansas
5714	2014	0.0491657	Kansas
57	2014	0.0513577	Kansas
119	2015	0.1068316	Kansas
57	2015	0.0513888	Kansas
8421	2015	0.0720238	Kansas
61	2015	0.0551635	Kansas
91	2015	0.0817416	Kansas
9200	2016	0.0781513	Kansas
60	2016	0.0536152	Kansas
6029	2016	0.0512156	Kansas
45	2016	0.0398922	Kansas
5404	2016	0.0459074	Kansas
4725	2017	0.0397657	Kansas
6931	2017	0.0583302	Kansas
58	2017	0.0521292	Kansas
5619	2017	0.0472877	Kansas
49	2017	0.0438035	Kansas
48	2018	0.0429701	Kansas
6932	2018	0.0578984	Kansas
5558	2018	0.0464243	Kansas
59	2018	0.0524877	Kansas
5507	2018	0.0459955	Kansas
9090	2019	0.0752758	Kansas
63	2019	0.0553547	Kansas
48	2019	0.0427709	Kansas
15375	2019	0.1273196	Kansas
8174	2019	0.0676866	Kansas
7146	2020	0.0584022	Kansas
51	2020	0.0444936	Kansas
5030	2020	0.0411120	Kansas
5382	2020	0.0439888	Kansas
9123	2020	0.0745613	Kansas
8982	2021	0.0724300	Kansas
5097	2021	0.0410990	Kansas
9695	2021	0.0781784	Kansas
50	2021	0.0437530	Kansas
155	2021	0.1363234	Kansas
44	2022	0.0378789	Kansas
91	2022	0.0795614	Kansas
16085	2022	0.1279288	Kansas
48	2022	0.0418941	Kansas
44	2022	0.0386058	Kansas
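
The selection procedure described above the table (split the rows by year, then pick 5 rows at random from each year) can be sketched in a few lines of Python. This is only an illustration, not the code used to build the table: the `rows` list is a hypothetical stand-in for the original 320 rows (assumed here to be 32 per year from 2013 to 2022), and the `sample_per_year` function, the `row_id` field, and the seed are all made up for the example.

```python
import random
from collections import defaultdict

def sample_per_year(rows, per_year=5, seed=0):
    """Group rows by year, then draw `per_year` rows at random from each year."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    sample = []
    for year in sorted(by_year):
        # rng.sample draws without replacement, so no row is picked twice
        sample.extend(rng.sample(by_year[year], per_year))
    return sample

# Hypothetical stand-in data: 10 years x 32 rows = 320 rows (an assumption).
rows = [{"year": year, "row_id": i} for year in range(2013, 2023) for i in range(32)]

subset = sample_per_year(rows)
print(len(subset))  # 50
```

Every year contributes exactly the same number of rows, which is why the table has 5 entries for each year.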

The original sampling method for the Census data was a multi-step process using multiple different sampling methods. Three to be exact. What are those three nested sampling methods?

What was the sampling method I used to create the second, smaller sample set in the table above?

Name the type and subtype of each variable in the table.

Compute the following for Household Income:

Mean
Median
Mode
Range
Variance
Standard Deviation
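
If you want to check your hand calculations, Python’s built-in `statistics` module covers every item in this list. The sketch below uses a short, made-up list of incomes rather than the table, so it won’t give away the answers. It also assumes the sample formulas (dividing by $n-1$); if you need the population versions, `statistics.pvariance` and `statistics.pstdev` divide by $n$ instead.

```python
import statistics

# Hypothetical incomes in 1000s of USD (not taken from the table above).
incomes = [48, 52, 52, 57, 61, 90]

mean = statistics.mean(incomes)            # 60
median = statistics.median(incomes)        # 54.5 (average of the two middle values)
mode = statistics.mode(incomes)            # 52 (the most frequent value)
data_range = max(incomes) - min(incomes)   # 42
variance = statistics.variance(incomes)    # sample variance (n - 1), about 236.4
std_dev = statistics.stdev(incomes)        # square root of the variance, about 15.38

print(mean, median, mode, data_range, variance, std_dev)
```

Notice that the single large value (90) pulls the mean above the median, which is worth keeping in mind when you compare measures of center.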

Justify which measure of center (Mean/Median/Mode) is most representative of the data set.

Why would someone prefer Standard Deviation over Variance as a measure of spread for this data?

The histogram below represents the original sample of $n=320$:

Describe the shape of the histogram
Identify the median and mode
Identify where the mean you calculated falls on this histogram
- Does your calculation feel reasonable?
Does this histogram follow our rules of histograms?

Research and Random Variables

I made an attempt to create some examples of research problems in the major fields represented by our class demographics. While these word problems are far more intensive than anything you’d see on the exam, if you can complete this section you’ll be set for r.v. content.

Reminder: only 2 students in this class are in my field, so some of these may be “poorly designed” studies.

Given word problems $2$ through $5$, fill in the tables below. Question $1$ serves as an example:

A research lab wants to model the spread of seasonal influenza across college students in America. They decide to take a sample of 500 students from each of 10 major universities that were selected at random from a list of 50 possible universities. They then measure the contact networks (person-to-person interactions) between these students and determine how many contact paths each person is away from their furthest connection. From background research it was found that the average human being is $6$ contacts away from their furthest connection. The researchers found from their study that college students were $4.8$ contacts away on average.

$\begin{array}{|c|c|c|} \hline \# & \text{Population} & \text{Sample} \\ \hline 1 & \text{American College Students} & \text{500 Students from 10 Universities} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array}$

$\begin{array}{|c|c|c|c|} \hline \# & \text{Parameter} & \text{Statistic} & \text{Data Type} \\ \hline 1 & \text{6 average contacts} & \text{4.8 average contacts} & \text{Quantitative, Ratio} \\ 2 & & & \\ 3 & & & \\ 4 & & & \\ 5 & & & \\ \hline \end{array}$

$\begin{array}{|c|c|c|} \hline \# & \text{Random Variable} & \text{r.v. Type} \\ \hline 1 & \text{Contacts} & \text{Discrete} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array}$

The Kansas Department of Wildlife, Parks and Tourism is tracking the birth rate of a rare species of bird native to the Great Plains area. They cannot get permission to check every park where the bird is protected, as many of them cross state lines, so they make inferences by taking counts in every park in Kansas where the bird exists. They know from nationwide reporting that the birth rate of this bird species is $R=2.82$, but from their tracking they’re seeing a birth rate of $r=3.29$.

A lab group at K-State launches an experiment to determine the effect of different diets and stretching techniques on lactic acid production in athletes. The team of researchers is launching a completely novel project, but previous papers have suggested that these treatments will have little to no effect. They take a group of $40$ volunteers from various teams in K-State athletics and divide them into $8$ groups (based on 2 diets and 2 stretching techniques), finding that $S_1$ (stretching technique 1) and $D_2$ (diet 2) reduced the average proportion of lactic acid in the athletes, during high-intensity cardio, by $0.278$.

A research farm has been contracted to investigate the effects of three different feed supplements on the post-harvest meat quality of cattle. The researchers focused on a specific high-value cut of meat and measured the weight of this cut in the highest quality grade after each animal was slaughtered. In a typical scenario, a $1000$ pound cow yields about $600$ pounds of carcass weight after slaughter (called the “hanging weight”), with around $300$ pounds of boneless retail cuts. Of these cuts, approximately $10$ pounds typically fall into the highest quality grade (USDA Prime or equivalent), which represents the most marbled and valuable portion of the meat. In the experiment, two of the feed supplements had no significant effect on the amount of the highest quality meat. One supplement significantly increased the yield of the highest quality grade, boosting the average weight of these premium cuts to $33$ pounds per cow.

A psychology lab is working to determine the behavioral differences between arranged and self-selected marital partners. Using volunteers, they run an experiment where one partner is given a simple, physical puzzle to solve blindfolded, and the other partner is tasked with guiding them. They measure behavioral differences between the two categories of couples by reviewing recordings of the puzzle being solved and counting how often certain key words are said, then categorizing them into positive or negative statements. They also check the speed at which the couples complete the puzzle. This puzzle task is well studied, and overall the average time to complete the puzzle for any two people is roughly $6.5$ minutes. From their experiment, where they observed $43$ couples, the partners coming from arranged marriages had an average time $1.2$ minutes faster than the self-selected marriages, who finished in $5.3$ minutes on average, but a lower frequency of positive or negative statements, with a negligibly higher proportion of positive statements.