Supplemental Material

Everything in here is meant to help you succeed on the exams in this course. There’s no due date attached to any of it because you aren’t meant to turn any of it in.

If you do want feedback on the work you’ve done, feel free to send it to me as a picture or document; just don’t expect my usual \(\approx 72\)-hour response time.

The format of this material will mostly be long-form word problems with real or simulated data. If this is your last (or only) math class during your college career, this next concept isn’t super helpful to understand, but anyone with more than one math class in their future should pay attention.

Mathematics and computational courses operate on a system of exercises and problems

These are gross oversimplifications; many proper mathematicians would argue they lack rigor.

  • Axioms are the foundational statements, accepted without proof, that concepts, operations, and theorems are built on. They are meant to leverage common sense and intuition, meaning they must have a core that is understandable in any context.

  • Algorithms are a system of steps for solving a problem or yielding a result. Think of them as mathematical/computational recipes: if you follow them exactly, you should get the appropriate result.

  • Exercises are simplified, in-class examples led by the instructor. Their purpose is to introduce the axioms and algorithms that make up the foundation of the course topics.

  • Problems are higher-complexity, generally difficult, student-directed puzzles. The axioms will still apply; the algorithms may not. The student is expected to flex their critical thinking and logic skills to either piece together which axioms and algorithms resolve the puzzle or construct a new algorithm out of pieces of previously learned ones. They are never unsolvable, but they should be new to you.

Household Income

The Census Bureau keeps track of a large variety of interesting U.S. household/citizen metrics and an even larger variety of uninteresting ones.

To calculate metrics from Census data, it’s common practice to split states into counties, counties into districts, and residents of the districts into groups based on similarities, then randomly select from those groups in even batches. The result is ideally a representative sample.

Below is a table with a sample of \(n=50\) drawn from the original sample of size \(320\). The sample was taken by evenly dividing the data by year and selecting 5 rows of data at random from each year:

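\[ \begin{array}{|c|c|c|c|} \hline \text{Income (in 1000s of USD)} & \text{Year} & \text{Share Proportion} & \text{State} \\ \hline
53 & 2013 & 0.0476919 & \text{Kansas} \\
6232 & 2013 & 0.0539027 & \text{Kansas} \\
5410 & 2013 & 0.0467958 & \text{Kansas} \\
8380 & 2013 & 0.0724881 & \text{Kansas} \\
6215 & 2013 & 0.0537543 & \text{Kansas} \\
89 & 2014 & 0.0796244 & \text{Kansas} \\
57 & 2014 & 0.0509154 & \text{Kansas} \\
11540 & 2014 & 0.0993029 & \text{Kansas} \\
5714 & 2014 & 0.0491657 & \text{Kansas} \\
57 & 2014 & 0.0513577 & \text{Kansas} \\
119 & 2015 & 0.1068316 & \text{Kansas} \\
57 & 2015 & 0.0513888 & \text{Kansas} \\
8421 & 2015 & 0.0720238 & \text{Kansas} \\
61 & 2015 & 0.0551635 & \text{Kansas} \\
91 & 2015 & 0.0817416 & \text{Kansas} \\
9200 & 2016 & 0.0781513 & \text{Kansas} \\
60 & 2016 & 0.0536152 & \text{Kansas} \\
6029 & 2016 & 0.0512156 & \text{Kansas} \\
45 & 2016 & 0.0398922 & \text{Kansas} \\
5404 & 2016 & 0.0459074 & \text{Kansas} \\
4725 & 2017 & 0.0397657 & \text{Kansas} \\
6931 & 2017 & 0.0583302 & \text{Kansas} \\
58 & 2017 & 0.0521292 & \text{Kansas} \\
5619 & 2017 & 0.0472877 & \text{Kansas} \\
49 & 2017 & 0.0438035 & \text{Kansas} \\
48 & 2018 & 0.0429701 & \text{Kansas} \\
6932 & 2018 & 0.0578984 & \text{Kansas} \\
5558 & 2018 & 0.0464243 & \text{Kansas} \\
59 & 2018 & 0.0524877 & \text{Kansas} \\
5507 & 2018 & 0.0459955 & \text{Kansas} \\
9090 & 2019 & 0.0752758 & \text{Kansas} \\
63 & 2019 & 0.0553547 & \text{Kansas} \\
48 & 2019 & 0.0427709 & \text{Kansas} \\
15375 & 2019 & 0.1273196 & \text{Kansas} \\
8174 & 2019 & 0.0676866 & \text{Kansas} \\
7146 & 2020 & 0.0584022 & \text{Kansas} \\
51 & 2020 & 0.0444936 & \text{Kansas} \\
5030 & 2020 & 0.0411120 & \text{Kansas} \\
5382 & 2020 & 0.0439888 & \text{Kansas} \\
9123 & 2020 & 0.0745613 & \text{Kansas} \\
8982 & 2021 & 0.0724300 & \text{Kansas} \\
5097 & 2021 & 0.0410990 & \text{Kansas} \\
9695 & 2021 & 0.0781784 & \text{Kansas} \\
50 & 2021 & 0.0437530 & \text{Kansas} \\
155 & 2021 & 0.1363234 & \text{Kansas} \\
44 & 2022 & 0.0378789 & \text{Kansas} \\
91 & 2022 & 0.0795614 & \text{Kansas} \\
16085 & 2022 & 0.1279288 & \text{Kansas} \\
48 & 2022 & 0.0418941 & \text{Kansas} \\
44 & 2022 & 0.0386058 & \text{Kansas} \\
\hline \end{array} \]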
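The per-year draw described above can be sketched in Python. This is a minimal sketch under stated assumptions, not necessarily how the table above was produced: the original \(n=320\) rows are not reproduced here, so `original` is a hypothetical stand-in with 32 rows per year (assuming the 320 rows split evenly over the 10 years), and the helper `sample_per_year` and the `row_id` field are made up for illustration.

```python
import random
from collections import defaultdict


def sample_per_year(rows, per_year=5, seed=None):
    """Return `per_year` rows chosen at random (without replacement) from each year.

    Hypothetical helper: `rows` is a list of dicts that each have a "year" key.
    """
    rng = random.Random(seed)            # seeded generator so the draw is reproducible
    by_year = defaultdict(list)
    for row in rows:                     # divide the data into one group per year
        by_year[row["year"]].append(row)
    sample = []
    for year in sorted(by_year):         # draw the same number of rows from every year
        sample.extend(rng.sample(by_year[year], per_year))
    return sample


# Hypothetical stand-in for the original n = 320 data: 32 rows for each year 2013-2022.
original = [
    {"year": year, "row_id": i}
    for year in range(2013, 2023)
    for i in range(32)
]

small = sample_per_year(original, per_year=5, seed=2024)
print(len(small))
```

Because every year contributes exactly 5 rows, 10 years give the \(n=50\) rows in the table; changing `seed` changes which rows are picked but not how many come from each year.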

The original sampling of the Census data was a multi-step process that combined three different sampling methods. What are those three nested sampling methods?


What was the sampling method I used to create the second, smaller sample set in the table above?


Name the type and subtype of each variable in the table.


Compute the following for Household Income:

  • Mean

  • Median

  • Mode

  • Range

  • Variance

  • Standard Deviation
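If you want to check your hand calculations, Python’s standard `statistics` module computes all six measures. This sketch assumes the course uses the sample formulas (dividing by \(n-1\)); swap in `statistics.pvariance` and `statistics.pstdev` if you were taught the population versions. `multimode` is used because a data set can have more than one mode. The values are copied from the table exactly as printed.

```python
import statistics

# Household income (in 1000s of USD) for the n = 50 sample, copied from the table as printed.
income = [
    53, 6232, 5410, 8380, 6215,      # 2013
    89, 57, 11540, 5714, 57,         # 2014
    119, 57, 8421, 61, 91,           # 2015
    9200, 60, 6029, 45, 5404,        # 2016
    4725, 6931, 58, 5619, 49,        # 2017
    48, 6932, 5558, 59, 5507,        # 2018
    9090, 63, 48, 15375, 8174,       # 2019
    7146, 51, 5030, 5382, 9123,      # 2020
    8982, 5097, 9695, 50, 155,       # 2021
    44, 91, 16085, 48, 44,           # 2022
]

mean = statistics.mean(income)
median = statistics.median(income)          # average of the 25th and 26th sorted values
modes = statistics.multimode(income)        # every value tied for the highest count
data_range = max(income) - min(income)
variance = statistics.variance(income)      # sample variance: divides by n - 1
std_dev = statistics.stdev(income)          # sample standard deviation: sqrt of the variance

print(f"Mean:               {mean:.2f}")
print(f"Median:             {median}")
print(f"Mode(s):            {modes}")
print(f"Range:              {data_range}")
print(f"Variance:           {variance:.2f}")
print(f"Standard deviation: {std_dev:.2f}")
```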


Justify which measure of center (Mean/Median/Mode) is most representative of the data set.


Why would someone prefer Standard Deviation over Variance as a measure of spread for this data?
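A quick way to see the units issue is to rescale a few incomes from thousands of dollars to plain dollars and compare how each measure of spread changes. The three incomes below are hypothetical, chosen only to keep the arithmetic simple.

```python
import statistics

# Hypothetical incomes in 1000s of USD (not taken from the table).
incomes_k = [50, 60, 70]
incomes_usd = [x * 1000 for x in incomes_k]   # the same incomes in plain dollars

sd_k, var_k = statistics.stdev(incomes_k), statistics.variance(incomes_k)
sd_usd, var_usd = statistics.stdev(incomes_usd), statistics.variance(incomes_usd)

# Standard deviation is in the data's own units, so it scales by the same factor (1000).
print(sd_usd / sd_k)
# Variance is in squared units, so it scales by 1000 ** 2.
print(var_usd / var_k)
```

The standard deviation grows by the same factor as the data, so it can be read directly as “thousands of USD”; the variance is in \((\text{1000s of USD})^2\), a unit with no everyday meaning.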


The histogram below represents the original sample of \(n=320\):

  • Describe the shape of the histogram

  • Identify the median and mode

  • Identify where the mean you calculated falls on this histogram

    • Does your calculation feel reasonable?
  • Does this histogram follow our rules of histograms?
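To compare the shape of the \(n=50\) sample with the histogram of the full sample, you could bin the table’s incomes yourself. This is a minimal text-only sketch; the bin width of 2000 (thousands of USD) and the helper name `text_histogram` are arbitrary choices for illustration, not the bins used in the figure.

```python
# Household income (in 1000s of USD) for the n = 50 sample, copied from the table as printed.
income = [
    53, 6232, 5410, 8380, 6215, 89, 57, 11540, 5714, 57,
    119, 57, 8421, 61, 91, 9200, 60, 6029, 45, 5404,
    4725, 6931, 58, 5619, 49, 48, 6932, 5558, 59, 5507,
    9090, 63, 48, 15375, 8174, 7146, 51, 5030, 5382, 9123,
    8982, 5097, 9695, 50, 155, 44, 91, 16085, 48, 44,
]


def text_histogram(values, bin_width, start=0):
    """Print and return counts for equal-width bins [lo, lo + bin_width)."""
    n_bins = (max(values) - start) // bin_width + 1
    counts = [0] * n_bins
    for v in values:                      # each value lands in exactly one bin
        counts[(v - start) // bin_width] += 1
    for i, count in enumerate(counts):
        lo = start + i * bin_width
        print(f"[{lo:>5}, {lo + bin_width:>5}) {'#' * count} ({count})")
    return counts


counts = text_histogram(income, bin_width=2000)
```

Try a few different `bin_width` values; a shape that changes a lot with the bin width is a hint to be careful about how you describe it.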

Research and Random Variables

  • I tried to create some example research problems in the major fields represented by our class demographics. While these word problems are far more intensive than anything you’d see on the exam, if you can complete this section you’ll be set for random variable (r.v.) content.

Reminder: only 2 students in this class are in my field, so some of these may be “poorly designed” studies.

For word problems \(2\) through \(5\), fill in the tables below. Question \(1\) serves as an example:


  1. A research lab wants to model the spread of seasonal influenza among college students in America. They decide to take a sample of 500 students from each of 10 major universities, which were selected at random from a list of 50 possible universities. They then measure the contact networks (person-to-person interactions) between these students and determine how many contacts each person is away from their furthest connection. Background research found that the average person is \(6\) contacts away from their furthest connection; the researchers found that college students were \(4.8\) contacts away on average.


\[ \begin{array}{|c|c|c|} \hline \# & \text{Population} & \text{Sample} \\ \hline 1 & \text{American College Students} & \text{500 Students from Each of 10 Universities} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array} \]

\[ \begin{array}{|c|c|c|c|} \hline \# & \text{Parameter} & \text{Statistic} & \text{Data Type} \\ \hline 1 & \text{6 average contacts} & \text{4.8 average contacts} & \text{Quantitative, Ratio} \\ 2 & & & \\ 3 & & & \\ 4 & & & \\ 5 & & & \\ \hline \end{array} \]

\[ \begin{array}{|c|c|c|} \hline \# & \text{Random Variable} & \text{r.v. Type} \\ \hline 1 & \text{Contacts} & \text{Discrete} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array} \]

  2. The Kansas Department of Wildlife, Parks and Tourism is tracking the birth rate of a rare species of bird native to the Great Plains. They cannot get permission to check every park where the bird is protected, as many of these parks cross state lines, so they make inferences by taking counts in every park in Kansas where the bird exists. They know from nationwide reporting that the birth rate of this species is \(R=2.82\), but from their tracking they’re seeing a birth rate of \(r=3.29\).


  3. A lab group at K-State launches an experiment to determine the effect of different diets and stretching techniques on lactic acid production in athletes. The team of researchers is launching a completely novel project, but previous papers have suggested that these treatments will have little to no effect. They take a group of \(40\) volunteers from various teams in K-State athletics and divide them into \(8\) groups (based on 2 diets and 2 stretching techniques), finding that \(S_1\) (stretching technique 1) and \(D_2\) (diet 2) reduced the average proportion of lactic acid in the athletes during high-intensity cardio by \(0.278\).


  4. A research farm has been contracted to investigate the effects of three different feed supplements on the post-harvest meat quality of cattle. The researchers focused on a specific high-value cut of meat and measured the weight of this cut in the highest-quality grade after each animal was slaughtered. In a typical scenario, a \(1000\)-pound cow yields about \(600\) pounds of carcass weight after slaughter (called the “hanging weight”), with around \(300\) pounds of boneless retail cuts. Of these cuts, approximately \(10\) pounds typically fall into the highest-quality grade (USDA Prime or equivalent), which represents the most marbled and valuable portion of the meat. In the experiment, two of the feed supplements had no significant effect on the amount of highest-quality meat. One supplement significantly increased the yield of the highest-quality grade, boosting the average weight of these premium cuts to \(33\) pounds per cow.


  5. A psychology lab is working to determine the behavioral differences between arranged and self-selected marital partners. Using volunteers, they run an experiment where one partner is given a simple physical puzzle to solve blindfolded, and the other partner is tasked with guiding them. They measure behavioral differences between the two categories of couples by reviewing recordings of the puzzle being solved, counting how often certain key words are said, and categorizing those words as positive or negative statements. They also record how quickly the couples complete the puzzle. This puzzle task is well studied, and overall the average time for any two people to complete it is roughly \(6.5\) minutes. In their experiment, which observed \(43\) couples, the partners from arranged marriages finished \(1.2\) minutes faster on average than those from self-selected marriages (who finished in \(5.3\) minutes on average), but made positive or negative statements less frequently, with a negligibly higher proportion of positive statements.