Supplemental Material
Everything in here is meant to help you succeed at the exams in the course. There’s no due date attached to any of it because you aren’t meant to turn any of it in.
If you do want feedback on the work you’ve done, feel free to send it to me as a picture/document. Just don’t expect the \(\approx\) 72 hour response time.
The format of this material will mostly be long-form word problems with real or simulated data. If this is your last/only math class during your college career, this next concept isn’t super helpful to understand, but anyone with more than one math class in their future should pay attention.
Mathematics/Computational courses operate off of a system of exercises and problems.
These are gross oversimplifications; many proper mathematicians would argue these lack rigor.
Axioms are statements accepted as true without proof; concepts, operations, and theorems are built on top of them. They are meant to leverage common sense and intuition, meaning they have to have a core that is understandable in any context.
Algorithms are a system of steps for solving a problem/yielding a result. Think of them as mathematical/computational recipes. If you follow them exactly you should yield the appropriate result.
Exercises are simplified, in-class examples led by the instructor. Their purpose is to introduce the axioms and algorithms that make up the foundation of the course topics.
Problems are higher-complexity, generally difficult, student-directed puzzles. The axioms will still apply; the algorithms may not. The student is expected to flex their critical thinking and logic skills to either piece together which axioms and algorithms resolve the puzzle, or construct a new algorithm out of pieces of previously learned ones. They are never unsolvable, but you should never have seen them before.
Household Income
The Census Bureau keeps track of a large variety of interesting U.S. household/citizen metrics and an even larger variety of uninteresting metrics.
In order to calculate metrics from Census data, it’s a common practice to split states into counties, counties into districts, residents of the districts into groups based on similarities, and randomly select from those groups in even batches. The result is a sample that is ideally homogeneous.
Below is a table with a sample of \(n=50\) from the original sample size of 320. The sample was taken by evenly dividing the data by year and selecting 5 rows of data at random from each year:
Income (in 1000s of USD) | Year | Share Proportion | State |
---|---|---|---|
53 | 2013 | 0.0476919 | Kansas |
6232 | 2013 | 0.0539027 | Kansas |
5410 | 2013 | 0.0467958 | Kansas |
8380 | 2013 | 0.0724881 | Kansas |
6215 | 2013 | 0.0537543 | Kansas |
89 | 2014 | 0.0796244 | Kansas |
57 | 2014 | 0.0509154 | Kansas |
11540 | 2014 | 0.0993029 | Kansas |
5714 | 2014 | 0.0491657 | Kansas |
57 | 2014 | 0.0513577 | Kansas |
119 | 2015 | 0.1068316 | Kansas |
57 | 2015 | 0.0513888 | Kansas |
8421 | 2015 | 0.0720238 | Kansas |
61 | 2015 | 0.0551635 | Kansas |
91 | 2015 | 0.0817416 | Kansas |
9200 | 2016 | 0.0781513 | Kansas |
60 | 2016 | 0.0536152 | Kansas |
6029 | 2016 | 0.0512156 | Kansas |
45 | 2016 | 0.0398922 | Kansas |
5404 | 2016 | 0.0459074 | Kansas |
4725 | 2017 | 0.0397657 | Kansas |
6931 | 2017 | 0.0583302 | Kansas |
58 | 2017 | 0.0521292 | Kansas |
5619 | 2017 | 0.0472877 | Kansas |
49 | 2017 | 0.0438035 | Kansas |
48 | 2018 | 0.0429701 | Kansas |
6932 | 2018 | 0.0578984 | Kansas |
5558 | 2018 | 0.0464243 | Kansas |
59 | 2018 | 0.0524877 | Kansas |
5507 | 2018 | 0.0459955 | Kansas |
9090 | 2019 | 0.0752758 | Kansas |
63 | 2019 | 0.0553547 | Kansas |
48 | 2019 | 0.0427709 | Kansas |
15375 | 2019 | 0.1273196 | Kansas |
8174 | 2019 | 0.0676866 | Kansas |
7146 | 2020 | 0.0584022 | Kansas |
51 | 2020 | 0.0444936 | Kansas |
5030 | 2020 | 0.0411120 | Kansas |
5382 | 2020 | 0.0439888 | Kansas |
9123 | 2020 | 0.0745613 | Kansas |
8982 | 2021 | 0.0724300 | Kansas |
5097 | 2021 | 0.0410990 | Kansas |
9695 | 2021 | 0.0781784 | Kansas |
50 | 2021 | 0.0437530 | Kansas |
155 | 2021 | 0.1363234 | Kansas |
44 | 2022 | 0.0378789 | Kansas |
91 | 2022 | 0.0795614 | Kansas |
16085 | 2022 | 0.1279288 | Kansas |
48 | 2022 | 0.0418941 | Kansas |
44 | 2022 | 0.0386058 | Kansas |
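
For anyone curious how a table like the one above could be produced, here is a minimal Python sketch of the procedure described before the table: group the rows by year, then draw 5 rows at random from each year. The `rows` list is a hypothetical stand-in for the 320-row data set (it assumes 32 rows in each of the 10 years), and the function name `sample_per_year` is made up for illustration; this is not necessarily how the table was actually generated.

```python
import random
from collections import defaultdict


def sample_per_year(rows, per_year=5, seed=None):
    """Group rows by their 'year' key, then draw `per_year` rows at random from each group."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    sample = []
    for year in sorted(by_year):
        # random.sample draws without replacement
        sample.extend(rng.sample(by_year[year], per_year))
    return sample


# Hypothetical stand-in data: 10 years (2013-2022) with 32 rows each = 320 rows.
rows = [{"year": year, "row_id": i} for year in range(2013, 2023) for i in range(32)]

subsample = sample_per_year(rows, per_year=5, seed=1)
print(len(subsample))  # 50 rows: 5 from each of the 10 years
```
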
The original sampling method for the Census data was a multi-step process using multiple different sampling methods. Three to be exact. What are those three nested sampling methods?
What was the sampling method I used to create the second, smaller sample set in the table above?
Name the type and subtype of each variable in the table.
Compute the following for Household Income:
Mean
Median
Mode
Range
Variance
Standard Deviation
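If you want to check your hand calculations, the six measures above can be sketched with Python’s built-in `statistics` module. The `values` list below is a small made-up data set, not the Household Income column; swap in the column from the table to check your own work. This sketch assumes the data is treated as a sample, so it uses the \(n-1\) versions of variance and standard deviation (`statistics.pvariance` and `statistics.pstdev` are the population versions).

```python
import statistics

# Hypothetical values for illustration only; replace with the Income column.
values = [4, 8, 8, 5, 10]

mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)      # list of every most-common value
value_range = max(values) - min(values)
variance = statistics.variance(values)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)        # sample standard deviation

print("Mean:", mean)
print("Median:", median)
print("Mode(s):", modes)
print("Range:", value_range)
print("Variance:", variance)
print("Standard Deviation:", round(std_dev, 3))
```

`multimode` is used instead of `mode` so that a data set with ties for the most common value reports all of them.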
Justify which measure of center (Mean/Median/Mode) is most representative of the data set.
Why would someone prefer Standard Deviation over Variance as a measure of spread for this data?
The below histogram represents the original sample of \(n=320\):
Describe the shape of the histogram
Identify the median and mode
Identify where the mean you calculated falls on this histogram
- Does your calculation feel reasonable?
Does this histogram follow our rules of histograms?
Research and Random Variables
- I made an attempt to create some examples of research problems in the major fields represented by our class demographics. While these word problems are far more intensive than anything you’d see on the exam, if you can complete this section you’ll be set for r.v. content.
Reminder: only 2 students in this class are in my field, so some of these may be “poorly designed” studies.
Given word problems \(2\) through \(5\), fill in the below tables. Question \(1\) serves as an example:
- A research lab wants to model the spread of seasonal influenza across college students in America. They decide to take a sample of 500 students from each of 10 major universities that were selected at random from a list of 50 possible universities. They then measure the contact networks (person-to-person interactions) between these students and determine how many contact paths each person is away from their furthest connection. From background research it was found that the average human being is \(6\) contacts away from their furthest connection. The researchers found from their study that college students were \(4.8\) contacts away on average.
\[ \begin{array}{|c|c|c|} \hline \# & \text{Population} & \text{Sample} \\ \hline 1 & \text{American College Students} & \text{500 Students from 10 Universities} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array} \]
\[ \begin{array}{|c|c|c|c|} \hline \# & \text{Parameter} & \text{Statistic} & \text{Data Type} \\ \hline 1 & \text{6 average contacts} & \text{4.8 average contacts} & \text{Quantitative, Ratio} \\ 2 & & & \\ 3 & & & \\ 4 & & & \\ 5 & & & \\ \hline \end{array} \]
\[ \begin{array}{|c|c|c|} \hline \# & \text{Random Variable} & \text{r.v. Type} \\ \hline 1 & \text{Contacts} & \text{Discrete} \\ 2 & & \\ 3 & & \\ 4 & & \\ 5 & & \\ \hline \end{array} \]
- The Kansas Department of Wildlife, Parks and Tourism is tracking the birth rate of a rare species of bird native to the Great Plains area. They cannot get permission to check every park where the bird is protected, as many of them cross state lines, so they make inferences by taking counts in every park in Kansas where the bird exists. They know from nationwide reporting that the birth rate of this bird species is \(R=2.82\), but from their tracking they’re seeing a birth rate of \(r=3.29\).
- A lab group at K-State launches an experiment to determine the effect of different diets and stretching techniques on lactic acid production in athletes. The team of researchers is launching a completely novel project, but previous papers have suggested that these treatments will have little to no effect. They take a group of \(40\) volunteers from various teams in K-State athletics and divide them into \(8\) groups (based on 2 diets and 2 stretching techniques), finding that \(S_1\) (stretching technique 1) and \(D_2\) (diet 2) reduced the average proportion of lactic acid in the athletes, during high-intensity cardio, by \(0.278\).
- A research farm has been contracted to investigate the effects of three different feed supplements on the post-harvest meat quality of cattle. The researchers focused on a specific high-value cut of meat and measured the weight of this cut in the highest quality grade after each animal was slaughtered. In a typical scenario, a \(1000\) pound cow yields about \(600\) pounds of carcass weight after slaughter (called the “hanging weight”), with around \(300\) pounds of boneless retail cuts. Of these cuts, approximately \(10\) pounds typically fall into the highest quality grade (USDA Prime or equivalent), which represents the most marbled and valuable portion of the meat. In the experiment, two of the feed supplements had no significant effect on the amount of the highest quality meat. One supplement significantly increased the yield of the highest quality grade, boosting the average weight of these premium cuts to \(33\) pounds per cow.
- A psychology lab is working to determine the behavioral differences between arranged and self-selected marital partners. Using volunteers, they run an experiment where one partner is given a simple, physical puzzle to solve blindfolded, and the other partner is tasked with guiding them. They measure behavioral differences between the two categories of couples by reviewing recordings of the puzzle being solved and counting how often certain key words are said, then categorizing them into positive or negative statements. They also record how long the couples take to complete the puzzle. This puzzle task is well studied, and overall the average time to complete the puzzle for any two people is roughly \(6.5\) minutes. From their experiment, where they observed \(43\) couples, the partners coming from arranged marriages had an average time \(1.2\) minutes faster than the self-selected marriages, who finished in \(5.3\) minutes on average, but a lower frequency of positive or negative statements, with a negligibly higher proportion of positive statements.