9 Lab 1 - Video Games Survey (EDA)
9.0.1 Our data
The codebook for this dataset is available at the link below under the heading “Video Game Survey”.
https://www.stat.berkeley.edu/users/statlabs/labs.html
Background information about how this data was collected is included below. It comes from the textbook Stat Labs: Mathematical Statistics Through Applications by Deborah Nolan and Terry Speed.
All of the population studied were undergraduates enrolled in Introductory Probability and Statistics, Section 1, during Fall 1994. The course is a lower-division prerequisite for students intending to major in business. During the Fall semester the class met Monday, Wednesday, and Friday from 1 to 2 pm in a large lecture hall that seats four hundred. In addition to three hours of lecture, students attended a small, one-hour discussion section that met on Tuesday and Thursday. There were ten discussion sections for the class, each with approximately 30 students.
The list of all students who had taken the second exam of the semester was used to select the students to be surveyed. The exam was given the week prior to the survey. A total of 314 students took the exam. To choose 95 students for the study, each student was assigned a number from 1 to 314. A pseudo-random number generator selected 95 numbers between 1 and 314. The corresponding students were entered into the study.
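As an illustration only, the selection step described above can be sketched in R. This is a minimal sketch that assumes the 95 numbers were drawn without replacement (so no student is picked twice). The seed value and the variable names are assumptions made for this example, and the code will not reproduce the students actually chosen in 1994.

```r
# Simple random sample of 95 students out of 314, as described in the text.
# The seed is arbitrary (an assumption) and only makes this sketch reproducible.
set.seed(1994)

n_students <- 314  # students who took the second exam
n_sample   <- 95   # students chosen for the study

# sample() draws without replacement by default, so every ID is distinct.
selected_ids <- sample(seq_len(n_students), size = n_sample)

# Under this scheme every student has the same chance of being selected.
inclusion_prob <- n_sample / n_students
round(inclusion_prob, 3)
```

Under this scheme each student has roughly a 30% chance of being selected.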
To encourage honest responses, the students’ anonymity was preserved. No names were placed on the surveys, and completed questionnaires were turned in for data entry without any personal identification on them.
To limit the number of nonrespondents, a three-stage system of data collection was employed. Data collectors visited both the Tuesday and Thursday meetings of the discussion sections in the week the survey was conducted. The students had taken an exam the week before the survey, and the graded exam papers were returned to them during the discussion section in the week of the survey. On Friday, those students who had not been reached during the discussion section were located during the lecture. A total of 91 students completed the survey.
Finally, to encourage accuracy in reporting, the data collectors were asked to briefly inform the students of the purpose of the survey and of the guarantee of anonymity.
9.0.2 Your tasks
1. Pick your own research question.
In the first part of the tutorial, I answered the question “How do responses to the question ‘Do you like to play video games?’ vary between males and females in this survey?”. Review the codebook for this dataset and think of your own research question.
In addition, briefly explain in about one paragraph why your research question is worth investigating.
Of course, you may not use my research question.
2. Answer your research question.
You must create at least one visualization with the ggplot2 package and you must use at least one summary technique of some kind. It is up to you to choose what kinds of plots and summary techniques to use. Be creative! But also be appropriate. For example, heatmaps are cool, but they’re not useful for a project like this. Also, to clarify, this is a descriptive statistics project. Please use descriptive methods only. Please do not use inferential/Bayesian/machine learning/deep learning methods of any kind.
In addition, briefly explain in about a paragraph the significance of your findings.
Please remember to follow the requirements that are listed in the “Lab Requirements” section of this book.
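To make the "one ggplot2 visualization plus one summary technique" requirement concrete, here is a minimal sketch of what a purely descriptive answer might look like. It uses a small made-up data frame, not the survey data. The column names `group` and `hours` are hypothetical placeholders, so check the codebook for the real variable names and pick a plot type that suits your own question.

```r
library(ggplot2)

# Made-up placeholder values, NOT responses from the survey.
# `group` and `hours` are hypothetical column names used only for illustration.
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  hours = c(0, 2, 1, 0.5, 3, 14)
)

# Summary technique: the median of `hours` within each group.
group_medians <- aggregate(hours ~ group, data = toy, FUN = median)
group_medians

# Visualization: side-by-side boxplots of the same variable.
p <- ggplot(toy, aes(x = group, y = hours)) +
  geom_boxplot() +
  labs(
    x = "Group (hypothetical)",
    y = "Hours (hypothetical)",
    title = "Toy example: distribution of hours by group"
  )
p
```

The median is used here rather than the mean because a single large value (such as the 14 in the toy data) can pull the mean far from a typical response. Whether that matters for your variable is something to judge from your own plot.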