Current as of 2023-09-29
Lecture: MW 12-1:30pm (MCNB 309)
Dr. Marc Trussler
Fox-Fels Hall 32 (3814 Walnut Street)
Office Hours: M 9-11am
TA: Dylan Radley
Fox-Fels Hall 35 (3814 Walnut Street)
The first step of many data science sequences is to learn a great deal about how to work with individual data sets: cleaning, tidying, merging, describing and visualizing data. These are crucial skills in data analytics, but describing a data set is not our ultimate goal. The ultimate goal of data science is to make inferences about the world based on the small sample of data that we have.
PSCI 1801 shifts focus to this goal of inference. Using a methodology that emphasizes intuition and simulation over mathematics, this course will cover the key statistical concepts of probability, sampling, distributions, hypothesis testing, and covariance. The goal of the class is for students to ultimately have the knowledge and ability to perform, customize, and explain bivariate and multivariate regression. Students who have not taken PSCI 1800 should have basic familiarity with R, including working with vectors and matrices, basic summary statistics, visualizations, and for() loops.
Prerequisite: PSCI 1800 (formerly 107) or a similar R course.
To help us better understand the nature of inferential statistics, we will be running quite a lot of simulations in R. Students entering the class should have a working knowledge of the R programming language, and in particular should know how to use square brackets to index vectors and how to run for() loops. We will do a short refresher on these concepts in the first two weeks of class.
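For a rough sense of what "working knowledge" means here, the short R sketch below uses both skills, square-bracket indexing and a for() loop, to run a small simulation of the kind we will do throughout the semester. It is only an illustration, not an assigned exercise: the coin-flip setup, the seed, and the variable names are hypothetical choices made for this example. If you can follow it line by line, you are in good shape; if not, the refresher in the first two weeks covers exactly these ideas.

```r
# Hypothetical self-check: simulate 1,000 experiments of 10 fair coin flips
# and record how many heads each experiment produced.
set.seed(1801)                 # make the simulation reproducible
n_sims <- 1000
heads <- rep(NA, n_sims)       # empty vector to fill inside the loop

for (i in 1:n_sims) {
  flips <- sample(c(0, 1), size = 10, replace = TRUE)  # 1 = heads, 0 = tails
  heads[i] <- sum(flips)       # square brackets store the i-th result
}

heads[1:5]                     # square brackets again: the first five results
mean(heads)                    # should be close to 5 (half of 10 flips)
```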
We will use Slack to communicate with the class. You will receive an invitation to join our channel shortly after the start of class. One of the better things to come out of the pandemic is the use of Slack for classroom communication. It is a really good tool that allows us to send quick, informal messages to individual students or groups (and for you to message us). It also allows you to collaborate with other students in the class, and is a great place to get simple questions answered. Because we will be making announcements via Slack, it is extremely important that you get this set up.
The lectures will be in person. While this is not a discussion-based class, there is an expectation of some participation and feedback. Attendance will not be recorded, but note that you are scored on participation.
The course will require students to have access to a personal computer in order to run the statistics software. If this is not possible, please consult with one of the instructors as soon as possible. Support to cover course costs is available through Student Financial Services (https://srfs.upenn.edu/sfs).
We expect all students to abide by the rules of the University and to follow the Code of Academic Integrity.
For Problem Sets: Collaboration on problem sets is permitted. Ultimately, however, the write-up and code that you turn in must be your own creation. Please write the names of any students you worked with at the top of each problem set.
For Exams: Collaboration on the take-home exams is cheating. Anyone caught collaborating (and I have caught many) will be immediately referred to the University’s disciplinary system.
All student work will be assessed using fair criteria that are uniform across the class. If, however, you are unsatisfied with the grade you received on a particular assignment (beyond simple clerical errors), you can request a re-grade using the following protocol. First, you may not send any grade complaints or requests for re-grades until at least 24 hours after the graded assignment was returned to you. After that, you must document your specific grievances in writing by submitting a PDF or Word document to the teaching staff. In this document you should explain exactly which parts of the assignment you believe were mis-graded, and provide documentation for why your answers were correct. We will then re-score the entire assignment (including portions for which you did not have grievances), and the new score will be the one you receive on the assignment (even if it is lower than your original score).
Notwithstanding everything below: exceptions to all of these policies will be made for health reasons, extraordinary family circumstances, and religious holidays. The teaching staff are extremely reasonable and lenient, as long as you discuss potential issues with us *before* the deadline.
For problem sets: You are granted 5 “grace days” over the course of the semester, which you can use when you need to turn problem sets in late. You can use at most 3 grace days on any single assignment, and you do not have to ask to use them. Grace days are counted in whole days: a problem set turned in 1 minute after the deadline uses 1 grace day, one turned in 24 hours and 1 minute after the deadline uses 2 grace days, and so on. Choosing not to complete a problem set (see policy below) does not affect your grace days. Once you are out of grace days, any subsequent late problem sets will be graded as incomplete.
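The whole-day counting rule can be written as a small R function if you want to check your own arithmetic. This is a hypothetical sketch, not an official tool used by the teaching staff: grace_days_used() and its minutes_late argument are made-up names, and the function assumes lateness is measured in minutes after the deadline.

```r
# Hypothetical helper (not an official course tool): grace days used by a
# problem set submitted `minutes_late` minutes after the deadline.
# Works on a single value or a vector of submissions.
grace_days_used <- function(minutes_late) {
  minutes_per_day <- 24 * 60
  # Any started 24-hour block counts as a whole grace day;
  # on-time submissions (zero or negative lateness) use none.
  pmax(0, ceiling(minutes_late / minutes_per_day))
}

grace_days_used(1)                  # 1 minute late: 1 grace day
grace_days_used(24 * 60 + 1)        # 24 hours and 1 minute late: 2 grace days

# Checking two late problem sets against both limits in the policy:
used <- grace_days_used(c(1, 1441))
sum(used) <= 5 && all(used <= 3)    # TRUE: 3 days used in total, none over 3
```

The ceiling() call is what makes partial days count as whole ones; pmax() simply keeps on-time submissions at zero.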
For exams: The nature of the two exams (timed exams completed within a set window) does not allow for any extensions.
All assignments will be graded anonymously. Please hand assignments in on Canvas with your student number, not your name.
Participation (5%)
This portion of your grade mixes two components:
- Traditional participation, including asking and answering questions in lecture and in recitations, asking and answering questions on the course Slack, or attending office hours.
- The completion of weekly “check-in” quizzes on Canvas. These will be available each week, will take less than 5 minutes, and will be graded on completion (not correctness).
Problem sets (45%)
- Five problem sets (roughly one every two weeks).
- Each is scored out of 100.
- You are free to do as many of the problem sets as you like. If you do not complete a problem set, the percentage points for that assignment will be transferred to the first exam (for PS1 and PS2) or the second exam (for PS3, PS4, & PS5). For example, if you don’t complete PS2, the first exam would then be worth 34% of your final grade. If you don’t complete PS4 & PS5, the second exam would be worth 43% of your final grade.
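The reallocation rule above can be checked with a short R sketch: each problem set is 45/5 = 9% of the final grade, and skipping one moves those 9 points to the matching exam. The function name exam_weights() is hypothetical and this is not an official grade calculator; it only reproduces the rule as stated, sending PS1 and PS2 to the first exam and PS3, PS4, and PS5 to the second.

```r
# Hypothetical sketch (not an official grade calculator): exam weights, in
# percentage points, after reallocating any skipped problem sets.
exam_weights <- function(skipped) {
  ps_weight <- 45 / 5    # five problem sets share 45%, so 9% each
  exam1 <- 25
  exam2 <- 25
  for (ps in skipped) {
    if (ps %in% 1:2) {
      exam1 <- exam1 + ps_weight   # PS1 and PS2 move to the first exam
    } else {
      exam2 <- exam2 + ps_weight   # PS3, PS4 and PS5 move to the second exam
    }
  }
  c(first_exam = exam1, second_exam = exam2)
}

exam_weights(2)         # first exam 34%, second exam 25%
exam_weights(c(4, 5))   # first exam 25%, second exam 43%
```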
First Exam (25%)
- This will be an open-book, 24-hour take-home test. The test will open on Monday, October 16 at 3:00pm and close on Friday, October 20 at 11:59pm. You can select any 24-hour period within this window to take the test, so the latest you can open it and still have the full 24 hours to complete it is October 19 at 11:59pm. You may not work with other students on this exam. It will take a similar form to the problem sets.
Second Exam (25%)
- This will be a 2-hour, open-book take-home test completed during the final exam period. You may not work with other students on this exam. Because of the shorter time frame, this exam will be less coding-intensive and will focus more on theoretical concepts.
We will use R in this class, which you can download for free at https://www.r-project.org/. R is completely open source and has an almost endless set of resources online. Virtually any data science job you could apply to nowadays will require some background in R programming.
While R is the language we will use, RStudio is a free program that makes it considerably easier to work with R. After installing R, you should install RStudio (https://www.rstudio.com). Please have both R and RStudio installed by the end of the first week of classes.
If you’re having trouble installing either program, there are more detailed installation instructions on the course Canvas page.
There is one mandatory textbook for this course and there are two optional ones:
Data Analysis for Social Science: A Friendly and Practical Introduction. Elena Llaudet & Kosuke Imai. (Mandatory).
- I have chosen this book because it does a really good job of weaving together the basics of statistics and the use of R. Generally speaking, the assigned readings from this book will be slightly less technical than the class notes. This book is available at the bookstore and from Amazon. There is only one edition, but be sure to get the (way cheaper) paperback version.
Quantitative Social Science: An Introduction. Kosuke Imai. (Optional).
- This is the original, graduate-level textbook on which the Llaudet and Imai textbook is based. The chapters are largely the same, but this textbook is much more math-intensive. I have included below the equivalent readings (labeled QSS) if you want to go into greater detail. These readings are completely optional.
Statistics: Fourth Edition. Freedman, Pisani, Purves. (Optional).
- This textbook has a slightly more conversational and intuitive approach, but does not incorporate those lessons with R. While this book is not mandatory, I really like its style and common-sense explanations. It’s a great companion to have around.
R Review/Start probability
Llaudet & Imai 6.1, 6.2, 6.7
(QSS 4.11, 6.1)
September 12: course selection period ends
Random Variables I: Discrete
Llaudet & Imai 6.4.1
Problem Set 1 Due Wednesday 7pm.
Sampling and confidence intervals
Llaudet & Imai 6.5.1, 6.5.2
October 9: Drop period ends
Problem Set 2 Due Wednesday 7pm.
Wednesday class will be a drop-in review session in our usual classroom.
First Midterm Exam period Monday 3:00pm to Friday 11:59pm.
Standard error of the mean/Field Trip
Llaudet & Imai 6.5.3
On October 25th we will take a class field trip to the NBC News Decision Desk.
October 27: Grade type change deadline.
Standard error of the mean/Field Trip
Llaudet & Imai 7.1, 7.3, 7.4
Problem Set 3 Due Wednesday 7pm.
Two continuous variables and covariation
Llaudet & Imai 3.5
November 6: Withdrawal deadline
Correlation and bivariate regression
Llaudet & Imai 4.3
Problem Set 4 Due Wednesday 7pm.
Interaction with regression
Excerpt from Kam and Franzese (Canvas)
Problem Set 5 Due Wednesday 7pm.