# MGHIHP HE-902, Spring 2020

*Version – 13 April 2020*

# Chapter 1 Course Information and Reference

This online e-book is the main resource to guide you through the course HE-902 in the PhD in HPEd program at MGHIHP in the Spring 2020 semester.

Each chapter contains reading (or links to reading) that you should do as well as an assignment that you should complete and submit by the deadline in the course calendar.

My name is Anshul Kumar and I am the author/preparer of this e-book. You can reach me at akumar@mghihp.edu with any questions, comments, and/or suggestions for modifications to this e-book.

All of the materials here are available online for anybody to use. Those who are not part of the course HE-902 are welcome to use this e-book however it might be useful. Please e-mail me any feedback you have.

I use a lot of footnotes like the one after this sentence.^{1} You can read the footnote by clicking on the small-but-tall number in between this sentence and the previous one. Footnotes contain comments from me or extra information that might be helpful. But footnotes in this e-book are never necessary to read. **It is fine for you to skip the footnotes and not read them.**

## 1.1 Course Calendar

The calendar below shows when assignments are due and when online video calls (using Zoom) must be scheduled. Keep in mind that this calendar might change during the semester.

Each week starts on a Monday and ends on a Sunday. For example, the Sunday on the week of January 13 refers to January 19 (the first assignment is due on January 19). You can submit an assignment at any time on the day when it is due.

**Each week, read and follow all instructions in the corresponding chapter. Then complete the assignment at the end of the chapter.**

Week of | Chapter | Assignment Due | Additional Required |
---|---|---|---|
Jan 13 | 2 | Sunday | |
Jan 20 | 3 | Sunday | Zoom call, RStudio orientation |
Jan 27 | 4 | Sunday | |
Feb 3 | 5 | Sunday | |
Feb 10 | None | None | Oral exam #1 (Zoom) |
Feb 17 | 6 | Sunday | |
Feb 24 | 7 | Sunday | |
Mar 2 | 8 | Sunday | Zoom call, check-in |
Mar 9 | None | (spring break) | |
Mar 16 | 9 | Sunday | Oral exam #2 (Zoom) |
Mar 23 | 10 | Sunday | |
Mar 30 | 11 | Sunday | |
Apr 6 | 12 | Sunday | |
Apr 13 | 13 | Friday | Oral exam #3 (Zoom) |
Apr 20 | None | Final Project, Apr 25 | |

TBD = to be determined

Note that the week of April 13 will be the final week in which there is new material given.

The final project is due on April 25.

## 1.2 Useful Links

- Link to this HE-902 e-book: https://bookdown.org/anshul302/HE902-MGHIHP-Spring2020/ or http://tinyurl.com/he902stats.
- HE-802 at MGHIHP
- HE-942 onsite seminar Quantitative Methods Workshop: http://tinyurl.com/he942quant
- Data Analysis Examples from UCLA
- MGHIHP Academic Calendar (linked from here)

## 1.3 Assignments, Grading, and Curriculum

### 1.3.1 Grade Calculation

Type of Work | % of Grade |
---|---|
Weekly assignments | 25 |
Participation | 5 |
Oral exams | 45 |
Final project | 25 |

Your grade will be calculated as shown in the table above. This supersedes whatever the syllabus says.
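As a rough illustration of how these weights combine, here is a minimal sketch in R. It assumes that each component is scored on a 0 to 100 scale and that the course grade is a plain weighted average; the scores below are made up for the example and are not real grades.

```r
# Weights from the grade calculation table (percent of course grade).
weights <- c(
  weekly_assignments = 25,
  participation      = 5,
  oral_exams         = 45,
  final_project      = 25
)

# The weights should account for the whole grade.
stopifnot(sum(weights) == 100)

# Hypothetical component scores on a 0-100 scale (assumed, not real data).
scores <- c(
  weekly_assignments = 90,
  participation      = 100,
  oral_exams         = 80,
  final_project      = 85
)

# Weighted average: multiply each score by its weight, add up, divide by 100.
course_grade <- sum(weights * scores[names(weights)]) / 100
course_grade
# 84.75
```

Because the oral exams carry 45% of the weight, a change in that component moves the course grade more than the same change in any other component.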

### 1.3.2 Description of Curriculum

Here are descriptions of the activities you will do this term:

- Weekly assignments: Homework assignments will involve applying and practicing the statistical techniques that are the focus of the week, using a provided dataset. Because of the cumulative nature of the course, these assignments may also involve applying knowledge from previous weeks’ material.
- Participation: This mostly relates to your participation in mandatory video calls. The tentative plan is to have two calls on Zoom during the semester. The second of these calls can potentially be combined with one of the oral exams. We may also try to organize one or more Zoom meetings with all or several of us together, in which case your participation in those will also count toward this participation grade.
- Oral exams: You must take three oral exams, each occurring at some point during the weeks specified in the calendar. Each exam will be a one-on-one Zoom (online video) meeting between a student and me. During the exam, I will ask you questions to test your understanding of the concepts we cover and I will ask you to demonstrate data analysis tasks on your own computer while sharing your screen on Zoom. The exams are “open-book,” meaning that you can refer to any notes or course materials during the exam. You are allowed to re-take an exam as many times as you would like, if you are not satisfied with your initial performance. If you want to get your exam dates on the calendar early, just e-mail me and we can definitely do that.
- Final project: You will conduct a full statistical analysis that answers a research question with a provided dataset. You will present what you did to the rest of the class.^{2}

## 1.4 Final Project Details and Requirements

### 1.4.1 Description

With just over a month remaining in our time together in this course, I would like to share specific expectations and requirements for the final project in this class. The final project is *not* meant to be even close to a full quantitative research study. Instead, you can think of the final project as a take-home final exam. Another way to think about the project is that you will be writing an extended methods section and a condensed results section of an empirical research article.

Note^{3} that you can ask for help from me as you do the project. It’s not like our oral exams in which you have to do each skill alone.

**The final project is due on April 25, 2020.**^{4}

### 1.4.2 Project Goals

**The goals of this final project are to…**

- Present and interpret the results of *one* regression model that answers a clear and specific research question.
- Run, interpret, and appropriately respond to *all* required diagnostic tests for this regression model and present the results of all tests.

### 1.4.3 Project Requirements

**Here are the items you must present and tasks you must complete:**

- Write a clear research question (RQ) that can be answered using regression analysis techniques. This research question should be a *single sentence* with a question mark at the end.^{5}
- To answer your RQ, various concepts will have to be first measured, recorded in a dataset as variables, and then related to each other quantitatively. Present a DAG (directed acyclic graph) that shows the relationships among all of the variables that will be involved in answering your RQ. The DAG you present does not need to be aesthetically pleasing.
- Identify a dataset that you will use to answer your research question. Clearly describe the dataset, including: a) the population from which the data sample was drawn, b) the unit of observation, c) all variables that you will use in your analysis and the unit of measurement of each variable, and d) background information about the data.
- Given the structure of the data and the RQ of interest, explain which type of regression model is most appropriate to answer your RQ and why. Also identify at least one other type of regression model that could be used and explain why you instead chose the type of model that you did.
- Present basic descriptive statistics that are relevant to your RQ. You should include at least one table and at least one figure/chart. All variables from your DAG must be included in these descriptive statistics.
- Show the code and results of *one* regression model that answers your RQ.^{6}
- Run and present the results of *all* diagnostic tests that pertain to the type of regression model you ran. Your regression model *must pass* all of the tests. If your diagnostics show that your model specification violates any of the assumptions of your regression model, you should fix the problem and try again until your model no longer violates the assumptions.
- Interpret the results of your regression model that are relevant to your RQ.
- Briefly explain any limitations in your analysis.
- Include all R code and results in your final submission.
- Present all writing in well-written English.
- Present everything in an aesthetically pleasing manner (with the exception of your DAG). It is recommended that you use an RMarkdown document, but this is not required.
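To make the "show the code and results" and diagnostic-test requirements a little more concrete, here is a minimal sketch in R. It is only an illustration under stated assumptions: it uses `mtcars`, a dataset that ships with R and is not a course dataset, the choice of variables is arbitrary, and the by-hand VIF calculation is just one way to check multicollinearity. The tests that apply to your project depend on the type of model you choose.

```r
# Illustrative only: mtcars is built into R and is NOT a course dataset.
# Outcome: miles per gallon. Predictors: weight, horsepower, quarter-mile time.
predictors <- c("wt", "hp", "qsec")
model <- lm(reformulate(predictors, response = "mpg"), data = mtcars)

# Regression results (coefficients, standard errors, R-squared).
summary(model)

# Variance inflation factor (VIF) for each predictor, computed by hand:
# regress that predictor on the other predictors, then take 1 / (1 - R^2).
vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

# Larger values indicate that a predictor is more strongly explained
# by the other predictors in the model.
vif_by_hand
```

For an ordinary linear model like this one, the `vif()` function in the `car` package reports the same quantity. The by-hand version is shown here only so that the example runs without installing anything.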

### 1.4.4 Grading Rubric

The final project will be graded according to the rubric below. Each criterion is worth a maximum score of two points unless otherwise noted.

Criterion | Score = 0 | Score = 1 | Score = 2 |
---|---|---|---|
Clear RQ | Unclear, more than one sentence, not a question. | Confusingly presented but understandable. | Clear, simply written, single sentence ending with a question mark. |
DAG | Not all relevant variables are shown, items that are not variables in the data are shown, arrows do not make sense, unblocked backdoor paths are present, graph is cyclical. | Minor errors. | DAG clearly shows hypothesized relationship among all variables, including any confounding, mediating, separately-acting, or unmeasurable variables. |
Population and sample | Relationship between sample and population is unclear, details about population are omitted. | Minor omissions, but overall description of the population is understandable. | It is very clear what the population is and how many observations from this population were sampled and then included in the dataset used in the project. |
Unit of observation | The meaning of each row in the data is not understandable from what is written. | Reader can figure out based on context, but a clear explanation is missing. | It is very clear what each row of the data means/represents. This is explicitly stated with no ambiguity or confusion. |
Variables used | The variables used in the analysis are not addressed. | Some variables are mentioned but not all. How each variable is measured is not clear. | Dependent variable and all independent variables are described in one sentence each. Unit of measure (and any relevant explanation of how a variable is coded in the data) is given for each variable. |
Background on data | It is not understandable where the data came from and from what context. | Few details are given about the data. | Clear explanation of where the data came from, when it was collected, who collected it, etc. |
Choose model | No explanation of why the presented regression model was chosen. No comparison to another regression model. Incorrect selection of model type. | An explanation may be there but it might be incorrect, or a comparison to another model is missing. | Logical explanation of the way the data is structured and how the selected regression model is best suited to that data structure. Clear explanation of why at least one alternative regression model was not used. |
Descriptive statistics | No or very few statistics presented. Statistics for irrelevant variables or information are presented. | Descriptive statistics do not cover all variables and observations relevant to the RQ. Only one of two required charts is included. | Descriptive statistics are presented for all variables listed in the DAG and used in the regression model. One well-made figure is presented. One well-made table is presented. |
Regression model | Code and/or summary is not shown for regression. Code does not accomplish the type of regression that was supposed to be used. | Only partial work or result is shown. Type of regression model is unclear. | Correct regression model result is shown along with appropriate R code to execute it. |
Multicollinearity test | Test not presented. | Results do not satisfy assumption. Results presented but interpreted incorrectly. | Regression model passes the VIF test of multicollinearity. A correlation matrix should be presented if necessary, but this is not required. |
Independence assumption | Assumption not considered. | Assumption is mentioned but incorrectly interpreted. | The issue of independence of sampled observations is carefully discussed. Non-independent data should be accounted for appropriately in the regression model. |
Other assumption 1 | Assumption not considered. | Assumption is mentioned but incorrectly interpreted. | Assumption is tested correctly and interpreted correctly. |
Other assumption 2 | Assumption not considered. | Assumption is mentioned but incorrectly interpreted. | Assumption is tested correctly and interpreted correctly. |
Other assumption 3 | Assumption not considered. | Assumption is mentioned but incorrectly interpreted. | Assumption is tested correctly and interpreted correctly. |
Interpret results | Many irrelevant details are given. Research question is not clearly answered. | Research question is answered but interpretation of results is not exactly correct. | Succinct interpretation of the portion of the regression output that pertains to the RQ. |
Limitations | Limitations are not addressed or are completely incorrect given the regression model used. | Limitations are partially addressed. | Multiple plausible limitations to the analysis and the conclusions we can draw from it are addressed. |
R code included | No R code is included. | Only partial R code for the results presented is given. | R code is included (displayed in final document) for all results that were generated using R. |
Writing quality (+) | Sentences and paragraphs are not formatted according to convention. Full sentences are not used much or well. | Minor grammar and/or spelling errors occur throughout, but the main points are understandable. | Writing is clear and succinct. It is easy to read quickly and understand the analysis and the results. No grammar or spelling errors. |
Aesthetics | Project is presented in a confusing manner. Order and flow of requested items is not logical. Unnecessary fonts, symbols, and formatting appear. | Minor blemishes and errors are visible in the submitted project. | Order of all content is clear and logical. Sections and sub-sections are logically and clearly marked. The write-up is easy to read. |

Items marked with a (+) in the table above will carry more weight than just two points. All other items have a maximum score of two points.

Your grade on the project will be the number of points achieved divided by the total number of points possible.

**If you are not satisfied with your grade on this project, you do have the option of taking an INCOMPLETE grade for the course. Then, you will improve and re-submit your project in the weeks that follow the end of the course. I will re-grade the project and then put your improved final grade for the course into the grading system.**

## 1.5 Required Materials

- There is *no required textbook* that you need to purchase for this course. All reading-related resources will be freely available online.
- You should have a computer that can do video meetings, word processing, web browsing, and spreadsheet viewing/editing.^{7}
- You must install Zoom on your computer. Note that you do not need to create an account.
- You will need to have access to the free program RStudio.^{8} My plan is for everyone to just access this through the website http://rstudio.cloud, so that you don’t have to download anything to your computer.^{9} But you could also download R and RStudio to your own computer if you prefer. One of the first few homework assignments will walk you through setting this up, so you don’t need to do this until then.
- In my experience so far, R and RStudio do not work well on tablets like iPads and Android tablets. It is best to have a Mac, Windows, Linux, or other similar full operating system on a desktop or laptop computer. Chrome OS might be okay but I have not tested it before.

## 1.6 Acknowledgments

- The building blocks for this e-book are taken from *A Minimal Book Example* by Yihui Xie, available at https://github.com/rstudio/bookdown-demo. This work would not be possible without this excellent guide from Yihui.
- Much of the content for this book is influenced by the teaching and research conducted by my colleagues and students.

1. This is a footnote. You can go back to where you were reading by clicking on the little squiggle right here: ↩

2. I’m not sure yet if we’ll all meet on Zoom to present to each other or if we’ll do something else, like make audio recordings of the presentations. ↩

3. This note was added on March 19, 2020. ↩

4. This due date was added on March 25, 2020. ↩

5. There are no exceptions to this requirement. ↩

6. In reality, you will likely run many regression models on your own to arrive at the one that fits your data the best. But you do not need to show all of this work in your final submission. If you do wish to show all of this additional work, you can include it in an appendix to your assignment, but this is not required. ↩

7. You can use Google Sheets online for free if you don’t have a spreadsheet program like Excel on your own computer. LibreOffice is another good option that is free. ↩

8. If you prefer to use a different data analysis platform such as Stata, SPSS, SAS, etc., this may be possible but we should discuss it first. I have chosen to use R and RStudio in this course because they are free and widely used. ↩

9. And so that we don’t have to go through the inevitable process of getting the program to work on each person’s individual computer. So far RStudio Cloud has worked without errors for my approximately 30 students and me over the last year, in various courses or instructional situations. ↩