Chapter 4 Week 10 – Nov 11 2020 class meeting

This week, our goals are to…

  1. Code qualitative data systematically to identify themes.

  2. Use themes identified from qualitative data to answer a research question.

  3. Run and interpret the results of linear regression models with multiple independent variables.

  4. Identify research questions that can be answered using logistic regression models.

4.1 Before class

4.1.1 Checklist – Complete by Nov 11

By our class meeting on Wednesday, November 11, 2020, you should complete the following tasks:

This is all you need to do before we meet for class. If anything is unclear or you have any questions, do not hesitate to email or phone Anshul.

It is fine to work with others on the assignments (and sometimes it may even be required), but make sure you state who you worked with at the top of your assignment.

4.1.2 Qualitative Assignment #3

Please write your responses on the computer and be prepared to e-mail them by the start of class on Wednesday, November 11, 2020 at 10 a.m. Boston time.

Also be sure to complete the Week 8 in-class Qualitative Activity, if you have not already, before doing the assignment below.

Task 1: Watch the videos below, which are either examples of or guides related to the presentation of qualitative data:

  1. Example of how you can structure your final presentations (which will be on Wednesday, November 18, 2020, for 15 minutes): Brandon Holland. Qualitative Research Final Presentation.
  2. Another example of how to present your research design and findings: Preliminary Findings from PROMISE Qualitative Study.

Also have a look at the diagrams below—created by Subha Ramani—which relate to the qualitative research process:

Task 2: Look again at the slide in video #1 above at the 8:22 mark. See how the codes are organized into domains and sub-domains? Do the same for the codes that your group generated during the in-class activity in our most recent meeting. Come to class with a similar table of the codes (related to your team’s research question), ready to share.

Task 3: Look at the classification of codes that you made in the previous task. What are the main themes that emerge from these codes (choose at least two themes)?

Task 4: For each of the themes you identified, pull out one quote from your interview transcript(s) that could be used as an example for that theme. Make sure that each quote is representative of what everyone said in their interview responses related to that theme, rather than a single “cherry-picked” example. Copy each quote into your assignment document.

You have reached the end of this week’s qualitative homework assignment.

4.1.3 Quantitative Assignment #3

Please do this assignment on paper or on the computer, whichever you prefer, and come to class on Wednesday, November 11, 2020 at 10 a.m., ready to turn it in. The portions in Excel will of course need to be done on the computer.

You might need to refer to materials from other parts of this e-book in order to complete this assignment. Remember that Control + F (for Windows/Linux users) or Command + F (for Mac users) is your friend.

4.1.3.1 Review

Please review the recent Quantitative Activity during which we learned about linear regression residuals. That section contains everything we went through together in class and you will need to modify the procedure that you used there in order to complete the rest of this assignment.

4.1.3.2 More Linear Regression Residuals

In Quantitative Assignment #2 and in class recently, we looked at the fitness dataset. Now we’re going to look at an updated version of this dataset. Note that the data below differ from previous versions of the fitness dataset.

Name      WeeklyWeightliftHours  WeightLiftedKG  Female
Person A  3                      20              0
Person B  4                      30              1
Person C  4                      21              0
Person D  2                      25              1
Person E  6                      40              1
Person F  5                      30              0

You should be able to copy and paste the data above into Excel on your computer.

We can also visualize this data in a plot, like before, except now we have different colors for each gender:

Note that you could, if you wanted to, reproduce the chart above in Excel.

Task 5: Run a linear regression in Excel in which WeightLiftedKG is the dependent variable and both WeeklyWeightliftHours and Female are independent variables. Keep in mind that this tutorial, which you looked at before, explains how to run a model with multiple independent variables.
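
If you would like to double-check your Excel output, the same kind of model can also be sketched in Python. This is optional and is not the procedure we use in this course; it assumes NumPy is installed, and the variable names (hours, weight, female, X) are made up for this illustration.

```python
import numpy as np

# Fitness data copied from the table in this section.
hours = np.array([3, 4, 4, 2, 6, 5], dtype=float)         # WeeklyWeightliftHours
weight = np.array([20, 30, 21, 25, 40, 30], dtype=float)  # WeightLiftedKG
female = np.array([0, 1, 0, 1, 1, 0], dtype=float)        # Female (1 = yes, 0 = no)

# Design matrix: a column of ones for the intercept,
# followed by the two independent variables.
X = np.column_stack([np.ones_like(hours), hours, female])

# Ordinary least squares fit of weight on hours and female.
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
intercept, b_hours, b_female = coef

print(f"Intercept:             {intercept:.3f}")
print(f"WeeklyWeightliftHours: {b_hours:.3f}")
print(f"Female:                {b_female:.3f}")
```

The three printed numbers should correspond to the Coefficients column of your Excel regression output.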

Task 6: Write down the equation for this regression.[19] The Coefficients section of the same tutorial shows how to do this.

Task 7: What is the R-squared for this regression?

Task 8: Calculate the predicted values of the regression. You can do this using Excel if you know how or you can do it by hand.

Task 9: Calculate the residuals of the regression using Excel.
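
For anyone curious how predicted values and residuals are computed outside of Excel, here is a minimal Python sketch. It assumes NumPy is installed and refits the model from the fitness table above; the variable names are hypothetical. A residual is simply the actual value minus the predicted value for each person.

```python
import numpy as np

# Fitness data copied from the table in this section.
names = ["Person A", "Person B", "Person C", "Person D", "Person E", "Person F"]
hours = np.array([3, 4, 4, 2, 6, 5], dtype=float)         # WeeklyWeightliftHours
weight = np.array([20, 30, 21, 25, 40, 30], dtype=float)  # WeightLiftedKG
female = np.array([0, 1, 0, 1, 1, 0], dtype=float)        # Female (1 = yes, 0 = no)

# Fit the regression with an intercept and both independent variables.
X = np.column_stack([np.ones_like(hours), hours, female])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)

# Predicted value = intercept + b1 * hours + b2 * female, for each person.
predicted = X @ coef

# Residual = actual value minus predicted value.
residuals = weight - predicted

for name, actual, fitted, resid in zip(names, weight, predicted, residuals):
    print(f"{name}: actual={actual:.1f}  predicted={fitted:.2f}  residual={resid:.2f}")
```

You can compare each printed row against the predicted values and residuals that you calculated in Excel.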

Task 10: According to our model, what is the predicted weight lifted by a female who weightlifts for seven hours per week?

Task 11: According to our model, what is the predicted weight lifted by a male who weightlifts for seven hours per week?

Task 12: Compare the last two answers (the female versus the male who weightlift for seven hours each). What is the difference? Is that difference equal to a number you see in your regression output table in Excel (hint: yes)? Which number?

Task 13: What blanket prediction is this regression model making about the weightlifting capabilities of women versus men, when holding constant the number of hours weightlifted per week?

Task 14: Figure out the correlation of the predicted values and the actual values of the dependent variable. There are many ways to do this and any way is fine!

Task 15: What is the square of the correlation that you just calculated? Compare it to the R-squared output from the regression. What did you find?

Task 16: What is the relationship between R-squared, predicted values, and actual values?
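
If you want to check your work on Tasks 14 through 16 outside of Excel, the sketch below computes both quantities side by side so that you can compare them yourself. It assumes NumPy is installed, and the variable names are invented for this illustration.

```python
import numpy as np

# Fitness data copied from the table in this section.
hours = np.array([3, 4, 4, 2, 6, 5], dtype=float)         # WeeklyWeightliftHours
weight = np.array([20, 30, 21, 25, 40, 30], dtype=float)  # WeightLiftedKG
female = np.array([0, 1, 0, 1, 1, 0], dtype=float)        # Female (1 = yes, 0 = no)

# Fit the regression and compute predicted values.
X = np.column_stack([np.ones_like(hours), hours, female])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
predicted = X @ coef

# Correlation between the actual and predicted values of the dependent variable.
r = np.corrcoef(weight, predicted)[0, 1]

# R-squared computed from the sums of squares:
# 1 - (sum of squared residuals) / (total sum of squares).
ss_residual = np.sum((weight - predicted) ** 2)
ss_total = np.sum((weight - weight.mean()) ** 2)
r_squared = 1 - ss_residual / ss_total

print(f"Correlation (actual vs. predicted): {r:.4f}")
print(f"Correlation squared:                {r ** 2:.4f}")
print(f"R-squared from sums of squares:     {r_squared:.4f}")
```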

Task 17: Imagine that this fitness dataset with six observations is a sample of six people in Boston. Based on the results of the regression you just ran, what can you say (if anything) about the relationship between weekly hours spent weightlifting, gender, and weightlifting capability among all of the people of Boston (which is the population from which the six-person sample was drawn)? Be sure to look at the confidence intervals in your regression output, as we did together recently in class!

4.1.3.3 Exploring New Data

Now it’s time to look at some new data that is actually related to genetic counseling. The data is in an Excel file called “GC-Data1.xlsx” that you can find in D2L in Course Materials -> Content -> Week 7, 8, 10, 11: Quantitative and Qualitative Methodology -> GC-Data1. This link might take you there, but I’m not certain. Please download this data file to your own computer and open it in Excel.

Task 18: Look at the second sheet in the Excel file, called “Field definitions”. This is the codebook or data dictionary for this survey data, meaning that it describes all of the variables so that the researcher (you) can use the data easily.

Task 19: Each observation in this data is a person who has been surveyed/tested. Identify a research question (or two) that would be reasonable to investigate with this data using some kind of regression analysis.

Task 20: Which of the variables in the dataset will be the dependent variable (the outcome of interest) when you investigate your research question? What type of variable is this? Numeric? Categorical? Something else?

Task 21: Identify one or more independent variables in the data that will be included in your analysis. Remember that some of these could be key independent variables that you are most interested in and others could just be control variables that you suspect may be associated with the outcome of interest.

The rest of this part of the assignment is all about understanding this new dataset better and gathering as many descriptive statistics and visualizations as possible that will be helpful as we begin our quantitative analysis.

Task 22: How many observations are in the dataset?

Task 23: How many variables are in the dataset?

Task 24: Produce descriptive statistics for each of the variables in this dataset that you will use in your analysis. This should be about 3–5 variables. This guide may be helpful.
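
As an optional illustration of what "descriptive statistics" means for a single numeric variable, here is a short Python sketch that uses only the standard library. Because the GC-Data1 variables are not reproduced in this chapter, it uses the WeightLiftedKG column from the fitness dataset as a stand-in; the variable names are hypothetical.

```python
import statistics

# WeightLiftedKG values from the fitness dataset, used only as a stand-in
# because the GC-Data1 variables are not reproduced in this chapter.
weight_lifted_kg = [20, 30, 21, 25, 40, 30]

summary = {
    "n": len(weight_lifted_kg),
    "mean": statistics.mean(weight_lifted_kg),
    "median": statistics.median(weight_lifted_kg),
    "sample_sd": statistics.stdev(weight_lifted_kg),  # sample standard deviation
    "min": min(weight_lifted_kg),
    "max": max(weight_lifted_kg),
}

for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```

For a categorical variable, counts per category are usually more informative than a mean or standard deviation.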

Task 25: For each of the variables in this dataset that you will use in your analysis, produce a meaningful visualization (one per variable) that helps us better understand the distribution of this variable across the observations in the dataset.

Keep in mind the following resources that we used before:

Task 26: Write out the regression equation that you will end up with once you run a regression to answer your research question. Leave the coefficients blank in the equation because you don’t know those yet (the computer will tell you when you run the regression; that’s the whole reason we will do that!).
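
As an illustration of the format only, here is what an equation with blank coefficients could look like for the fitness regression from earlier in this assignment. Your own equation should instead use the dependent and independent variables that you chose from GC-Data1.

```latex
\widehat{\text{WeightLiftedKG}}
  = \underline{\hspace{1cm}}
  + \underline{\hspace{1cm}} \times \text{WeeklyWeightliftHours}
  + \underline{\hspace{1cm}} \times \text{Female}
```

The first blank is the intercept, and each remaining blank is the coefficient for the independent variable that it multiplies.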

We will run the actual regression later, not now. The purpose of this assignment was just to open the data in Excel, think about our quantitative research design, and calculate descriptive statistics.

4.1.3.4 Brief Introduction to Logistic Regression

In the OLS linear regression that we have been working with so far, the dependent variable must be a continuous, numeric variable. Examples of acceptable dependent variables for linear regression include height, gas mileage, weight lifted, etc. Basically any variable that is measured as a number and where fractions of whole numbers are still meaningful (for example, someone who can lift 20.5 kilograms can lift more weight than someone who can lift 20 kilograms, and that half-kilogram difference is still meaningful to know about).

In the most common and basic type of logistic regression, the dependent variable is always a variable with just two categories such as 1 or 0, yes or no, male or female, positive for COVID-19 or negative for COVID-19. This is what we call a binary dependent variable or a binary outcome of interest.
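
To give a rough sense of how a logistic regression model produces predictions for a binary outcome, here is a minimal Python sketch. It is an illustration under stated assumptions, not a fitted model: the coefficients b0 and b1 are invented, and the function names are hypothetical. The key idea is that the logistic function converts any number into a predicted probability between 0 and 1.

```python
import math


def logistic(z):
    """Convert a linear predictor z (any real number) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration
# (they are NOT estimated from any real data).
b0 = -2.0  # intercept
b1 = 0.5   # coefficient for a single independent variable x


def predicted_probability(x):
    """Predicted probability that the binary outcome equals 1 for a given x."""
    return logistic(b0 + b1 * x)


for x in [0, 4, 8]:
    print(f"x = {x}: predicted probability = {predicted_probability(x):.3f}")
```

Unlike a linear regression prediction, the predicted probability can never fall below 0 or rise above 1, no matter how large or small x is.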

Please watch the following video, which explains the basics of logistic regression:

Task 27: Write down one or more questions you have about logistic regression.

Task 28: Write an example of a research question that could be studied using logistic regression. Explain the population of interest, what the dependent variable would be and what the key independent variable(s) would be. Be ready to share this in class. Your example can be related to genetic counseling or not. It could also come from the dataset you explored earlier in the assignment, if you want.

Task 29: Explain why you could not use OLS linear regression to answer the example research question that you stated in the previous task.

Additional optional resources about logistic regression are below. It is not necessary to look at these:

  • StatQuest: Logistic Regression Details Pt1: Coefficients.
  • Explaining Logistic Regression Results to Non-Statistical Audiences.

We are unlikely to have time to run logistic regression models during this course. However, it is important that you have an approximate sense of how to interpret results from logistic regression models and also that you know that it is an analytic option for you in the future if you have a quantitative research design with a binary outcome of interest.

You have reached the end of this week’s quantitative homework assignment.

4.2 In class

4.2.1 Schedule

November 11 2020

4.2.2 Quantitative Activity

All students should do all parts of this activity on their own computers, but you may get help from each other within your groups.

4.2.2.1 Part 1

Please complete this activity by 10:30 a.m. If you finish earlier, please let Anshul know right away (do not wait until 10:30 a.m.).

Task 30: Write a research question that could be answered using a) the GC-Data1 dataset that you looked at for your assignment due today and b) binary logistic regression (the version of logistic regression you learned about in your assignment, which requires a dependent variable that can take on two values). You will tell Anshul your research question and any required modifications to variables (see below) at 10:30 a.m.

Note that it is fine for the question you write to require the recoding of a variable that already exists in the dataset. Here is an example of what I mean (this example is unrelated to the GC-Data1 data): Imagine you have survey data in which your dependent variable (DV) tells you whether people like to watch TV. Their answers are either yes, no, or maybe. Currently, your DV has three possible values. But we can recode the DV such that it has only two values and then we can use binary logistic regression.

Here is a visual example of this recoding process:
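
The same recoding idea can also be sketched in a few lines of Python. The survey answers below are invented, and the rule used here (yes becomes 1, while no and maybe both become 0) is just one assumed way to collapse three values into two; you would need to justify whichever rule you choose for your own research question.

```python
# Hypothetical survey answers to "Do you like to watch TV?" (not from GC-Data1).
answers = ["yes", "no", "maybe", "yes", "maybe", "no"]

# Assumed recoding rule: "yes" becomes 1; "no" and "maybe" both become 0.
recode_rule = {"yes": 1, "no": 0, "maybe": 0}

# Apply the rule to every answer to create a new binary variable.
likes_tv = [recode_rule[answer] for answer in answers]

for original, recoded in zip(answers, likes_tv):
    print(f"{original:>5} -> {recoded}")
```

The new variable has only two possible values, so it could serve as the dependent variable in a binary logistic regression.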

Task 32: In the assignment due today, you were asked to write down any questions you have about logistic regression. Discuss these questions with each other. If you are not able to answer any of these questions among yourselves, bring any unresolved questions to Anshul at the next break (10:30 a.m.).

Task 33: Create at least two basic pivot tables that are useful to help understand the information in GC-Data1. Pivot tables are another way to describe your data that we have not yet discussed.

The following tutorial will help you do this:

Task 34: Which types of variables are useful to examine using pivot tables and which types of variables are not (or much less) useful to examine using pivot tables?

Task 35: Create at least two two-dimensional pivot tables (also called two-way tables, cross tables, contingency tables) that are useful to help understand the information in GC-Data1.
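
To show what one-way and two-way pivot tables are counting, here is a minimal Python sketch that uses only the standard library. The records and the variable names (gender, test_result) are hypothetical and do not come from GC-Data1; in class you will build the real tables in Excel.

```python
from collections import Counter

# Hypothetical survey records (NOT from GC-Data1).
# Each tuple is (gender, test_result).
records = [
    ("female", "positive"),
    ("female", "negative"),
    ("male", "negative"),
    ("female", "positive"),
    ("male", "positive"),
    ("male", "negative"),
]

# One-way table: number of observations in each category of one variable.
one_way = Counter(gender for gender, _ in records)

# Two-way (contingency) table: number of observations for every
# combination of the two variables.
two_way = Counter(records)

row_labels = sorted({gender for gender, _ in records})
col_labels = sorted({result for _, result in records})

# Print the two-way table with genders as rows and test results as columns.
print("gender".ljust(10) + "".join(label.rjust(10) for label in col_labels))
for gender in row_labels:
    counts = "".join(str(two_way[(gender, label)]).rjust(10) for label in col_labels)
    print(gender.ljust(10) + counts)
```

Each cell of the two-way table is simply a count of how many observations share that particular pair of categories.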

As soon as you complete the tasks above, inform Anshul (come back to the main Zoom room) and then switch to today’s qualitative activity.

4.2.2.2 Part 2

Logistic regression results will appear here during class.

4.2.3 Qualitative Activity

Please work on this activity in your qualitative research groups. Any work that you do not complete in class should be finished as homework and submitted to Anshul by email by noon on Monday, November 16, 2020.

During next week’s class, each team will present on their practice research projects. Here is what you need to know about your presentation:

  • 10 minutes to present, followed by 5 minutes for questions and answers.
  • Everyone in the group must speak during the presentation.
  • Refer to examples of other qualitative work to decide how/what to present.
  • Share your slides with Anshul by Monday, November 16, 2020 at noon. Send an email with either a) the slides in an attachment or b) a link to a shared document in which viewing and commenting are enabled.

Feel free to ask lots of questions as you prepare!

Task 37: Take five minutes to skim the abstract as well as all tables and figures of the article below. This is another example of how qualitative data can be analyzed and presented:

  • Aliu, O., Corlew, S. D., Heisler, M. E., Pannucci, C. J., & Chung, K. C. (2014). Building Surgical Capacity in Low Resource Countries: A Qualitative Analysis of Task Shifting from Surgeon Volunteers’ Perspectives. Annals of Plastic Surgery, 72(1), 108.

Keep in mind that your most recent at-home qualitative assignment included examples of presentations of qualitative projects.

Task 38: Finish any coding and categorization of your data that you have not yet completed. You might want to share your most recent qualitative assignment answers with each other at this time.

Task 39: Identify your main findings and conclusions and how you plan to present them. Make sure you include representative[20] quotes from your respondents.

Task 40: Specify the capabilities and limitations of your study. Plan to very briefly (30 seconds) include this in your final presentation.

Task 41: Prepare the slides and script for your final presentation, which you will give in class on Wednesday, November 18, 2020. Share the slides with Anshul by noon on Monday, November 16, 2020 so that you can receive some feedback. The script does not necessarily need to be written down, but I recommend that you at least have some bullet points prepared for each slide and that you practice the entire presentation at least once.


  18. I learned from Allison that it has been a bit confusing to keep track of everything that you need to turn in that I have assigned to you. Hopefully this list will help with that. Please let me know if you have any other feedback about this issue or in general and I can try to adapt!

  19. It will be longer than the one we did in class, since this has two (rather than just one) independent variables.

  20. This means quotes that express viewpoints you heard repeatedly during your interviews. If you wish to present responses that were not expressed repeatedly, that is also okay, but make sure that you point out to your audience that this is the case and that you plan to investigate further when you do more than just a few interviews.