Chapter 4 Week 9 – Nov 4 2021 class meeting

This week, our goals are to…

Code qualitative data systematically to identify themes.
Use themes identified from qualitative data to answer a research question.
Run and interpret the results of linear regression models with multiple independent variables.
Identify research questions that can be answered using logistic regression models.
Use pivot tables to select and describe data.
Interpret results of published quantitative research.

4.1 Before class

4.1.1 Checklist – Complete by Nov 4

By our class meeting on Thursday November 4 2021, you should complete the following tasks:

Complete the Week 8 in-class Qualitative Activity, if you have not already. This may require meeting with your teams to work together.
Qualitative Assignment #3
Quantitative Assignment #3

This is all you need to do before we meet for class. If anything is unclear or you have any questions, do not hesitate to email me (Anshul) at akumar@mghihp.edu or contact me by phone.

It is fine to work with others on the assignments (and sometimes it may even be required), but make sure you state who you worked with at the top of your assignment.

4.1.2 Qualitative Assignment #3

Please write your responses on the computer and submit your work to the appropriate D2L dropbox by the start of our next class together. You can choose to work individually or as a team for some or all of the tasks in this assignment; both ways are fine. In both cases, each student must submit their own assignment document.

Task 1: Watch the videos below, which are either examples of or guides related to the presentation of qualitative data:

Example of how you can structure your final presentations: Brandon Holland. Qualitative Research Final Presentation. Click here.
Another example of how to present your research design and findings: Preliminary Findings from PROMISE Qualitative Study. Click here.

Also have a look at the diagrams below—created by Subha Ramani—which relate to the qualitative research process:

Task 2: Look again at the slide in video #1 above at the 8:22 mark. See how the codes are organized into domains and sub-domains? Do the same for the codes that your group generated during the in-class activity in our most recent meeting. Come to class with a similar table of the codes (related to your team’s research question), ready to share.

Task 3: Look at the classification of codes that you made in the previous task. What are the main themes that emerge from these codes (choose at least two themes)?

Task 4: For each of the themes you identified, pull out one quote from your interview transcript(s) that could be used as an example for that theme. Make sure that each quote is representative of what everyone said in their interview responses related to that theme, rather than a single “cherry-picked” example. Copy each quote into your assignment document.

You have reached the end of this week’s qualitative homework assignment.

4.1.3 Quantitative Assignment #3

Please do this assignment on paper or on the computer, whichever you prefer, by the start of our next class together. Please turn it in to the appropriate D2L dropbox and have a copy with you in class (a physical paper copy or an electronic copy are both fine). The portions in Excel will of course need to be done on the computer.

You might need to refer to materials from other parts of this e-book to complete this assignment. Remember that Control + F (for Windows/Linux users) or Command + F (for Mac users) is your friend.

4.1.3.1 Review

Please review the recent Quantitative Activity during which we learned about linear regression residuals. That section contains everything we went through together in class and you will need to modify the procedure that you used there in order to complete the rest of this assignment.

4.1.3.2 More Linear Regression Residuals

In Quantitative Assignment #2 and in class recently, we looked at the fitness dataset. Now we’re going to look at an updated version of this dataset. Note that the data below are different from previous versions of the fitness dataset.

Name	WeeklyWeightliftHours	WeightLiftedKG	Female
Person A	3	20	0
Person B	4	30	1
Person C	4	21	0
Person D	2	25	1
Person E	6	40	1
Person F	5	30	0

You should be able to copy and paste the data above into Excel on your computer.

We can also visualize this data in a plot, like before, except now we have different colors for each gender:

Note that you could—if you want—reproduce the chart above in Excel.

Task 5: Run a linear regression in Excel in which WeightLiftedKG is the dependent variable and both WeeklyWeightliftHours and Female are independent variables. Keep in mind that this tutorial, which you looked at before, explains how to run a model with multiple independent variables.

Task 6: Write down the equation for this regression.²⁴ The coefficients section of the same tutorial shows how to do this.
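As a reminder of the general shape only, a regression with these two independent variables can be written as follows. The symbols $b_0$, $b_1$, and $b_2$ are assumed names for the intercept and the two coefficients that Excel reports; they are not taken from the tutorial.

$$
\widehat{\text{WeightLiftedKG}} = b_0 + b_1 \cdot \text{WeeklyWeightliftHours} + b_2 \cdot \text{Female}
$$

Your answer to this task should have the numbers from your own Excel output in place of $b_0$, $b_1$, and $b_2$.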

Task 7: What is the R-squared for this regression?

Task 8: Calculate the predicted values of the regression. You can do this using Excel if you know how or you can do it by hand.

Task 9: Calculate the residuals of the regression using Excel.
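If you would like to cross-check your Excel results for the regression, predicted values, and residuals, the sketch below fits the same model in plain Python. It is only one possible way to do this and is not part of the required assignment. The helper names `solve` and `ols` are made up for this illustration, and the code uses no add-on libraries so that it runs on any Python 3 installation.

```python
# Fitness data copied from the table above.
hours = [3, 4, 4, 2, 6, 5]          # WeeklyWeightliftHours
female = [0, 1, 0, 1, 1, 0]         # Female (1 = female, 0 = male)
weight = [20, 30, 21, 25, 40, 30]   # WeightLiftedKG (dependent variable)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column (pivoting).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Return [intercept, coefficient_1, coefficient_2, ...] for an OLS fit."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b_hours, b_female = ols([hours, female], weight)

# Predicted value = intercept + coefficient * value, for each person.
predicted = [b0 + b_hours * h + b_female * f for h, f in zip(hours, female)]
# Residual = actual value minus predicted value.
residuals = [w - p for w, p in zip(weight, predicted)]

print("intercept:", b0)
print("WeeklyWeightliftHours coefficient:", b_hours)
print("Female coefficient:", b_female)
print("predicted values:", predicted)
print("residuals:", residuals)
```

The printed intercept and coefficients should match the coefficients column of your Excel regression table, apart from rounding. If they do not, check which columns you selected as the independent variables in Excel.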

Task 10: According to our model, what is the predicted weight lifted by a female who weightlifts for seven hours per week?

Task 11: According to our model, what is the predicted weight lifted by a male who weightlifts for seven hours per week?

Task 12: Compare the last two answers (the female versus the male who weightlift for seven hours each). What is the difference? Is that difference equal to a number you see in your outputted regression table in Excel?²⁵ Which number?

Task 13: What blanket prediction is this regression model making about the weightlifting capabilities of women versus men, when holding constant the number of hours weightlifted per week, in this sample of six people?

Task 14: What does this regression model predict is the relationship between hours of weightlifting each week and kilograms of weight lifted, in this sample of six people?

Task 15: Figure out the correlation of the predicted values and the actual values of the dependent variable. There are many ways to do this and any way is fine!

Task 16: What is the square of the correlation that you just calculated? Compare it to the R-squared output from the regression. What did you find?

Task 17: What is the relationship between R-squared, predicted values, and actual values?
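One way to explore the last three tasks outside Excel is sketched below. It uses a small hypothetical dataset (not the fitness data) with a single independent variable, so it is an illustration of the idea rather than a solution to the tasks. The function names `mean` and `correlation` are defined here for the example.

```python
from math import sqrt

# Hypothetical data, made up for this example.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# OLS fit with one independent variable.
mx, my = mean(x), mean(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
predicted = [intercept + slope * xi for xi in x]

# R-squared from its definition: 1 - (residual sum of squares / total sum of squares).
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# Correlation between the actual and the predicted values of the dependent variable.
r = correlation(y, predicted)

print("R-squared:", r_squared)
print("squared correlation of actual and predicted:", r ** 2)
```

In Excel, the same correlation can be calculated with the CORREL function applied to the column of actual values and the column of predicted values.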

Task 18: Imagine that this fitness dataset with six observations is a sample of six people in Boston. Based on the results of the regression you just ran, what can you say (if anything) about the relationship between weekly hours spent weightlifting, gender, and weightlifting capability among all of the people of Boston (which is the population from which the six-person sample was drawn)? Be sure to look at the confidence intervals in your regression output, like we did together recently in class!
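As a reminder of how a 95% confidence interval is built from a coefficient and its standard error, here is a minimal sketch. The coefficient and standard error below are hypothetical numbers chosen for illustration and are not results from the fitness regression. The critical value 3.182 is the standard two-sided 95% value of the t distribution with 3 degrees of freedom, which is what you get with 6 observations and 3 estimated coefficients.

```python
def confidence_interval(coefficient, standard_error, t_critical):
    """Return the (lower, upper) bounds of a confidence interval."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


# Hypothetical coefficient and standard error, for illustration only.
coefficient = 2.0
standard_error = 1.5

# Critical t value for 95% confidence with 3 degrees of freedom
# (6 observations minus 3 estimated coefficients).
t_critical = 3.182

lower, upper = confidence_interval(coefficient, standard_error, t_critical)

# If the interval contains 0, we cannot rule out "no relationship" in the population.
includes_zero = lower <= 0 <= upper

print("lower bound:", lower)
print("upper bound:", upper)
print("interval includes zero:", includes_zero)
```

Excel's regression output normally reports these bounds for you in the lower and upper 95% columns, so this calculation is only meant to show where those numbers come from.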

4.1.3.3 Exploring New Data

Now it’s time to look at some data that is related to genetic counseling. The data is in an Excel file called “GC-Data1.xlsx” that you can find in D2L in Course Materials -> Content -> Week 9- Nov 4- Methodology 3 -> GC-Data1.

Task 19: Look at the second sheet in the Excel file, called “Field definitions”. This is the codebook or data dictionary for this survey data, meaning that it describes all of the variables so that the researcher (you) can use the data easily.

Task 20: Each observation in this data is a person who has been surveyed/tested. Identify a research question (or two) that would be reasonable to investigate with this data using some kind of regression analysis.

Task 21: Which of the variables in the dataset will be the dependent variable (the outcome of interest) when you investigate your research question? What type of variable is this? Numeric? Categorical? Something else?

Task 22: Identify one or more independent variables in the data that will be included in your analysis. Remember that some of these could be key independent variables that you are most interested in and others could just be control variables that you suspect may be associated with the outcome of interest.

The rest of this part of the assignment is all about understanding this new dataset better and gathering as many descriptive statistics and visualizations as possible that will be helpful as we begin our quantitative analysis.

Task 23: How many observations are in the dataset?

Task 24: How many rows are in the dataset?

Task 25: How many variables are in the dataset?

Task 26: How many columns are in the dataset?

Task 27: Produce descriptive statistics for each of the variables in this dataset that you will use in your analysis. This should be about 3–5 variables. This guide may be helpful.
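Excel is the expected tool for this task, but if you are curious what the same descriptive statistics look like in code, here is a minimal sketch using Python's built-in `statistics` module. The numbers are a hypothetical column and are not taken from GC-Data1.

```python
import statistics

# Hypothetical numeric column; replace with values copied from your own data.
values = [34, 45, 29, 52, 41, 38]

summary = {
    "n": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "sd": statistics.stdev(values),  # sample standard deviation, like STDEV.S in Excel
    "min": min(values),
    "max": max(values),
}

for name, value in summary.items():
    print(name, value)
```

These statistics only make sense for numeric variables. For a categorical variable, a count of how many observations fall into each category is usually more informative.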

Task 28: For each of the variables in this dataset that you will use in your analysis,²⁶ produce a meaningful visualization (one per variable) that helps us better understand the distribution of this variable across the observations in the dataset.

Keep in mind the following resources, if needed:

Task 29: Write out the regression equation that you will end up with once you run a regression to answer your research question. Leave the coefficients blank in the equation, because you don’t know those yet (the computer will tell you when you run the regression; that’s the whole reason we will do that!).

We will run the actual regression later, not now. The purpose of this assignment was just to open the data in Excel, think about our quantitative research design, and calculate descriptive statistics.

4.1.3.4 Brief Introduction to Logistic Regression

In the OLS linear regression that we have been working with so far, the dependent variable must be a continuous, numeric variable. Examples of acceptable dependent variables for linear regression include height, gas mileage, weight lifted, BMI, and other measured constructs (variables) that are measured in numbers and have a potentially large range. Basically, it’s any variable that is measured as a number and where fractions of whole numbers are still meaningful (for example, someone who can lift 20.5 kilograms can lift more weight than someone who can lift 20 kilograms, and that half-kilogram difference is still meaningful to know about).

In the most common and basic type of logistic regression, the dependent variable is always a variable with just two categories such as 1 or 0, yes or no, male or female, positive for COVID-19 or negative for COVID-19. This is what we call a binary dependent variable or a binary outcome of interest.
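To make the contrast with linear regression concrete, here is a minimal sketch of how a binary logistic regression model turns its equation into a predicted probability. The coefficients `b0` and `b1` and the function name `predicted_probability` are hypothetical and chosen only for illustration; a real model would estimate the coefficients from data.

```python
from math import exp


def logistic(z):
    """Convert a log-odds value z into a probability between 0 and 1."""
    return 1 / (1 + exp(-z))


# Hypothetical intercept and coefficient, for illustration only.
b0 = -4.0
b1 = 0.1


def predicted_probability(x):
    """Predicted probability that the outcome equals 1 for a given x."""
    # The right-hand side looks like a linear regression equation,
    # but it predicts log-odds, which are then converted to a probability.
    return logistic(b0 + b1 * x)


for x in [10, 40, 70]:
    print(x, predicted_probability(x))
```

However large or small the independent variable is, the predicted value always stays between 0 and 1, which is what we want when the outcome is a yes-or-no category.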

Please watch the following video which explains the basics of logistic regression:

StatQuest: Logistic Regression. Click here

Task 30: Write down one or more questions you have about logistic regression.

Task 31: Write an example of a research question that could be studied using logistic regression. Explain the population of interest, what the dependent variable would be and what the key independent variable(s) would be. Be ready to share this in class. Your example can be related to genetic counseling or not. It could also come from any of the datasets we have used together before, if you want.

Task 32: Explain why you could not use OLS linear regression to answer the example research question that you stated in the previous task.

Additional optional resources about logistic regression are below. It is not necessary to look at these:

StatQuest: Logistic Regression Details Pt1: Coefficients. Click here
Explaining Logistic Regression Results to Non-Statistical Audiences. Click here

We are unlikely to have time to run logistic regression models during this course. However, it is important that you have an approximate sense for how to interpret results from logistic regression models and also that you know that it is an analytic option for you in the future if you have a quantitative research design with a binary outcome of interest.

4.1.3.5 Pivot Tables in Excel

In class, we will make pivot tables in Excel to help us understand our data better. Please watch the video below, so that you are familiar with the procedure already when you come to class.

The video above can also be watched externally on YouTube at https://youtu.be/4bc88LPIvGM.

You have reached the end of this week’s quantitative homework assignment.

4.2 In class

4.2.1 Schedule

November 4 2021

1:00 p.m. – Brief introduction
1:05 p.m. – Quantitative activity in groups of 2
1:50 p.m. – Qualitative activity. During this time, Anshul will visit each team in turn for 15–20 minutes to review the quantitative activity.
2:50 p.m. – End of class

4.2.2 Quantitative Activity

4.2.2.1 Part 1

All students should do all parts of this activity on their own computers, while getting help from each other in their groups.

In this part of the activity, you will mostly generate and interpret descriptive statistics in Excel.

Task 33: Write a research question that could be answered using a) the GC-Data1 dataset that you looked at for your assignment due today and b) binary logistic regression (the version of logistic regression you learned about in your assignment, which requires a dependent variable that can take on two values). Tell Anshul your research question and any required modifications to variables, as soon as you decide them.

Note that it is fine for the question you write to require the recoding of a variable that already exists in the dataset. Here is an example of what I mean (this example is unrelated to the GC-Data1 data): Imagine you have survey data in which your dependent variable (DV) tells you whether people like to watch TV. Their answers are either yes, no, or maybe. Currently, your DV has three possible values. But we can recode the DV such that it has only two values and then we can use binary logistic regression.

Here is a visual example of this recoding process:
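The same recoding idea can also be sketched in a few lines of code. The answers below are hypothetical, and the recoding rule (treating "maybe" the same as "no") is an assumption made for this example; you could just as reasonably group "maybe" with "yes", as long as you state your choice.

```python
# Hypothetical survey answers to "Do you like to watch TV?"
answers = ["yes", "no", "maybe", "yes", "maybe", "no"]

# Assumed recoding rule: "yes" becomes 1, everything else becomes 0.
recode = {"yes": 1, "no": 0, "maybe": 0}

likes_tv = [recode[answer] for answer in answers]

print(likes_tv)
```

In Excel, a formula such as `=IF(A2="yes",1,0)` in a new column does the same job, assuming the original answers are in column A.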

Task 34: In the assignment due today, you were asked to write down any questions you have about logistic regression. Discuss these questions with each other. If there are questions you cannot answer among yourselves, bring them to Anshul.

Task 35: Create at least two basic pivot tables that are useful to help understand the information in GC-Data1. Pivot tables are another way to describe your data that we have not yet discussed.

The following tutorial might help you do this, in addition to the video you watched in preparation for today.

Pivot Tables. Easy Excel. Click here.
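If it helps to see what a basic pivot table is doing behind the scenes, the sketch below groups rows by one variable and summarizes another. It uses the six-person fitness dataset from this week's quantitative assignment as example data, since the columns of GC-Data1 are not reproduced here.

```python
from collections import defaultdict

# Fitness dataset from this week's quantitative assignment.
rows = [
    {"Name": "Person A", "Female": 0, "WeightLiftedKG": 20},
    {"Name": "Person B", "Female": 1, "WeightLiftedKG": 30},
    {"Name": "Person C", "Female": 0, "WeightLiftedKG": 21},
    {"Name": "Person D", "Female": 1, "WeightLiftedKG": 25},
    {"Name": "Person E", "Female": 1, "WeightLiftedKG": 40},
    {"Name": "Person F", "Female": 0, "WeightLiftedKG": 30},
]

# Step 1: group the values of one variable by the categories of another.
groups = defaultdict(list)
for row in rows:
    groups[row["Female"]].append(row["WeightLiftedKG"])

# Step 2: summarize each group (here: a count and an average).
pivot = {
    category: {"count": len(values), "average": sum(values) / len(values)}
    for category, values in sorted(groups.items())
}

for category, summary in pivot.items():
    print("Female =", category, summary)
```

In Excel terms, Female plays the role of the Rows field, and WeightLiftedKG is the Values field summarized once as a count and once as an average.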

Task 36: Which types of variables are useful to examine using pivot tables and which types of variables are not (or much less) useful to examine using pivot tables?

Task 37: Create at least two two-dimensional pivot tables—also called two-way tables, cross tables, or contingency tables—that are useful to help understand the information in GC-Data1. If you get a two-way pivot table that is too large to reasonably read and interpret, then a different type of descriptive tool is likely best for the two variables you selected. For this activity, choose a different variable(s), such that the pivot table you get is not too large (does not have too many rows and/or columns) and is fairly quick to read and interpret.
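A two-way table simply counts how many observations fall into each combination of two categorical variables. The sketch below shows this with hypothetical records; the variable names Gender and TestResult are made up for the example and are not columns from GC-Data1.

```python
from collections import Counter

# Hypothetical records of (Gender, TestResult), for illustration only.
records = [
    ("F", "positive"), ("F", "negative"), ("M", "negative"), ("F", "positive"),
    ("M", "positive"), ("M", "negative"), ("F", "negative"), ("F", "positive"),
]

counts = Counter(records)
row_labels = sorted({row for row, _ in records})
col_labels = sorted({col for _, col in records})

# table[row][column] = number of people in that combination of categories.
table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
row_totals = {r: sum(table[r].values()) for r in row_labels}
col_totals = {c: sum(table[r][c] for r in row_labels) for c in col_labels}

for r in row_labels:
    print(r, table[r], "row total:", row_totals[r])
print("column totals:", col_totals, "grand total:", len(records))
```

With only two categories per variable, the table has four cells and is quick to read. A variable with dozens of distinct values would produce the kind of oversized table that the task above warns about.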

Task 38: Pick one of the two two-way pivot tables you made above and interpret every number that you see in the table. Write down these interpretations and ask Anshul to check them if you’re not sure about anything.

4.2.2.2 Part 2

In this part of this activity, you will practice interpreting quantitative results from published genetic counseling research. Please spend as little time as necessary reading parts of each study to answer the questions below. Please DO NOT read parts of the study that you do not need to read to answer the questions.

Open this class session’s D2L content module when you start this part of the activity.

For the first few questions, refer to the following study:

Zakas AL, Leifeste C, Dudley B, Karloski E, Afonso S, Grubs RE, Shaffer JR, Durst AL, Parkinson MD, Brand R. The impact of genetic counseling on patient engagement in a specialty cancer clinic. Journal of genetic counseling. 2019 Oct;28(5):974-81.

Task 39: See Table 2 on p. 977 of Zakas et al. There is one row for the Informed choice outcome, which we will focus on now. Just for the Informed choice outcome, what does each column mean? You can skip Cohen’s d if you’re not sure, and we can discuss later (please remember to ask).

Task 40: Table 2 shows the result of which type of statistical test or model?

Task 41: In just 1–2 sentences, how would you report the finding related to Informed choice from the Zakas et al. study? Make sure your response includes specific numbers from the Informed choice row of Table 2. Make sure your response addresses predicted results for both sample and population.

Task 42: Rewrite the sentence you just wrote, but now pretending that the p-value for Informed choice in Table 2 is 0.45.

For the next few questions, refer to the following study:

Sussner KM, Thompson HS, Valdimarsdottir HB, Redd WH, Jandorf L. Acculturation and familiarity with, attitudes towards and beliefs about genetic testing for cancer risk within Latinas in East Harlem, New York City. Journal of genetic counseling. 2009 Feb 1;18(1):60.

The following questions are only about Table 3 on p. 67 of Sussner et al.

Task 43: In Table 3, what is the dependent variable?

Task 44: In Table 3, what is the outcome of interest?

Task 45: Table 3 presents the results of what type of statistical test or model?

Task 46: What is the $R^2$ statistic in Table 3? What does this mean? Answer in just one sentence.

Task 47: What is the sample size in Table 3?

Task 48: What result does Table 3 tell us regarding acculturation level? Your 1–2 sentence answer should address both sample and population.

Task 49: What result does Table 3 tell us regarding education? Your 1–2 sentence answer should address both sample and population.

The rest of this quantitative activity is optional (not required) to complete. Let’s have a look at the clock at this point in the class and decide how much more time to spend on this activity.

For the final few questions, refer to the following study:

Riesgraf RJ, Veach PM, MacFarlane IM, LeRoy BS. Perceptions and attitudes about genetic counseling among residents of a midwestern rural area. Journal of Genetic Counseling. 2015 Aug;24(4):565-79.

The following questions are only about Table 5 on p. 573 of Riesgraf et al.

Task 50: In Table 5, what is the dependent variable? How is it measured? What are the possible values that the dependent variable can have?

Task 51: What does Table 5 tell us about GC being confidential? You can refer to the Final Model portion of the table to answer this. Your 1–2 sentence answer should address both sample and population.

Task 52: What does Table 5 tell us about Age? You can refer to the Final Model portion of the table to answer this. Your 1–2 sentence answer should address both sample and population.

As soon as you complete the tasks above, inform Anshul²⁷ and then switch to today’s qualitative activity.

4.2.3 Qualitative Activity

Please work on this activity in your qualitative research groups. Any work that you do not complete in class should be finished as homework and submitted to Anshul by email by noon on Monday, November 15 2021.

During the next (final) class that you have with me (Anshul), each team will present on their practice research projects. Here is what you need to know about your presentation:

10 minutes to present, followed by 5 minutes for questions and answers.
Everyone in the group must speak during the presentation.
Refer to examples of other qualitative work to decide how/what to present.
Share your slides with me by Monday, November 15 2021 at noon. Send an email to akumar@mghihp.edu with either a) the slides in an attachment or b) a link to a shared document in which viewing and commenting are enabled. Copy all team members in this email. I will then reply with feedback within 24 hours (follow up with me if I don’t). You can send your slides earlier than this date if you want, in which case I will try to send feedback sooner as well (again, remind me if I don’t reply within 1 day).

Feel free to ask lots of questions as you prepare!

Task 54: Take five minutes to skim the abstract as well as all tables and figures of the article below. This is another example of how qualitative data can be analyzed and presented:

Aliu, O., Corlew, S. D., Heisler, M. E., Pannucci, C. J., & Chung, K. C. (2014). Building Surgical Capacity in Low Resource Countries: A Qualitative Analysis of Task Shifting from Surgeon Volunteers’ Perspectives. Annals of plastic surgery, 72(1), 108. DOI link. Full text link.

Keep in mind that your most recent at-home qualitative assignment included examples of presentations of qualitative projects.

Task 55: Finish any coding and categorization of your data that you have not yet completed. You might want to share your most recent qualitative assignment answers with each other at this time.

Task 56: Identify your main findings and conclusions and how you plan to present them. Make sure you include representative²⁸ quotes from your respondents.

Task 57: Specify the capabilities and limitations of your study. Plan to very briefly (30 seconds) include this in your final presentation.

Task 58: Prepare the slides and script for your final presentation, which you will present in class in the near future. Share the slides with Anshul by noon on Monday, November 15 2021 so that you can receive some feedback. The script does not necessarily need to be written down, but I recommend that you at least have some bullet points prepared for each slide and that you practice the entire presentation at least once. Any extra time you have in class or between classes could be a good opportunity to practice the presentation; you can go to a different room to do this if you would like.

²⁴ It will be longer than the one we did in class, since this has two (rather than just one) independent variables.
²⁵ Hint: yes.
²⁶ Not all of them.
²⁷ Come back to the main room at this time, if you are participating in a virtual setting and were working in a breakout room.
²⁸ This means quotes that express viewpoints you heard repeatedly during your interviews. If you wish to present responses that were not expressed repeatedly, that is also okay, but make sure that you point out to your audience that this is the case and that you plan to investigate further when you do more than just a few interviews.