41 Reading and critiquing research

So far, you have learnt about the process of research: asking a RQ, designing a study, collecting data, describing and summarising the data, and analysing the data (confidence intervals; hypothesis tests). In this chapter, you will learn to:

  • read and critique research.

41.1 Introduction

Scientific practice requires reading the research of others. Advances in every science-based discipline build on existing research, so being able to critique the research of others is important. (A critique evaluates the research, identifying what is good, and what could be improved.) Research is usually communicated in journal articles (also called papers), or in presentations (conferences; seminars). Millions of journal articles are available (this book references many articles).

At some time during your studies, you will need to read research articles: to understand current practices in your discipline; to know why your discipline does things as it does; to critique the evidence for current practices; and to identify open or unresolved questions in your discipline. Understanding the language of research is important for understanding these articles, even if you will not be conducting your own research.

Previous chapters have explained the tools necessary for evaluating or critiquing research and research articles. However, reading research articles can be challenging. Rather than reading thoroughly from start to finish, start by reading the Abstract (also called a Summary, or Overview), as it provides a useful overview of the whole paper without the details. Then, skim through the rest of the article (perhaps focusing on graphs and tables of results). Finally, if necessary, read the paper in full for greater detail.

The six steps of the research process (Sect. 1.5) can be used as a guide to critiquing and asking questions about the research:

  1. Ask the question:
    • What research question is the paper answering?
    • Why is this important?
    • To what population do the results apply?
    • What are the units of analysis and observation?
    • Are the definitions clear and appropriate?
    • Are there important inclusion and/or exclusion criteria that apply?
  2. Design the study:
    • How is the study designed?
    • Is the study observational or experimental?
    • How many individuals are in the study?
    • How was the sample obtained, and what are the implications for external validity?
    • Is the study designed to maximize internal validity?
    • What are the design limitations?
    • Are there ethical concerns (including with funding)?
  3. Collect the data:
    • How were the data collected?
    • Could the study be approximately repeated if needed?
  4. Classify and summarise the data:
    • Is the data summary appropriate, complete and clear?
    • What do these summaries reveal about the data?
    • What do the tables and graphs reveal about the data and relationships?
  5. Analyse the data:
    • What types of confidence intervals and/or hypothesis tests were used?
    • Is the analysis appropriate, accurate, valid and clear?
    • What do the results mean?
    • Are the results statistically valid?
  6. Report the results:
    • What are the main conclusions, and how do they answer the RQ?
    • Are the conclusions consistent with the results?
    • Are the results accurate, appropriate and well-reported?
    • Are the results of practical importance?
    • Are the study limitations acknowledged, and their implications discussed?
    • What other questions have emerged?

41.2 Example: blue light and sleep

The Abstract of a study of the impact of 'blue light' emitted by electronic devices on sleep (Randjelović et al. 2023), slightly edited for clarity, appears below:

The exposure of humans to artificial light at night... with predominant blue part of the visible spectrum is strongly influencing...sleep...

We hypothesized that reducing the amount of emitted blue light from screens of mobile phones during the night will increase sleep quality in our student population.

The aim of the work was to investigate the effect of reducing blue light from smartphone screen during the night on subjective quality of sleep among students of medicine.

The target population was students of medicine aged \(20\) to \(22\) years old of both sexes. The primary outcome of the study was subjective sleep quality, assessed by the Serbian version of the Pittsburgh Sleep Quality Index (PSQI).

The mean total PSQI score before intervention was \(6.83\pm 2.73\) (bad), while after the intervention the same score was statistically significant reduced to \(3.93 \pm 1.68\) (good) with large effect size.

The study has shown that a reduction of blue light emission from LED backlight screens of mobile phones during the night leads to improved subjective quality of sleep in students...

As this is the Abstract, we expect many details to be missing (but hopefully explained in the article itself). Specifically, details of the data collection and summaries of the variables are not usually given in the Abstract.

  • Ask the RQ:
    • The population is 'students of medicine aged \(20\) to \(22\) years old'.
    • The unit of analysis is each student in the study. Each student is observed twice (before and after using the unstated 'intervention'), so there are two observations for each unit of analysis.
    • Outcome: (Average) sleep quality, as measured by PSQI. We should determine what this is, and what the numbers mean. The Abstract also implies that other responses were of interest.
  • Design the study:
    • The sampling method should be determined. Are those in the sample likely to be different than those not in the sample, in terms of variables in the study?
    • Any exclusion and/or inclusion criteria should be identified.
    • The sample size is not given (unusual for an Abstract).
  • Analyse the data:
    • Though not stated, a quantitative variable is being compared within individuals, so a paired \(t\)-test is the likely method of analysis.
  • Report the results:
    • Results are only given for sleep quality; what about other responses?
    • Means are given before and after, but information on the change is not given.
    • The numbers that follow the \(\pm\) are not explained: are they standard deviations, IQRs, ranges, standard errors?
    • The reduction is said to be 'statistically significant', which means \(P\) is small. However, we need to determine what 'small' means here.

Reading more of the article answers many of these questions (and sometimes raises more).

We learn that the population comprises students of medicine from the University of Niš, Serbia, aged \(20\) to \(22\) (p. 336). In addition, inclusion criteria ('owning and daily usage of mobile phone with Android operating system and AMOLED screen') and exclusion criteria ('sleep disorders, usage of sedative drugs, psychoactive substances, usage of phone apps or glasses that reduce blue light during the night, recent stressful situations') are given (p. 336). Since only users of Android phones were included, results may not apply to users of other types of phones (e.g., iPhones). Strictly, the results apply only to this narrowly-defined group, though they may apply to people more generally, as there is no obvious reason why only people in this group would be affected.

The intervention is given as the use of a 'free Android application Twilight [...] on mobile phones of participants [which] automatically decreases brightness and content of emitted blue color from the screen during the night time'.

Participants for the study were chosen 'on voluntary basis' (p. 336), so the sample was not a random sample. The study may not be externally valid, but the students in the study probably would not be very different than those who did not volunteer for the study.

Thirty students (p. 337) completed the study, but the study actually started with \(37\) students; seven dropped out. The researchers compared the students who remained in the study with those who dropped out of the study (Table 1); they found no evidence that those who dropped out and those who stayed were different on the variables studied (i.e., no evidence that the dropouts introduced a bias). The sample size of \(n = 30\) suggests the \(t\)-test is statistically valid. The required sample size was estimated in advance using software (p. 336).

The response variable is the PSQI total score, which ranges from \(0\) to \(21\); the Abstract implies smaller scores are better. No control group was used (p. 337).

The study is conducted 'in complete dark room without additional light'; that is, a partially artificial environment, so the results may not be ecologically valid.

Since 'each participant was informed about the detailed plan of the study...' due to ethics requirements (p. 336), the participants were not blinded to being in a study, nor to the purpose of the study.

Since the response (PSQI) is obtained from a subjective questionnaire completed by the participants, the placebo effect may be of concern (using objective measures is better when possible).

Participants completed the questionnaire pre-intervention, then 'used the app for one month period [and] at the end they completed PSQI once again'. This suggests that the carry-over effect may be an issue, and no random allocation was used to decide which situation was evaluated first.

The article states that results 'were presented as mean and standard deviation before and after intervention' (p. 337), so presumably the numbers in the Abstract after the \(\pm\) are standard deviations.
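
As a rough check on what the reported values imply, the standard deviations can be converted into standard errors and approximate \(95\)% CIs for each mean. The sketch below is illustrative only: it assumes the values after the \(\pm\) really are standard deviations, that \(n = 30\) applies on both occasions, and that a multiplier of \(2\) gives an approximate \(95\)% interval. The function name `approx_ci_mean` is hypothetical.

```python
import math

def approx_ci_mean(mean, sd, n, multiplier=2.0):
    """Return (standard error, lower limit, upper limit) for a sample mean.

    Uses mean +/- multiplier * SE; multiplier=2 gives an approximate 95% CI.
    """
    se = sd / math.sqrt(n)
    return se, mean - multiplier * se, mean + multiplier * se

n = 30  # students who completed the study (p. 337)

# Assumption: the numbers after the +/- in the Abstract are standard deviations
se_before, lo_before, hi_before = approx_ci_mean(6.83, 2.73, n)
se_after, lo_after, hi_after = approx_ci_mean(3.93, 1.68, n)

print(f"Before: SE = {se_before:.2f}, approx. 95% CI {lo_before:.2f} to {hi_before:.2f}")
print(f"After:  SE = {se_after:.2f}, approx. 95% CI {lo_after:.2f} to {hi_after:.2f}")
```

Under these assumptions, the intervals are roughly \(5.8\) to \(7.8\) before and \(3.3\) to \(4.5\) after the intervention. These intervals describe each mean separately; because the measurements are paired, they are not a CI for the mean change.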

An excellent case-profile plot of the data is shown (Fig. 1). Other variables apart from the total PSQI were studied (such as sleep duration), and usefully summarised in Table 2. However, scores are reported only for before and after the intervention, not for the changes themselves.
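
The mean change can be recovered from the reported means, but its standard deviation cannot: it depends on how strongly each student's before and after scores are correlated, and that correlation is not reported. The sketch below shows how much this matters, using the reported means and standard deviations with \(n = 30\). The correlation values `r` are hypothetical, chosen only for illustration, and the function name is also an assumption.

```python
import math

mean_before, sd_before = 6.83, 2.73
mean_after, sd_after = 3.93, 1.68
n = 30

mean_change = mean_before - mean_after  # reduction in mean PSQI score

def sd_of_changes(sd1, sd2, r):
    """SD of within-person differences, given both SDs and their correlation r."""
    return math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)

# Hypothetical correlations: the article does not report this value
for r in (0.0, 0.5, 0.8):
    sd_change = sd_of_changes(sd_before, sd_after, r)
    se_change = sd_change / math.sqrt(n)
    lower = mean_change - 2 * se_change  # approximate 95% CI limits
    upper = mean_change + 2 * se_change
    print(f"r = {r:.1f}: sd = {sd_change:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

The mean reduction is \(2.90\) in every case, but the width of the interval changes with `r`, which is why reporting the changes themselves would be more informative.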

The method of analysis was a 't test for two dependent samples with level of significance set to \(0.05\)' (p. 337). This means 'statistical significance' refers to a \(P\)-value less than \(0.05\). The \(P\)-value for the paired \(t\)-test is given as \(p < 0.0001\) (p. 337), which is indeed 'statistically significant' using their (arbitrary) threshold.
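
Although the article does not report the standard deviation of the changes, the reported summaries restrict what the \(t\)-score can be, since the correlation between the before and after scores must lie between \(-1\) and \(1\). The following sketch computes those limits, assuming the values after the \(\pm\) are sample standard deviations and that \(n = 30\); the function and variable names are illustrative only.

```python
import math

mean_before, sd_before = 6.83, 2.73
mean_after, sd_after = 3.93, 1.68
n = 30

mean_change = mean_before - mean_after

def paired_t_score(mean_diff, sd_diff, n):
    """t-score for a paired t-test of 'the mean difference is zero'."""
    return mean_diff / (sd_diff / math.sqrt(n))

# The SD of the changes is largest when r = -1 and smallest when r = +1
sd_change_max = sd_before + sd_after        # r = -1
sd_change_min = abs(sd_before - sd_after)   # r = +1

t_smallest = paired_t_score(mean_change, sd_change_max, n)
t_largest = paired_t_score(mean_change, sd_change_min, n)

print(f"t-score lies between {t_smallest:.2f} and {t_largest:.2f} (df = {n - 1})")
```

Even the smallest possible \(t\)-score is about \(3.6\), well above \(2\), so a \(P\)-value below \(0.05\) is consistent with the reported summaries whatever the unreported correlation was.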

Ethical approval was granted (p. 336). The funding was from the Ministry of Education, Science and Technological Development, Republic of Serbia (p. 341), suggesting no conflicts of interest. The data are not available, so the research is not completely reproducible.

The article lists limitations of the research explicitly (p. 341):

... lack of generalization to other population groups, the lack of control group, the very nature of questionnaire as subjective instrument, duration of intervention, difference in devices used as well as usage time, confounding by other light sources at night.

These include the acknowledgement of potential confounding variables. Strengths are also listed (p. 341):

... being specific to investigate the impact of mobile phones to sleep quality, natural setting of the intervention, pre-calculated sample size with appropriate achieved power and significant effect sizes reported (medium to large).

41.3 Chapter summary

The six steps of research can be used as a guide for critiquing research articles. Start by reading the Abstract (or Summary) for an overview, then glean more information by carefully reading the remainder of the article.

41.4 Quick review questions

  1. True or false: The best way to read an article is thoroughly, from start to finish.
  2. True or false: The six steps of research are a useful guide for critiquing an article.
  3. True or false: Critiquing an article means only finding the problems.

41.5 Exercises

Selected answers are available in App. E.

Exercise 41.1 A research article (Duncan et al. 2018) examined the accuracy of step counts recorded on iPhones. The paper records this information about the selection of participants:

Participants were recruited through word of mouth and posters displayed around the [researcher's] university. Participants were eligible if they were ambulatory, \(\ge 18\) years of age, and owned an iPhone 6 [...] or newer model.

Although \(33\) participants were selected, the authors note some parts of the study used a smaller sample size because one subject lost their phone, while others chose to withdraw from the study. The paper notes that previous studies have been able to:

[...] demonstrate the accuracy of the iPhone pedometer function in laboratory test conditions. However, no studies have attempted to evaluate evidence [...] in the field.

  1. What is the issue that the authors raise with previous studies?
  2. Why did the authors discuss the changes in sample size for some parts of the study?
  3. How would you describe the sampling method?
  4. What would you call the information given about the subjects needing to be ambulatory and \(18\) years of age or over?
  5. Among many other things, the researchers compared the mean difference between the number of step counts recorded by manually counting steps and the iPhone-recorded number of steps. What type of test would be appropriate?
  6. While walking at \(2.5\) km.h\(^{-1}\), the above test resulted in \(P = 0.006\). What does this mean?
  7. The sample size for the part of the study mentioned above was \(n = 32\). Do you think the test is statistically valid?

Exercise 41.2 One study of hearing loss among Iranian students (Mohammadpoorasl et al. 2018) used a non-directional study to explore the relationship between hearing loss and headphone use. The article states that:

... \(890\) students were randomly selected from five schools at QUMS (Medicine, Dentistry, Nursing and Midwifery, Public Health, and Paramedical Sciences schools) using a proportional cluster sampling method...

The participants completed a hearing test and a Hearing Loss Questionnaire (scores are between \(17\) and \(34\), with higher scores indicating more severe hearing loss).

  1. What is the population?
  2. Critique the sampling method: What is the implication for interpreting the results of the study?
  3. Some of the results are presented in Table 41.1. What statistical test do you think was used to compare the scores for males and females?
  4. What are the hypotheses being tested about 'Frequency of use'?
  5. Form an approximate \(95\)% CI for the mean hearing loss score for students who use earphones.
  6. What information is needed to be able to form an approximate \(95\)% CI for the difference between the hearing loss scores for females and males?

TABLE 41.1: The Hearing Loss Questionnaire scores for various demographic variables

| Variable | Level | Sample size | Mean | Std. dev. | P-value |
|---|---|---|---|---|---|
| Sex | Female | \(543\) | \(19.37\) | \(2.91\) | \(0.009\) |
| | Male | \(302\) | \(19.99\) | \(3.51\) | |
| Frequency of use | \(0\), \(1\) times/day | \(194\) | \(19.20\) | \(2.87\) | \(0.001\) |
| | \(2\) to \(3\) times/day | \(319\) | \(19.60\) | \(2.66\) | |
| | More than \(3\) times/day | \(278\) | \(20.20\) | \(3.54\) | |
| Uses earphones | Yes | \(745\) | \(19.80\) | \(3.08\) | \(< 0.001\) |
| | No | \(100\) | \(19.00\) | \(1.71\) | |

Exercise 41.3 The Abstract from a large study is given below:

OBJECTIVE: This study aims to elucidate any existing link between energy-containing liquids, consumed in various forms within the diet, and the effect they may have on body weight or other diseases [...]

METHODS: A self-administered online survey was conducted in \(2496\) participants from different countries, in six languages (Spanish, English, Chinese, French, German and Portuguese). Questions referred to their soft drink and water consumption habits, physical exercise performed, presence or absence of certain diseases and medication.

RESULTS: There is statistically significant difference (\(p < 0.001\)) in BMI and consumption of cola per week: those who consumed \(0\)–\(3\) cans a week have a lower BMI than those who consume \(>7\) cans of cola a week [...] There is greater presence of obesity (\(p < 0.001\)), gastritis (\(p < 0.001\)), constipation (\(p < 0.001\)) and mental illness (\(p = 0.003\)) among people who drink cola soft drinks.

CONCLUSION: Removal of energy-containing beverages from our diet may be an appropriate public health message to support those interested in preventing weight gain as well as other diseases.

--- Martín et al. (2018), p. 1

Evaluate the study using the six steps of research discussed in this book.