21.8 Types of Validity in Research
Validity in research includes:
Measurement Validity (e.g., construct, content, criterion, face validity)
Internal Validity (the soundness of causal claims)
External Validity (generalizability to other populations, settings, and times)
Ecological Validity (applicability to real-world conditions)
Statistical Conclusion Validity (the appropriateness of statistical inferences)
By examining these, you can ensure that your study’s measurements are accurate, your causal claims are well supported, and your conclusions generalize to broader contexts.
21.8.1 Measurement Validity
Measurement validity pertains to whether the instrument or method you use truly measures what it’s intended to measure. Within this umbrella, there are several sub-types:
21.8.1.1 Face Validity
Definition: The extent to which a measurement or test appears to measure what it is supposed to measure, at face value (i.e., does it “look” right to experts or users?).
Importance: While often considered a less rigorous form of validity, it’s useful for ensuring the test or instrument is intuitively acceptable to stakeholders, participants, or experts in the field.
Example: A questionnaire measuring “anxiety” that has questions about nervousness, worries, and stress has good face validity because it obviously seems to address anxiety.
21.8.1.2 Content Validity
Definition: The extent to which a test or measurement covers all relevant facets of the construct it aims to measure.
Importance: Especially critical in fields like education or psychological testing, where you want to ensure the entire domain of a subject/construct is properly sampled.
Example: A math test that includes questions on algebra, geometry, and calculus might have high content validity for a comprehensive math skill assessment. If it only tested algebra, the content validity would be low.
21.8.2 Construct Validity
Definition: The degree to which a test or measurement tool accurately represents the theoretical construct it intends to measure (e.g., intelligence, motivation, self-esteem).
Types of Evidence:
Convergent Validity: Demonstrated when measures that are supposed to be related (theoretically) are observed to correlate.
Discriminant (Divergent) Validity: Demonstrated when measures that are supposed to be unrelated theoretically do not correlate.
Example: A new questionnaire on “job satisfaction” should correlate with other established job satisfaction questionnaires (convergent validity) but should not correlate strongly with unrelated constructs like “physical health” (discriminant validity).
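The job-satisfaction example can be sketched numerically. The data below are entirely simulated (a latent "satisfaction" trait plus noise), and the scale names are hypothetical; the point is only that a convergent correlation should be high and a discriminant correlation near zero.

```python
import numpy as np

# Simulated convergent/discriminant validity check.
# "new_scale" and "established_scale" both measure the same latent trait
# with added noise; "physical_health" is an unrelated construct.
rng = np.random.default_rng(0)
n = 200

satisfaction = rng.normal(size=n)                      # latent trait
new_scale = satisfaction + rng.normal(scale=0.5, size=n)
established_scale = satisfaction + rng.normal(scale=0.5, size=n)
physical_health = rng.normal(size=n)                   # unrelated construct

convergent_r = np.corrcoef(new_scale, established_scale)[0, 1]
discriminant_r = np.corrcoef(new_scale, physical_health)[0, 1]

print(f"convergent r = {convergent_r:.2f}")      # should be high (near 0.8 here)
print(f"discriminant r = {discriminant_r:.2f}")  # should be near zero
```

In practice the thresholds for "high" and "near zero" depend on the field; the contrast between the two correlations is what carries the evidence.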
21.8.3 Criterion Validity
Definition: The extent to which the measurement predicts or correlates with an outcome criterion. In other words, do scores on the measure relate to an external standard or “criterion”?
Types:
Predictive Validity: The measure predicts a future outcome (e.g., an entrance exam predicting college success).
Concurrent Validity: The measure correlates with an existing, accepted measure taken at the same time (e.g., a new depression scale compared with a gold-standard clinical interview).
Example: A new test of driving skills has high criterion validity if people who score highly perform better on actual road tests (predictive validity).
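The driving-test example of predictive validity can be illustrated the same way. Everything below is simulated and the effect size is an assumption chosen for illustration: a written test score and a later pass/fail road-test outcome both reflect an underlying skill, so high scorers on the written test should pass the road test more often.

```python
import numpy as np

# Simulated predictive (criterion) validity: does the written test
# predict a later binary criterion (passing the road test)?
rng = np.random.default_rng(1)
n = 400

skill = rng.normal(size=n)                                # true driving skill
test_score = skill + rng.normal(scale=0.7, size=n)        # new written test
passed_road_test = (skill + rng.normal(scale=0.7, size=n)) > 0  # later outcome

high = test_score > np.median(test_score)                 # split at the median
pass_rate_high = passed_road_test[high].mean()
pass_rate_low = passed_road_test[~high].mean()

print(f"pass rate, high scorers: {pass_rate_high:.0%}")
print(f"pass rate, low scorers:  {pass_rate_low:.0%}")
```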
21.8.4 Internal Validity
Internal validity refers to the extent to which a study can establish a cause-and-effect relationship. High internal validity means you can be confident that the observed effects are due to the treatment or intervention itself and not due to confounding factors or alternative explanations. This is the type of validity that economists and applied scientists typically care about most.
21.8.4.1 Major Threats to Internal Validity
Selection Bias: Systematic differences between groups that exist before the treatment is applied.
History Effects: External events occurring during the study can affect outcomes (e.g., economic downturn during a job-training study).
Maturation: Participants might change over time simply due to aging, learning, fatigue, etc., independent of the treatment.
Testing Effects: Taking a test more than once can influence participants’ responses (practice effect).
Instrumentation: Changes in the measurement instrument or the observers can lead to inconsistencies in data collection.
Regression to the Mean: Extreme pre-test scores tend to move closer to the average on subsequent tests.
Attrition (Mortality): Participants dropping out of the study in ways that are systematically related to the treatment or outcomes.
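One of these threats, regression to the mean, is easy to demonstrate by simulation. The numbers below are arbitrary: a stable ability is measured twice with noise, and the participants with the most extreme pretest scores score closer to the overall average on the posttest even though nothing about them changed.

```python
import numpy as np

# Simulated regression to the mean: two noisy measurements of a stable trait.
rng = np.random.default_rng(2)
n = 10_000

ability = rng.normal(loc=100, scale=10, size=n)     # stable underlying trait
pretest = ability + rng.normal(scale=10, size=n)    # noisy measurement 1
posttest = ability + rng.normal(scale=10, size=n)   # noisy measurement 2

extreme = pretest > np.percentile(pretest, 95)      # select extreme pretests

print(f"mean pretest of extreme group:  {pretest[extreme].mean():.1f}")
print(f"mean posttest of extreme group: {posttest[extreme].mean():.1f}")
# The posttest mean falls back toward 100 with no intervention at all.
```

A naive pre/post comparison on this extreme group would "find" an effect where none exists, which is why control groups matter.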
21.8.4.2 Strategies to Improve Internal Validity
Random Assignment: Ensures that, on average, groups are equivalent on both known and unknown variables.
Control Groups: Provide a baseline for comparison to isolate the effect of the intervention.
Blinding (Single-, Double-, or Triple-blind): Reduces biases from participants, researchers, or analysts.
Standardized Procedures and Protocols: Minimizes variability in how interventions or measurements are administered.
Matching or Stratification: When randomization is not possible, matching participants on key characteristics can reduce selection bias.
Pretest-Posttest Designs: Compare participant performance before and after the intervention (though watch for testing effects).
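The first strategy, random assignment, can be sketched in a few lines. The participant data are simulated and "age" stands in for any pre-treatment covariate; the point is that after random assignment the treatment and control groups are balanced on it, in expectation, without the researcher ever looking at it.

```python
import numpy as np

# Simulated random assignment with a simple covariate balance check.
rng = np.random.default_rng(3)
n = 1_000

age = rng.integers(18, 65, size=n)          # a pre-treatment covariate

# Randomly assign exactly half the sample to treatment.
treated = rng.permutation(n) < n // 2

print(f"mean age, treatment group: {age[treated].mean():.1f}")
print(f"mean age, control group:   {age[~treated].mean():.1f}")
# The two means are close because assignment ignores the covariate entirely.
```

Balance checks like this are commonly reported in applied work as evidence that randomization "worked" for observable characteristics.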
21.8.5 External Validity
External validity addresses the generalizability of the findings beyond the specific context of the study. A study with high external validity can be applied to other populations, settings, or times. Conversely, findings tied to a highly local context (a single site, period, or population) tend to have limited external validity.
21.8.5.1 Threats to External Validity
Unrepresentative Samples: If the sample does not reflect the wider population (in demographics, culture, etc.), generalization is limited.
Artificial Research Environments: Highly controlled lab settings may not capture real-world complexities.
Treatment-Setting Interaction: The effect of the treatment might depend on the unique conditions of the setting (e.g., a particular school, hospital, or region).
Treatment-Selection Interaction: Certain characteristics of the selected participants might interact with the treatment (e.g., results from a specialized population do not apply to the general public).
21.8.5.2 Strategies to Improve External Validity
Use of Diverse and Representative Samples: Recruit participants that mirror the larger population.
Field Studies and Naturalistic Settings: Conduct experiments in real-world environments rather than artificial labs.
Replication in Multiple Contexts: Replicate the study across different settings, geographic locations, and populations.
Longitudinal Studies: Evaluate whether relationships hold over extended periods.
21.8.6 Ecological Validity
Ecological validity is often discussed as a subcategory of external validity. It specifically focuses on the realism of the study environment and tasks:
Definition: The degree to which study findings can be generalized to the real-life settings where people actually live, work, and interact.
Key Idea: Even if a lab experiment shows a particular behavior, do people behave the same way in their daily lives with everyday distractions, social pressures, and contextual factors?
21.8.6.1 Enhancing Ecological Validity
Naturalistic Observation: Conduct observations or experiments in participants’ usual environments.
Realistic Tasks: Use tasks that closely mimic real-world challenges or behaviors.
Minimal Interference: Researchers strive to reduce the artificiality of the setting, allowing participants to behave as naturally as possible.
21.8.7 Statistical Conclusion Validity
Though often discussed alongside internal validity, statistical conclusion validity focuses on whether the statistical tests used in a study are appropriate, powerful enough, and applied correctly.
21.8.7.1 Threats to Statistical Conclusion Validity
Low Statistical Power: If the sample size is too small, the study may fail to detect a real effect (Type II error).
Violations of Statistical Assumptions: Incorrect application of statistical tests can lead to spurious conclusions (e.g., using parametric tests with non-normal data without appropriate adjustments).
Fishing and Error Rate Problem: Running many statistical tests without correction increases the chance of a Type I error (finding a false positive).
Reliability of Measures: If the measurement instruments are unreliable, the added noise typically attenuates observed correlations and group differences (biasing them toward zero) and inflates standard errors.
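The fishing problem is easy to quantify by simulation. The setup below is entirely artificial: twenty independent t-tests are run on pure noise, and we count how often at least one comes out "significant" at the nominal 5% level.

```python
import numpy as np
from scipy import stats

# Simulated "fishing": many uncorrected tests on data with NO real effects.
rng = np.random.default_rng(4)
alpha, n_tests, n_experiments = 0.05, 20, 1_000

false_positive_runs = 0
for _ in range(n_experiments):
    # Each test compares two noise-only groups, so every "hit" is spurious.
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < alpha:
        false_positive_runs += 1

rate = false_positive_runs / n_experiments
print(f"runs with at least one false positive: {rate:.0%}")
# Theory predicts roughly 1 - 0.95**20, about 64%, versus the nominal 5%.
```

Testing each p-value against alpha / n_tests (Bonferroni) instead brings the family-wise error rate back to roughly 5%.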
21.8.7.2 Improving Statistical Conclusion Validity
Adequate Sample Size: Conduct a power analysis to determine the necessary size to detect meaningful effects.
Appropriate Statistical Techniques: Ensure your chosen analysis matches the nature of the data and research question.
Multiple Testing Corrections: Use methods like Bonferroni or false discovery rate corrections when conducting multiple comparisons.
High-Quality Measurements: Use reliable and valid measures to reduce measurement error.
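The first strategy, a power analysis, can be done analytically or by simulation. The sketch below uses simulation with assumed parameters (Cohen's d = 0.5, 64 participants per group, alpha = .05); in a real study the effect size would come from prior literature or a pilot.

```python
import numpy as np
from scipy import stats

# Simulation-based power analysis for a two-sample t-test.
rng = np.random.default_rng(5)
d, n_per_group, alpha, n_sims = 0.5, 64, 0.05, 2_000

hits = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, size=n_per_group)
    treatment = rng.normal(d, 1.0, size=n_per_group)   # true effect of size d
    if stats.ttest_ind(treatment, control).pvalue < alpha:
        hits += 1

power = hits / n_sims
print(f"estimated power: {power:.0%}")
# For d = 0.5 and 64 per group, power is around the conventional 80% target.
```

Re-running the simulation over a grid of sample sizes shows how large the study must be to reach a target power.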
21.8.8 Putting It All Together
Face Validity: Does it look like it measures what it should?
Content Validity: Does it cover all facets of the construct?
Construct Validity: Does it truly reflect the theoretical concept?
Criterion Validity: Does it correlate with or predict other relevant outcomes?
Internal Validity: Is the relationship between treatment and outcome truly causal?
External Validity: Can findings be generalized to other populations, settings, and times?
Ecological Validity: Are the findings applicable to real-world scenarios?
Statistical Conclusion Validity: Are the statistical inferences correct and robust?
Researchers typically need to strike a balance among these different validities:
A highly controlled lab study might excel in internal validity but might have limited external and ecological validity.
A broad, naturalistic field study might have stronger external or ecological validity but weaker internal validity due to less control over confounding variables.
No single study can maximize all validity types simultaneously, so replication, triangulation (using multiple methods), and transparent reporting are crucial strategies to bolster overall credibility.