4 Defining and Measuring Concepts

Back in chapter 2 we discussed one of the largest differences between the natural and social sciences. In the natural sciences, concepts are generally universally understood. Ask a scientist in Japan and a scientist in the United States what an atom is, and they’ll describe the same thing. The same goes for acceleration, or a photon, or light, or gravity. All of the important things they need have set definitions. The social sciences have a lot more difficulty defining exactly what we’re studying, and even if we develop a definition, measuring it is more complicated. If you ask an American and a Japanese scientist what happiness is, they’ll likely have different definitions. Heck, ask two Americans in the same family and you’ll probably get two different ideas.

What is happiness? The Merriam-Webster dictionary says “a state of well-being and contentment.” Wikipedia describes it as “used in the context of mental or emotional states, including positive or pleasant emotions ranging from contentment to intense joy.” Those are both circling the same ideas, but they don’t have the same precision as the definition a scientist could use for acceleration. Still, we all generally know what we mean by happiness, even if we can’t develop a definition that is universally shared. So let’s say we identify a definition that most people will understand. How should we measure happiness then?

Let’s say I want to do research on whether more equal societies (or countries or states or neighborhoods) are happier. How would I measure the happiness of California vs. Alabama, or of the United States as opposed to Peru? I could do a survey of people and ask them to rate their own happiness. I could look at Twitter posts from different localities and rate how happy or unhappy the tweets are. I could look at depression rates or how often certain drugs are prescribed in different locations. None of those are directly measuring happiness though. They’re all just approximations that might help me to understand happiness.

Defining your terms and moving from concepts to measures is one of the most dangerous points in the research process, but also, in my opinion, one of the most fun.

4.1 Concepts and Operationalization

We begin with concepts. We want to study resiliency, or health, or gentrification, or happiness. These are all concepts: abstract ideas or general notions that occur in the mind, in speech, or in thought. The name used to identify a concept is a “term.” For instance, “toughness” is a term that names a concept. Different people may have different interpretations of the word “toughness,” but generally speaking people know what the word means when they hear it used in any context.

What we must then do is operationalize the concept. To operationalize is to identify a way we can measure the concept under study. For instance, if we want to measure a community’s health, we might measure it based on the average life span – assuming that individuals who live longer on average are healthier. There are other operationalizations we could use for community health, such as the percentage of people who are overweight, or the infant mortality rate, or others I’m sure.

But again, these are just operationalizations of the concept; they aren’t themselves the concepts we want to study. Acceleration isn’t a concept in this sense. It’s mathematically defined, and so there are no questions about what it is in the sciences. We don’t have any agreed-upon definition of happiness, nor a formula to understand it. Thus, every researcher studying happiness has to define the term and justify the way they’ll operationalize it in their study.

It’s important not to get confused and assume the way we measure a concept is itself the concept. Reification is the assumption that they are the same, and it can often lead people astray. For example, have you ever seen someone in an argument on an internet message board point to their education as proof they are smarter? “I have three advanced degrees, I must be smarter than you.” Education may be a reasonable measure of intelligence, but they aren’t the same thing. Smarter people are probably more educated on average, but that is often not true of any given person. Getting an education takes the means and opportunity to finish school. In addition, an education generally celebrates a certain type of intelligence, but book smarts aren’t the only way one can understand intelligence.

Let’s step back. Sometimes we aren’t talking about a concept. If I’m a food scientist and I’m curious about the impact of coffee on health, coffee isn’t a concept. It’s not measuring an abstract idea, it’s just coffee. If I’m interested in who votes, voting isn’t a concept. A person goes to the ballot box and votes. People won’t disagree about whether that act happened or not.

Why do we care about voting? It’s unlikely an individual’s vote will have an impact on the election. I’m not trying to devalue voting (please go vote whenever you get the chance!), but why do we care about predictors of who votes, or falling rates of voting in the United States? Why would we care about this headline from the Pew Research Center?

I would say it’s because of the different concepts that voting is being used to operationalize, even if just implicitly. Voting might itself be a clearly defined action, but voting can also be a measurement of different concepts. Voting is an expression of political power in society, and political power is difficult to define or directly measure. Public participation is often operationalized by using the share of individuals who vote in a community, because voting is an expression that people care about being engaged in making changes.

4.2 Evaluating Operationalizations

How do we know we’ve done a good job of operationalizing our concept? It isn’t enough to just say a measure approximates something. I could survey people on how many oranges they eat a week and call that health. But even if I say that is health, most people will understand that I’ve done a bad job of measuring my concept.

We have two things to keep in mind in evaluating our measures: validity and reliability. Validity tells you whether what you’re measuring is what you’re supposed to be measuring.

At its core, validity is about shared understanding. Unfortunately, there’s no great test for a measurement’s validity. The easiest way to make sure your measure is valid is to ask someone else. If you tell someone you’re measuring mental health by how many books they have on their shelf, they’ll probably look at you funny. That’s not a valid measure, and it won’t fit within their shared understanding of what you mean by mental health.

One objection you may get in your research is that your measure is invalid based on how you define it. If I measure intelligence by years of school completed, some reviewers might find that valid, but others might object. Years of schooling aren’t intelligence, they may say; they’re a better measure of education. As such, it isn’t the measure itself that is the problem – it’s what I’m claiming it operationalizes.

Reliability is another concern in thinking about the quality of our measures. Reliability is all about consistency – will you get the same response if you measure the same thing multiple times?

For instance, let’s say I survey people about how concerned they are with their health. I’m unlikely to get consistent answers from people, because the answers depend on the time of day or year. People are often more concerned about their health around the holidays, after they’ve seen family, been stuck inside because of the weather, and eaten holiday meals. If I ask them in the middle of March though, I might get a different answer. With such unreliable responses, I would want to make sure I’m measuring the phenomenon multiple ways and multiple times to get a better idea of a person’s opinion.

If I’m concerned with measuring someone’s height, I have fewer concerns. If I ask you how tall you are today and two days later and a year from now, I’ll probably get the exact same response (maybe not over decades, as people do shrink over time).

Weight is an interesting case. Yes, people’s weights change over time. I wouldn’t expect you to weigh exactly the same today and tomorrow, but people will generally give reliable responses, even with a little fib. If I ask people how much they weigh, they’ll probably tell me something roughly correct, just probably 5 pounds lighter. As long as everyone takes the same 5 pounds off their weight before telling, it’s still a reliable question.

I used this example in class once, and you can imagine it for yourself. I had students write down how many pens they thought they had with them in their bags. Everyone did, and then counted the number of pens. Very few people were right, but more concerning was the nature of how they were wrong. Their answers were all over the place: some people over guessed and some people under guessed. If I were actually concerned with how many pens people had, just asking them wouldn’t provide me any real indication of how many pens they had. It’s an unreliable question.
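To make the contrast between the weight question and the pen question concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration; they are not data from my class. The helper `error_summary` is a hypothetical name, and it simply compares what people report to the true value. A consistent fib shows up as a nonzero average error with no spread, while scattered guesses show up as a large spread.

```python
from statistics import mean, pstdev


def error_summary(true_values, reported_values):
    """Return (average error, spread of errors) for a set of self-reports."""
    errors = [reported - true for true, reported in zip(true_values, reported_values)]
    return mean(errors), pstdev(errors)


# Hypothetical weights in pounds: everyone shaves off the same 5 pounds.
true_weights = [150, 180, 132, 205, 165]
reported_weights = [w - 5 for w in true_weights]

# Hypothetical pen counts: guesses miss in both directions.
true_pens = [3, 5, 2, 8, 4]
guessed_pens = [6, 1, 2, 3, 9]

weight_bias, weight_spread = error_summary(true_weights, reported_weights)
pen_bias, pen_spread = error_summary(true_pens, guessed_pens)

print("weight: average error", weight_bias, "spread", weight_spread)
print("pens:   average error", pen_bias, "spread", round(pen_spread, 2))
```

In this sketch the weight answers are all off by the same amount, so the spread of the errors is zero even though nobody told the truth. The pen guesses average out close to the right answer, but any single guess could be far off in either direction, which is what makes the question unreliable.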

Those things come up often in recall. How many calories did you eat today? Unless you tracked it, you’ll take a guess and you might be high or low. Humans aren’t great at processing or retaining that information, which makes those questions unreliable.

The image of a target is often used when discussing validity and reliability. A reliable archer puts their arrows all in the same spot, and a valid archer puts them near the target. If your measure is reliable but not valid, you’ll get really consistent measurements (how tall is a person), but it won’t actually be measuring the concept you’re concerned with (near the target). A valid and reliable measure is the goal.
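The target image can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: the bullseye sits at the point (0, 0), the arrow positions are invented, and `describe_shots` is a hypothetical helper. The distance from the center of the arrows to the bullseye stands in for validity, and how tightly the arrows cluster around their own center stands in for reliability.

```python
from math import hypot
from statistics import mean


def describe_shots(shots):
    """Return (offset from bullseye, scatter around the shots' own center)."""
    center_x = mean([x for x, _ in shots])
    center_y = mean([y for _, y in shots])
    # Validity stand-in: how far the middle of the group is from (0, 0).
    offset = hypot(center_x, center_y)
    # Reliability stand-in: average distance of each shot from the group's middle.
    scatter = mean([hypot(x - center_x, y - center_y) for x, y in shots])
    return offset, scatter


# Invented arrow positions, with the bullseye assumed to be at (0, 0).
reliable_not_valid = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (5.0, 5.0)]
valid_and_reliable = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]

for name, shots in [("reliable, not valid", reliable_not_valid),
                    ("valid and reliable", valid_and_reliable)]:
    offset, scatter = describe_shots(shots)
    print(name, "| offset:", round(offset, 2), "| scatter:", round(scatter, 2))
```

Both groups of arrows are tightly clustered, so both are reliable in this sketch. Only the second group is centered on the bullseye, which is the difference between being consistent and being consistent about the right thing.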

4.3 Summary

In this chapter we’ve described concepts and measures, and particularly the importance of establishing how you’re going to operationalize the terms in your study. You can have the best research question, the coolest data, and a really nice analysis, but if your data isn’t actually measuring what it says it is, your study is going to land like a dud.