0.1 Example mini-chapter: Types of Variables and Data

Learning goals: By the end of this example mini-chapter you should be able to:

  • Distinguish between explanatory and response variables.
  • Distinguish between data types.
    • Numeric vs. categorical variables.
    • Differentiate between continuous and discrete numeric variables.
    • Differentiate between nominal and ordinal categorical variables.

As we build and evaluate statistical models, a key consideration is the type of data and the process that generates these data. Variables are things that differ among the individuals (or sampling units) in our study. For example, height, eye color, the type of fertilizer applied to a site, and the number of insect species per hectare are all variables.

0.1.1 Explanatory and Response variables

We often want to distinguish explanatory variables, which we think underlie or are associated with the biological process of interest, from response variables, the outcomes we aim to understand. This distinction helps us build and reason about our statistical model and relate the results to our biological motivation.

Whether a variable is explanatory or a response often depends on the motivation and/or study design. For example, if we were interested in whether fertilizer type had an impact (perhaps an indirect one) on insect diversity, the type of fertilizer would be the explanatory variable and the number of insect species per hectare would be the response variable.
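To make the fertilizer example concrete, here is a minimal Python sketch. The site records, the fertilizer labels "A" and "B", the species counts, and the helper name mean_response_by_group are all invented for illustration; they are not data or code from this chapter. The sketch simply summarizes the response variable separately for each level of the explanatory variable.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only: one dict per site (the sampling unit).
sites = [
    {"fertilizer": "A", "insect_species_per_hectare": 12},
    {"fertilizer": "A", "insect_species_per_hectare": 15},
    {"fertilizer": "B", "insect_species_per_hectare": 7},
    {"fertilizer": "B", "insect_species_per_hectare": 9},
]

# Which variable plays which role is our choice, driven by the question we ask.
explanatory = "fertilizer"
response = "insect_species_per_hectare"


def mean_response_by_group(records, explanatory, response):
    """Average the response variable within each level of the explanatory variable."""
    groups = defaultdict(list)
    for record in records:
        groups[record[explanatory]].append(record[response])
    return {level: mean(values) for level, values in groups.items()}


summary = mean_response_by_group(sites, explanatory, response)
print(summary)
```

Swapping the two roles would answer a different question, which is why the distinction follows from the motivation of the study rather than from the data alone.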

0.1.2 Types of Data

Data come in different flavors. It is important to understand these, because they should direct how we build models, summarize and visualize data, and interpret results.

0.1.2.1 Flavors of numeric variables

Numeric variables are quantitative and have magnitude, and come in a few sub-flavors. As we will see soon, these guide our modeling approaches:

  • Discrete variables come in chunks. For example, the number of individuals is an integer: we don’t have 1/2 people.
  • Continuous variables can take any value within some reasonable range. For example, height, weight, temperature, etc. are classic continuous variables. Some variables are trickier – for example, age is continuous, but we often analyze it as if it’s discrete. In practice, these tricky cases rarely present a serious problem for our analyses (except in the rare cases in which they do).

Not all numbers are numeric variables. For example, a gene ID may be written as a number, but it is an arbitrary label and is not quantitative.
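The distinctions above can be sketched in Python. All values here, including the gene IDs "101" to "103", are hypothetical, and numeric_flavor is an illustrative helper rather than a standard function. It guesses a flavor from how the values are stored, which is only a rough heuristic: the real distinction depends on what the variable means.

```python
from statistics import mean

# All values below are invented for illustration.
individuals_per_plot = [3, 5, 2]      # discrete: whole-number counts
heights_cm = [162.4, 175.0, 168.9]    # continuous: any value within a range
gene_ids = ["101", "102", "103"]      # hypothetical IDs: labels, so stored as text


def numeric_flavor(values):
    """Guess a variable's flavor from Python types alone (a rough heuristic)."""
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "discrete"
    if all(is_int(v) or isinstance(v, float) for v in values):
        return "continuous"
    return "not numeric"


flavors = {
    "individuals_per_plot": numeric_flavor(individuals_per_plot),
    "heights_cm": numeric_flavor(heights_cm),
    "gene_ids": numeric_flavor(gene_ids),
}
print(flavors)

# A mean is meaningful for a quantitative variable such as height...
mean_height = mean(heights_cm)
# ...but averaging gene IDs would be meaningless. Storing them as strings
# prevents that mistake, because mean(gene_ids) raises an error.
```

Had the gene IDs been stored as integers, this heuristic would wrongly call them discrete, which is one reason to store arbitrary labels as text.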

0.1.2.2 Flavors of categorical variables

Categorical variables are qualitative, and include:

  • Nominal variables, which have names but cannot be ordered, like sample ID, species, or hair color.

  • Binary variables are special types of nominal variables, which have only two options (or for which we only consider two options). Alive/dead, pass/fail, and on/off are classic binary variables.

  • Ordinal variables can be ordered, but do not correspond to a magnitude. For example, gold, silver and bronze medals in the Olympics are ranked from best to worst, but first isn’t some reliable distance away from second or third.
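As a rough sketch of how the three categorical flavors behave differently in practice, the Python below counts a nominal variable, tallies a binary one, and sorts an ordinal one by an explicit level order. The species names, survival values, and the names MEDAL_LEVELS and outranks are invented for illustration.

```python
from collections import Counter

# Invented example values.
species = ["oak", "maple", "oak", "pine"]       # nominal: names with no order
survived = [True, False, True, True]            # binary: exactly two options
medals = ["gold", "bronze", "silver", "gold"]   # ordinal: ordered categories

# For an ordinal variable we must state the order ourselves (here worst to best);
# alphabetical order would put "gold" between "bronze" and "silver".
MEDAL_LEVELS = ["bronze", "silver", "gold"]

# Nominal data: counting how often each category occurs is a sensible summary.
species_counts = Counter(species)

# Binary data: the count (or proportion) of one of the two options.
n_survived = sum(survived)

# Ordinal data: sorting and comparing ranks is meaningful...
medals_sorted = sorted(medals, key=MEDAL_LEVELS.index)


def outranks(a, b):
    """Return True if medal a is ranked above medal b."""
    return MEDAL_LEVELS.index(a) > MEDAL_LEVELS.index(b)


# ...but the positions 0, 1, 2 are only ranks. Subtracting them would wrongly
# suggest that gold is exactly as far from silver as silver is from bronze.
print(species_counts, n_survived, medals_sorted, outranks("gold", "silver"))
```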

0.1.3 Quiz

After completing this quiz (and ensuring you get everything right), fill out the quiz on Canvas as today’s class quiz.

0.1.4 Definitions

Explanatory variables are variables we think underlie or are associated with the biological process of interest.
Response variables are the outcome we aim to understand.

Categorical variables are qualitative – they cannot be assigned a meaningful value on the number line.
Numeric variables are quantitative – they can be assigned a meaningful value on the number line.