To build statistical models, we must know what type of data we have. Variables are the characteristics that differ among the individuals (or sampling units) in our study. For example, height, eye color, the type of fertilizer applied to a site, and the number of insect species per hectare are all variables.

Explanatory and Response variables

We often distinguish explanatory variables, which we think underlie or are associated with the biological process of interest, from response variables, which are the outcomes we aim to understand. This distinction helps us build and evaluate our statistical model and relate the results back to our biological motivation.

Whether a variable is explanatory or response often depends on the motivation and/or study design. For example, if we were interested in whether fertilizer type had a (possibly indirect) effect on insect diversity, the type of fertilizer would be the explanatory variable and the number of insect species per hectare would be the response variable.
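To make the two roles concrete, here is a minimal Python sketch. The site records and species counts are hypothetical toy values invented purely for illustration, and `summarize_response` is a hypothetical helper: it simply summarizes the response variable within each level of the explanatory variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy observations, invented for illustration only (one record per site).
# "fertilizer" is the explanatory variable; "species_per_ha" is the response variable.
sites = [
    {"fertilizer": "organic", "species_per_ha": 14},
    {"fertilizer": "organic", "species_per_ha": 11},
    {"fertilizer": "synthetic", "species_per_ha": 8},
    {"fertilizer": "synthetic", "species_per_ha": 9},
]

def summarize_response(records, explanatory, response):
    """Return the mean of `response` within each level of `explanatory`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[explanatory]].append(rec[response])
    return {level: mean(values) for level, values in groups.items()}

summary = summarize_response(sites, "fertilizer", "species_per_ha")
print(summary)  # {'organic': 12.5, 'synthetic': 8.5}
```

Swapping the two arguments would still run, but it would answer a different (and here biologically odd) question, which is why deciding the roles up front matters.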

Types of Data

Data come in different flavors. Understanding them matters because the type of data should guide how we build models, summarize and visualize data, and interpret results.

Numeric variables.

Numeric variables are quantitative, meaning they have a magnitude. They come in a few sub-flavors which, as we will soon see, guide our modeling approaches:

Discrete variables come in chunks. For example, the number of individuals is an integer; we don't have 1/2 of a person.

Continuous variables can take any value within some reasonable range. For example, height, weight, and temperature are classic continuous variables. Some variables are trickier: age, for example, is continuous, but we often analyze it as if it were discrete (e.g., age in whole years). In practice, these tricky cases rarely cause serious problems for our analyses (except in the rare cases in which they do).
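A minimal Python sketch of the distinction, assuming we store discrete counts as `int` and continuous measurements as `float`. The variable names, values, and both helper functions are hypothetical; the last helper shows one common way continuous age becomes discrete, by keeping only completed years.

```python
import math

# Discrete: counts come in whole chunks, so an int is the natural type.
n_individuals = 7          # hypothetical count; there is no 7.5 of a person

# Continuous: any value within a reasonable range, so a float is natural.
height_cm = 172.4          # hypothetical measurement

def is_valid_count(x) -> bool:
    """A discrete count should be a non-negative whole number."""
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0

def age_in_completed_years(age_years: float) -> int:
    """Treat continuous age as discrete by keeping only completed years."""
    return math.floor(age_years)

print(is_valid_count(n_individuals))   # True
print(is_valid_count(7.5))             # False
print(age_in_completed_years(34.8))    # 34
```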

Not all numbers are numeric variables. For example, a gene ID may be a number, but it is an arbitrary label, not a quantity.
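A short Python sketch of why an ID that looks numeric should be treated as a label. The ID values are illustrative only; the point is that arithmetic on them runs without complaint yet means nothing, whereas storing them as strings blocks accidental arithmetic and preserves formatting such as leading zeros.

```python
# Illustrative numeric IDs; they label genes but measure nothing.
gene_ids_as_numbers = [1017, 7157, 672]
gene_ids_as_labels = ["1017", "7157", "672"]

# Python will happily average the integers, but the "mean gene ID"
# is not a gene and has no biological meaning.
meaningless_mean = sum(gene_ids_as_numbers) / len(gene_ids_as_numbers)

# Storing IDs as strings turns that accidental arithmetic into an error...
summing_labels_failed = False
try:
    sum(gene_ids_as_labels)
except TypeError:
    summing_labels_failed = True

# ...while still supporting what makes sense for labels: checking identity.
print(summing_labels_failed)            # True
print("7157" in gene_ids_as_labels)     # True

# Converting a label to a number can also silently destroy information.
print(int("00672"))                     # 672: the leading zeros are lost
```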

Categorical variables.

Categorical variables are qualitative and include nominal, binary, and ordinal variables.

Nominal variables have names but no natural order, like sample ID, species, or hair color.

Binary variables are nominal variables with only two options (or for which we consider only two options). Alive/dead, pass/fail, and on/off are classic binary variables.

Ordinal variables can be ordered but do not correspond to a magnitude. For example, gold, silver, and bronze medals in the Olympics are ranked from best to worst, but first place is not a consistent distance from second, nor second from third.
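The three categorical sub-types can be sketched with Python's standard-library `enum` module. This is a minimal illustration under stated assumptions, not a prescribed encoding: the class names are hypothetical, the integer values given to `Medal` exist only to define the ranking, and `functools.total_ordering` fills in the remaining comparisons from `__lt__`. `supports` is a hypothetical helper that reports whether an operation is defined for two values.

```python
import operator
from enum import Enum
from functools import total_ordering

class HairColor(Enum):
    """Nominal: named categories with no natural order."""
    BLACK = "black"
    BROWN = "brown"
    BLONDE = "blonde"
    RED = "red"

class Status(Enum):
    """Binary: a nominal variable with exactly two options."""
    ALIVE = "alive"
    DEAD = "dead"

@total_ordering
class Medal(Enum):
    """Ordinal: ordered categories. Values encode rank only, not magnitude."""
    BRONZE = 1
    SILVER = 2
    GOLD = 3

    def __lt__(self, other):
        if not isinstance(other, Medal):
            return NotImplemented
        return self.value < other.value

def supports(op, a, b) -> bool:
    """Return True if the binary operation `op` is defined for a and b."""
    try:
        op(a, b)
        return True
    except TypeError:
        return False

print(supports(operator.lt, HairColor.RED, HairColor.BLACK))  # False: no order
print(len(Status))                                            # 2
print(sorted([Medal.GOLD, Medal.BRONZE, Medal.SILVER]))       # bronze < silver < gold
print(supports(operator.lt, Medal.BRONZE, Medal.GOLD))        # True: can be ranked
print(supports(operator.sub, Medal.GOLD, Medal.SILVER))       # False: no distance
```

Leaving subtraction undefined on `Medal` is the design choice that matters here: the categories can be compared, but treating their ranks as distances would reintroduce a magnitude the variable does not have.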