Learning goals: By the end of this example mini chapter you should be able to
As we build and evaluate statistical models, a key consideration is the type of data and the process that generates these data. Variables are things that differ among the individuals (or sampling units) in our study. So, for example, height, eye color, the type of fertilizer applied to a site, and the number of insect species per hectare are all variables.
We often want to distinguish explanatory variables, which we think underlie or are associated with the biological process of interest, from response variables, the outcome we aim to understand. This distinction helps us build and reason about our statistical model and relate the results to our biological motivation.
Whether a variable is explanatory or a response often depends on the motivation and/or study design. For example, if we were interested in whether fertilizer type had a (possibly indirect) impact on insect diversity, the type of fertilizer would be the explanatory variable and the number of insect species per hectare would be the response variable.
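To make the fertilizer example concrete, here is a minimal sketch in Python of how the two roles might be marked when summarizing data. The site records, the column names, and the helper `mean_response_by_group` are all hypothetical and invented for illustration; they are not data from a real study.

```python
from statistics import mean

# Hypothetical records, one per sampled site (values invented for illustration).
sites = [
    {"fertilizer": "A", "insect_species_per_ha": 12},
    {"fertilizer": "A", "insect_species_per_ha": 15},
    {"fertilizer": "B", "insect_species_per_ha": 9},
    {"fertilizer": "B", "insect_species_per_ha": 6},
]

EXPLANATORY = "fertilizer"           # what we think is associated with the outcome
RESPONSE = "insect_species_per_ha"   # the outcome we aim to understand


def mean_response_by_group(rows, explanatory, response):
    """Average the response variable within each level of the explanatory variable."""
    groups = {}
    for row in rows:
        groups.setdefault(row[explanatory], []).append(row[response])
    return {level: mean(values) for level, values in groups.items()}


summary = mean_response_by_group(sites, EXPLANATORY, RESPONSE)
print(summary)
```

Swapping the two roles would still run, but the summary would answer a different (and here, not very sensible) question, which is why the choice follows from the motivation and study design rather than from the data alone.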
Data can come in different flavors. It is important to understand these, as they should direct our model building and data summaries, interpretation and data visualization.
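Numeric variables are quantitative and have magnitude, and come in a few sub-flavors. As we will see soon, these guide our modeling approaches. Two common sub-flavors are discrete variables, which come in whole-number counts (like the number of insect species per hectare), and continuous variables, which can take any value within some range (like height).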
Not all numbers are numeric. For example, gene ID is a number but it is an arbitrary marker and is not quantitative.
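The gene ID point can be illustrated with a short Python sketch. All values below are hypothetical and made up for illustration. The idea is that a computer will happily average any list of numbers, so it is up to us to store arbitrary labels in a way that prevents meaningless arithmetic.

```python
from statistics import mean

# Hypothetical values, invented for illustration.
heights_cm = [162.5, 171.0, 158.5]   # measurements: quantitative
species_counts = [12, 15, 9]         # counts: quantitative
gene_ids = [1001, 1002, 2050]        # arbitrary labels that happen to be digits

# Averages of quantitative variables are meaningful summaries.
mean_height = mean(heights_cm)
mean_count = mean(species_counts)

# This runs without complaint, but the result has no biological meaning.
meaningless_average = mean(gene_ids)

# Storing IDs as text makes the mistake impossible rather than silent.
gene_labels = [str(g) for g in gene_ids]
try:
    sum(gene_labels) / len(gene_labels)
    could_average_labels = True
except TypeError:
    could_average_labels = False

print(mean_height, mean_count, could_average_labels)
```

Treating IDs as text is one possible safeguard, not the only one; the broader habit is to ask whether a number has magnitude before summarizing it.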
Categorical variables are qualitative, and include:
Nominal variables, which cannot be ordered and have names, such as sample ID, species, or hair color.
Binary variables are a special type of nominal variable with only two options (or for which we only consider two options). Alive/dead, pass/fail, and on/off are classic binary variables.
Ordinal variables can be ordered, but do not correspond to a magnitude. For example, gold, silver, and bronze medals in the Olympics are ranked from best to worst, but first place is not some fixed distance ahead of second or third.
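The three categorical flavors above can be sketched with Python's standard `enum` module. The class names and levels here are hypothetical choices for illustration. The sketch assumes we want ordering to work only for the ordinal variable, and arithmetic to work for none of them.

```python
from enum import Enum


class HairColor(Enum):
    """Nominal: names only, with no meaningful order."""
    BLACK = "black"
    BROWN = "brown"
    RED = "red"


class Survival(Enum):
    """Binary: a nominal variable with exactly two levels."""
    ALIVE = "alive"
    DEAD = "dead"


class Medal(Enum):
    """Ordinal: levels are ranked, but the ranks are not magnitudes."""
    BRONZE = 1
    SILVER = 2
    GOLD = 3

    def __lt__(self, other):
        # Only medals can be compared with medals.
        if not isinstance(other, Medal):
            return NotImplemented
        return self.value < other.value


# Ordinal levels can be sorted from worst to best.
podium = sorted([Medal.GOLD, Medal.BRONZE, Medal.SILVER])

# Nominal levels cannot be ordered: the comparison is rejected.
try:
    HairColor.BLACK < HairColor.RED
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False

# Ordinal levels have no distance between them: subtraction is rejected.
try:
    Medal.GOLD - Medal.SILVER
    ordinal_has_distance = True
except TypeError:
    ordinal_has_distance = False

print(podium, nominal_is_ordered, ordinal_has_distance)
```

The integer values inside `Medal` exist only to define the ranking. Using a plain `Enum` rather than `IntEnum` is a deliberate choice in this sketch, because `IntEnum` members behave like integers and would allow the subtraction that an ordinal variable should not support.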
After completing this quiz (and ensuring you get everything right), fill out the quiz on Canvas as today's class quiz.
Explanatory variables are variables we think underlie or are associated with the biological process of interest.
Response variables are the outcome we aim to understand.