Example Mini-Chapter
To build statistical models, we must know what type of data we have. Variables are characteristics that differ among the individuals (or sampling units) in our study. For example, height, eye color, the type of fertilizer applied to a site, and the number of insect species per hectare are all variables.
We often distinguish explanatory variables, which we think underlie or are associated with the biological process of interest, from response variables, the outcomes we aim to understand. This distinction helps us build and reason about our statistical model and relate its results to our biological motivation.
Whether a variable is explanatory or a response often depends on the motivation and/or the study design. For example, if we were interested in knowing whether fertilizer type had a (possibly indirect) impact on insect diversity, the type of fertilizer would be the explanatory variable and the number of insect species per hectare would be the response variable.
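As a minimal sketch of the fertilizer example, the snippet below stores a few sites and averages the response variable (insect species per hectare) within each level of the explanatory variable (fertilizer type). The site values, the fertilizer labels "A" and "B", and the function name mean_response_by_group are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration only (not real data).
# Each sampling unit is a site: (fertilizer type, insect species per hectare).
sites = [
    ("A", 12),
    ("A", 15),
    ("B", 9),
    ("B", 11),
]

def mean_response_by_group(records):
    """Average the response variable within each level of the explanatory variable."""
    groups = defaultdict(list)
    for explanatory, response in records:
        groups[explanatory].append(response)
    return {level: mean(values) for level, values in groups.items()}

summary = mean_response_by_group(sites)
print(summary)
```

Swapping the roles of the two columns would describe a different question, which is why the choice of explanatory and response variable follows from the motivation rather than from the data alone.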
Data come in different flavors. It is important to understand these, as they should direct our model building, data summaries, interpretation, and data visualization.
Numeric variables are quantitative and have magnitude, and come in a few sub-flavors. As we will see soon, these guide our modeling approaches:
Discrete variables come in chunks. For example, the number of individuals is an integer; we cannot have half a person.
Continuous variables can take any value within some reasonable range. For example, height, weight, and temperature are classic continuous variables. Some variables are trickier: age, for example, is continuous, but we often analyze it as if it were discrete. In practice, these tricky cases rarely present a serious problem for our analyses (except in the rare cases in which they do).
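The distinction between discrete and continuous variables, and the tricky case of age, can be sketched in a few lines. The measurements and the helper looks_discrete below are hypothetical, invented for illustration; the helper is only a rough check on how a value was recorded, not a definitive test of a variable's type.

```python
import math

# Hypothetical measurements, invented for illustration only.
n_individuals = 7     # discrete: a count, always a whole number
height_cm = 172.4     # continuous: any value within a plausible range
age_years = 34.62     # continuous in principle...

# ...but often analyzed as if discrete, e.g. age in completed years.
age_completed_years = math.floor(age_years)

def looks_discrete(value):
    """Rough check: was the value recorded as a whole number?"""
    return float(value).is_integer()

print(looks_discrete(n_individuals))
print(looks_discrete(height_cm))
print(age_completed_years)
```

Note that a whole-number record does not prove a variable is discrete: a height recorded as exactly 170 cm is still a continuous measurement, so the variable's type should come from what was measured, not from how the number happens to look.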
Categorical variables are qualitative and include nominal, binary, and ordinal variables.