Chapter 2 Variable Classes
So far, you’ve encountered character, numeric, and logical data. It is important that you know what kind of information each variable stores, and essential that R knows too, because otherwise you could run into problems.
For example, let’s say you create the following variables:
x <- 1
y <- 2
Given that we have assigned numbers, let’s check that they are numeric:
is.numeric(x)
## [1] TRUE
is.numeric(y)
## [1] TRUE
Great, that means we can proceed with simple arithmetic, e.g. multiplication. However, if they contained character data, R would give you an error:
x <- "blue"
y <- "yellow"
x*y
## Error in x * y: non-numeric argument to binary operator
Yes, R is smart enough to know that you can’t multiply colours. It knows because you’ve used quotation marks to indicate that the variable contains text. This might seem unhelpful, but it is actually quite useful, especially when working with data. For example, without quotation marks, R would treat 10 as the number ten, and would allow you to do sums with it. With quotation marks, “10”, it knows that it is text.
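As a small sketch of that last point, the lines below store the same digits once as a number and once as text. The variable names are made up for illustration, and as.numeric() is used here as one way to turn text back into a number.

```r
# The same digits stored two ways (names are illustrative only)
as_number <- 10
as_text <- "10"

class(as_number)
## [1] "numeric"
class(as_text)
## [1] "character"

# Text has to be converted explicitly before you can do sums with it
converted <- as.numeric(as_text)
converted * 2
## [1] 20
```

Trying as_text * 2 directly would give the same non-numeric argument error as before.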
Above, we checked specifically whether our x and y variables were stored as numeric variables. But what if you can’t remember what you should be checking for? You could use the class() and mode() functions instead. The class() of a variable tells you its classification, and mode() relates to the format in which the information is stored. The former is the more useful in most cases.
x <- "hello"
class(x)
## [1] "character"
mode(x)
## [1] "character"
y <- TRUE
class(y)
## [1] "logical"
mode(y)
## [1] "logical"
z <- 10
class(z)
## [1] "numeric"
mode(z)
## [1] "numeric"
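In the examples above, class() and mode() happen to agree. They can differ, which is why class() is usually the more informative of the two. A brief sketch, using a whole number and a date chosen purely for illustration:

```r
# A whole number written with the L suffix is stored as an integer
n <- 10L
class(n)
## [1] "integer"
mode(n)
## [1] "numeric"

# A date has its own class, but is stored as a number underneath
d <- as.Date("2024-01-15")
class(d)
## [1] "Date"
mode(d)
## [1] "numeric"
```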
2.1 Factors
Let’s get into some more relevant examples for statistics. Although we have only referred to ‘numeric’ data so far, we commonly distinguish between nominal, ordinal, interval, and ratio scales. The numeric class in R is generally fine for ratio, interval, and ordinal data, but what about nominal?
Imagine that we had conducted a study with different treatment conditions. Within our study, all twelve participants completed the same task, but each of the three groups was given different instructions. Let’s first create a variable that tracks which group people were in:
group <- c(1,1,1,1,2,2,2,2,3,3,3,3)
Now, it wouldn’t make sense to add two to group 1, group 2, and group 3, since they are distinct groups rather than quantities, but let’s try anyway:
group + 2
## [1] 3 3 3 3 4 4 4 4 5 5 5 5
R has now created groups 4 and 5, which don’t exist. We allowed it to do so because the values are currently just ordinary numbers. We need to tell R to treat group as a factor, which we can do using the as.factor() function.
group <- as.factor(group)
group
## [1] 1 1 1 1 2 2 2 2 3 3 3 3
## Levels: 1 2 3
This output is a little different from the first lot, as it now lists the levels of the variable, but let’s check that it is now a factor:
is.factor(group)
## [1] TRUE
class(group)
## [1] "factor"
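If you want to look at the levels on their own, a few base R functions can help. The sketch below uses a separate, shorter variable (g, a made-up name) so that group is left untouched.

```r
# A small factor built the same way as group (g is illustrative only)
g <- as.factor(c(1, 1, 2, 2, 3, 3))

# The distinct categories, stored as text
levels(g)
## [1] "1" "2" "3"

# How many categories there are
nlevels(g)
## [1] 3

# How many observations fall in each category (two per level here)
table(g)
```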
Now, let’s try to add 2 to group again to see what happens.
group + 2
## Warning in Ops.factor(group, 2): '+' not meaningful for factors
## [1] NA NA NA NA NA NA NA NA NA NA NA NA
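Great! Now R knows that we’re the ones being stupid! But what if we wanted to assign meaningful labels to the different levels of the factor? Say, for example, we had low, high, and control conditions. We can do it like this, keeping in mind that the new labels are matched to the existing levels in order, so 1 becomes low, 2 becomes high, and 3 becomes control: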
levels(group) <- c("low", "high", "control")
print(group)
## [1] low low low low high high high high
## [9] control control control control
## Levels: low high control
Factors are extremely helpful, and they are the main way to represent nominal scales. It is really important that you label them with meaningful names, as they can help when interpreting output.
There are lots of other ways that you can assign labels to your levels.
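One such alternative, sketched here under the assumption that you still have the original numeric codes, is to set the levels and labels in a single call to factor(). The names raw_codes and condition are made up for this example.

```r
# The original numeric codes (raw_codes is an illustrative name)
raw_codes <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)

# levels says which codes exist; labels names them in the same order
condition <- factor(raw_codes,
                    levels = c(1, 2, 3),
                    labels = c("low", "high", "control"))

levels(condition)   # "low", "high", "control"
is.factor(condition)
## [1] TRUE
```

Spelling out both levels and labels makes the pairing between code and name explicit, which can make mislabelling easier to spot.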
2.1.1 Exercise
Create a nominal variable called sex with 2 groups, male and female, containing 5 and 7 individuals respectively. Make sure to level and label your variable appropriately!
sex <- c(1,1,1,1,1,2,2,2,2,2,2,2)
sex <- as.factor(sex)
levels(sex) <- c("male", "female")
print(sex)
## [1] male male male male male female female female female female
## [11] female female
## Levels: male female
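The solution above is not the only possible one. As a rough alternative, you could build the labels directly with rep() and pass them to factor(). The name sex_alt is used only so that this sketch does not overwrite sex.

```r
# Repeat "male" 5 times and "female" 7 times (sex_alt is an illustrative name)
sex_alt <- factor(rep(c("male", "female"), times = c(5, 7)),
                  levels = c("male", "female"))

# Check the level order and the group sizes
levels(sex_alt)   # "male", "female"
table(sex_alt)    # 5 male, 7 female
```

The levels argument is given explicitly because factor() would otherwise sort text levels alphabetically, putting female first.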