1.3 Scalars
Note that this and the next two sections (on vectors and rectangular tables, Sections 1.4 and 1.5) partly repeat material from the introductory part on Data objects (Section 1.2.1). However, as they address important topics and cover some new aspects (e.g., in Sections 1.3.4 to 1.3.6), they can be skimmed, but should not be skipped.
1.3.1 Defining scalars
In Section 1.2.1, we have learned that scalars are objects of length 1 and how to define objects by assigning names to values. So let’s define some scalar objects and then examine their basic properties:
First, we use assignments to define three (numeric) objects:
a <- 1
b <- 2
c <- a + b
Evaluating these assignments creates three new objects a, b, and c in our local environment (see the Environment tab of the RStudio IDE or type ls() in the Console to verify this).
All three objects are of the numeric data type, as they were created by assigning a numeric value to a name.
Note that c was created as the sum of the current values of a and b (in c <- a + b).
To do anything with an object, we need to apply functions to it.
We also encountered some generic functions for printing an object, and for checking its shape and type.
Thus, let’s examine the content and characteristics of object c with these functions:
c # same as print(c)
#> [1] 3
# data type:
mode(c)
#> [1] "numeric"
typeof(c)
#> [1] "double"
is.character(c)
#> [1] FALSE
is.logical(c)
#> [1] FALSE
is.numeric(c)
#> [1] TRUE
# data shape:
length(c)
#> [1] 1
# data structure:
is.vector(c)
#> [1] TRUE
is.list(c)
#> [1] FALSE
In contrast to these generic functions, not all R functions can be applied to all objects.
In fact, most functions require specific data shapes or types to work.
As the three objects created here (i.e., a, b, and c) are in numeric mode (of type double), we can apply arithmetic functions to them:
a + 2 # evaluate a + 2
#> [1] 3
sum(a, 2) # evaluate the function sum() with arguments a and 2
#> [1] 3
b * b
#> [1] 4
prod(b, b)
#> [1] 4
b ^ 2
#> [1] 4
c^3
#> [1] 27
Generally, the types of functions that can be applied to an object depend on its properties. Two key object properties that we need to know for this are the shape and the type of an object. Immediately after the assignment, we usually are aware of these properties. However, when working with R, we often deal with a large number of objects and they often change in the process of computing something. Hence, it is good to know that we can always use generic functions to check their shape (e.g., with length()) and type (e.g., with typeof()):
length(c) # 1 could indicate a scalar OR the number of digits...
#> [1] 1
length(1000) # Check: also 1 (i.e., NOT number of digits)
#> [1] 1
typeof(c) # numbers are of type "integer" or "double"
#> [1] "double"
typeof(3.14159) # decimal numbers are of type "double"
#> [1] "double"
typeof(pi) # irrational numbers as well
#> [1] "double"
# Note: Objects in "numeric" mode are typically of type "double":
typeof(1) # default is "double"
#> [1] "double"
typeof(1L) # force to be "integer"
#> [1] "integer"
What about other data types? For instance, we know that R also uses objects of type character and logical.
Let’s create an object of type character and examine it with some generic functions:
d <- "word" # note the quotes (""), which could also be ('')
d # same as print(d)
#> [1] "word"
# type:
mode(d) # mode "character"
#> [1] "character"
typeof(d) # type "character"
#> [1] "character"
# shape:
length(d) # a scalar!
#> [1] 1
nchar(d) # 4 characters long
#> [1] 4
# data structure:
is.vector(d) # a vector
#> [1] TRUE
is.list(d) # not a list
#> [1] FALSE
Similarly, when defining a scalar object of type logical, we can apply the same generic functions to it, but receive slightly different results:
e <- b > a # assign (b > a) to an object e
e # print(e)
#> [1] TRUE
# type:
mode(e) # mode "logical"
#> [1] "logical"
typeof(e) # type "logical"
#> [1] "logical"
# shape:
length(e) # a scalar
#> [1] 1
nchar(e) # 4 characters long!
#> [1] 4
# data structure:
is.vector(e) # a vector
#> [1] TRUE
is.list(e) # not a list
#> [1] FALSE
Again, depending on the shape and type of an object, different functions are appropriate for probing and processing it.
For instance, the distinction between length() and nchar() makes sense for objects of type character, but may yield unexpected results when applied to objects of type logical.
A rather annoying detail of R is the technical distinction between the type and mode of an object. For instance, objects of the types “integer” and “double” are both returned as having the mode “numeric”:
typeof(a)
#> [1] "double"
mode(a)
#> [1] "numeric"
typeof(as.integer(a))
#> [1] "integer"
mode(as.integer(a))
#> [1] "numeric"
Practice
- Further examine the scalars we have just defined (e.g., by combining them, assigning new objects, and checking their typeof(), length(), or nchar()):
Solution
Thus, our exploration shows that objects a, b, and c are objects of numeric mode (which can be of type “integer” or of type “double”), whereas d is a text object (of type “character”), and e is the result of a test that is either TRUE or FALSE (of mode and type “logical”).
Asking for length() and nchar() of logical values may yield unexpected results.
Whereas length() provides the number of logical values (here: 1, in both cases), nchar() converts its argument into a “character” object and then counts its number of characters (here: 4 for TRUE, 5 for FALSE).
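The contrast described here can be reproduced with a short sketch. The object names t_val and f_val are hypothetical and chosen only for illustration:

```r
# Two logical scalars (hypothetical names, for illustration only):
t_val <- TRUE
f_val <- FALSE

length(t_val)        # number of logical values
#> [1] 1
length(f_val)
#> [1] 1

as.character(t_val)  # what nchar() counts after conversion
#> [1] "TRUE"
nchar(t_val)         # 4 characters in "TRUE"
#> [1] 4
nchar(f_val)         # 5 characters in "FALSE"
#> [1] 5
```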
These examples illustrate that “knowing a function” typically implies answering three questions:
- What _task_ does the function address and solve?
- Which type and shape of _argument(s)_ does it accept and expect?
- What type and shape of _result_ does it return?
- Study the documentation of the typeof() and mode() functions to determine their similarities and differences.
Solution
The differences between an object’s “mode” and “type” are subtle and not very interesting when first learning R. Both denote some intuitive notion of type, with “mode” capturing this notion more closely, as modes are mutually exclusive and every object has exactly one mode.[20]
For details on the available modes and types, see 2.1 Basic types of the R Language Definition. The table there shows that only objects of “numeric” mode can have two different types and storage.modes (i.e., “integer” vs. “double”). The table also shows that there exist two more basic types (“complex” and “raw”) that we won’t need in this book.
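As a small sketch of what that table implies, we can compare typeof(), mode(), and storage.mode() for a few values. The object names x_int and x_dbl are hypothetical:

```r
# Hypothetical example objects:
x_int <- 1L    # an integer
x_dbl <- 1.5   # a double

# Both are of mode "numeric", but differ in type and storage mode:
c(typeof(x_int), mode(x_int), storage.mode(x_int))
#> [1] "integer" "numeric" "integer"
c(typeof(x_dbl), mode(x_dbl), storage.mode(x_dbl))
#> [1] "double"  "numeric" "double"

# For the two remaining basic types, type and mode coincide:
c(typeof(1i), mode(1i))                  # complex number
#> [1] "complex" "complex"
c(typeof(as.raw(1)), mode(as.raw(1)))    # raw byte
#> [1] "raw" "raw"
```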
1.3.2 Changing scalars
We learned that R creates objects by assignment and created five different scalar objects (of three different modes/types):
a <- 1 # "numeric" objects
b <- 2
c <- a + b
d <- "word" # "character" object
e <- b > a # "logical" object
Once an object exists in R, we can always wonder:
- How can we change a scalar object in R?
Actually, Section 1.2.3 already answered this question in a general fashion:
- To change an object, we simply re-create (i.e., re-assign) it with a different definition.
As this answer said nothing about the object’s shape or type, we can simply apply it to scalar objects. Thus, to change an existing scalar object, we need to re-assign it — which is the same as re-creating it:
# Check values (defined above):
a
#> [1] 1
b
#> [1] 2
a/b
#> [1] 0.5
a <- 100 # changes a
a # a has changed
#> [1] 100
a/b # a/b changes when a has been changed
#> [1] 50
b <- 500 # changes b
b # b has changed
#> [1] 500
a/b # a/b changes when b has been changed
#> [1] 0.2
d # (assigned above)
#> [1] "word"
d <- "weird" # changes d
d # d has changed
#> [1] "weird"
A related question (also answered in Section 1.2.3) is:
- What happens to an existing scalar object when we re-assign data of a different type to it?
In principle, assigning data of a different type to an object could either cause an error or change the object’s data type.
To see what happens, let’s simply assign a variety of data types to object d:
d # a "character" object
#> [1] "weird"
is.character(d)
#> [1] TRUE
d <- 3
d
#> [1] 3
is.numeric(d)
#> [1] TRUE
d <- d > 4
d
#> [1] FALSE
is.logical(d)
#> [1] TRUE
d <- "wow"
d
#> [1] "wow"
This example shows that re-assigning an object really works just like creating a new object. Whatever was assigned to the object prior to an assignment is forgotten and lost at the moment of a new assignment.[21]
The fact that objects change when new contents are assigned to them also implies that the order of evaluations matters:
The same object (e.g., a scalar d or an expression a/b) can have different contents at different locations and at different times. (Note that the line numbers to the left of your editor window mark locations and that R scripts are typically evaluated in a top-down fashion.)
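A minimal sketch of why the order of evaluation matters: a result that was stored before a re-assignment is not updated afterwards. The object name ratio is hypothetical; the values of a and b follow the examples above:

```r
a <- 100
b <- 500

ratio <- a / b   # stores the value of a/b at this point
ratio
#> [1] 0.2

a <- 1000        # re-assigning a afterwards...
ratio            # ...does not change the stored result
#> [1] 0.2
a / b            # but evaluating a/b again uses the new value of a
#> [1] 2
```

If the line a <- 1000 were moved above the definition of ratio, ratio would be 2 instead.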
1.3.3 Applying functions to scalars
We have applied some simple functions to data arguments above, but not all functions can be applied to all data. Importantly, most functions require specific types of arguments to work (i.e., the types of the arguments must match the required argument types of the function). When viewing this requirement from the perspective of existing objects, the type of an object determines which functions can be applied to it:
# Numeric objects:
a
#> [1] 100
typeof(a) # a generic function (working with all object types)
#> [1] "double"
length(a) # a scalar
#> [1] 1
a + b
#> [1] 600
sum(a, b) # an arithmetic function (requiring numeric object types)
#> [1] 600
# Character objects:
d
#> [1] "wow"
typeof(d)
#> [1] "character"
length(d) # a scalar
#> [1] 1
nchar(d) # the "length" of a character object
#> [1] 3
# Logical objects:
e
#> [1] TRUE
typeof(e)
#> [1] "logical"
!e # negation (reverses logical value)
#> [1] FALSE
!!e
#> [1] TRUE
isTRUE(e) # tests a logical expression
#> [1] TRUE
isTRUE(!e)
#> [1] FALSE
e == !!e # tests equality
#> [1] TRUE
In case of a mismatch between function and object types, an error may occur. For instance, arithmetic functions typically require numeric data and yield errors when applied to text (i.e., character data).
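The following sketch illustrates such a mismatch. It wraps the failing call in tryCatch() so that the script keeps running and the error message can be inspected. The names result and err are hypothetical, and the exact wording of the message may depend on the R version and language settings:

```r
d <- "wow"   # a character object

# Adding a number to text fails; tryCatch() captures the error message:
result <- tryCatch(
  d + 1,
  error = function(err) conditionMessage(err)
)
is.character(result)   # TRUE: we received an error message, not a number
#> [1] TRUE

# A mismatch does not always cause an error:
nchar(123)   # the number is converted to "123" and its characters are counted
#> [1] 3
```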
1.3.4 Arithmetic functions
Some of the most common functions apply primarily to numeric objects and create new numeric objects. Examples of so-called arithmetic functions (or operators) include the following:
# Arithmetic functions:
+ 2 # keeping sign
#> [1] 2
- 3 # reversing sign
#> [1] -3
1 + 2 # addition
#> [1] 3
3 - 1 # subtraction
#> [1] 2
2 * 3 # multiplication
#> [1] 6
5 / 2 # division
#> [1] 2.5
2^3 # exponentiation
#> [1] 8
5 %/% 2 # integer division
#> [1] 2
5 %% 2 # remainder of integer division (x mod y)
#> [1] 1
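As a brief sketch of how integer division and the remainder relate to each other, the two results can be recombined into the original number. The object names below are hypothetical; the last two lines show that, for negative numbers, %/% rounds down and %% takes the sign of the divisor:

```r
x <- 17
y <- 5

quotient  <- x %/% y   # how often y fits into x
remainder <- x %% y    # what is left over
quotient
#> [1] 3
remainder
#> [1] 2
quotient * y + remainder == x   # recombining yields x
#> [1] TRUE

(-17) %/% 5   # rounds down (towards minus infinity)
#> [1] -4
(-17) %% 5    # remainder has the sign of the divisor
#> [1] 3
```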
When an arithmetic expression contains more than one operator, the issue of operator precedence arises. Fortunately, R uses the same precedence rules as we have learned in school — the so-called “BEDMAS” order:
- Brackets ()
- Exponents ^
- Division / and Multiplication *
- Addition + and Subtraction -
# Operator precedence:
1 / 2 * 3 # left to right
#> [1] 1.5
1 + 2 * 3 # precedence: */ before +-
#> [1] 7
(1 + 2) * 3 # changing order by parentheses
#> [1] 9
2^1/2 == 1
#> [1] TRUE
2^(1/2) == sqrt(2)
#> [1] TRUE
Calling ?Syntax provides a longer list of operator precedence. However, using parentheses to structure longer (arithmetic or logical) expressions increases transparency and is recommended.
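Two cases from the precedence list in ?Syntax that go beyond the school rules are sketched below: exponentiation binds more tightly than a leading minus sign, and the %% operator binds more tightly than multiplication. In both cases, parentheses make the intended order explicit:

```r
-2^2           # exponentiation first, then the sign: -(2^2)
#> [1] -4
(-2)^2         # parentheses change the order
#> [1] 4

2 * 5 %% 3     # %% first: 2 * (5 %% 3)
#> [1] 4
(2 * 5) %% 3   # parentheses change the order
#> [1] 1
```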
Arithmetic operators work not only with literal numbers, but also with named objects to which numbers have been assigned (i.e., any data objects for which is.numeric() is TRUE):
x <- 5
y <- 2
+ x # keeping sign
#> [1] 5
- y # reversing sign
#> [1] -2
x + y # addition
#> [1] 7
x - y # subtraction
#> [1] 3
x * y # multiplication
#> [1] 10
x / y # division
#> [1] 2.5
x ^ y # exponentiation
#> [1] 25
x %/% y # integer division
#> [1] 2
x %% y # remainder of integer division (x mod y)
#> [1] 1
Actually, arithmetic operators also work with (numeric) vectors (see Exercise 5 in Section 1.8.5) and some arithmetic functions also work with non-numeric (e.g., logical) objects.
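Both points can be previewed in a short sketch. The vector v is a hypothetical example (vectors are covered in Section 1.4); in arithmetic, TRUE is treated as 1 and FALSE as 0:

```r
# Arithmetic with a numeric vector (applied to each element):
v <- c(1, 2, 3)
v * 2
#> [1] 2 4 6
v + v
#> [1] 2 4 6

# Arithmetic with logical values (TRUE counts as 1, FALSE as 0):
TRUE + TRUE
#> [1] 2
sum(c(TRUE, FALSE, TRUE))   # counts the number of TRUE values
#> [1] 2
```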
1.3.5 Numeric comparisons
By applying comparison operators to numeric objects, we get logical values (i.e., scalars of type logical that are either TRUE or FALSE). For instance, each of the following comparisons of numeric values yields a logical object as its result:
2 > 1 # larger than
#> [1] TRUE
2 >= 2 # larger than or equal to
#> [1] TRUE
2 < 1 # smaller than
#> [1] FALSE
2 <= 1 # smaller than or equal to
#> [1] FALSE
The operator == tests for the equality of objects, whereas != tests for their inequality (or non-equality).
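A few illustrative comparisons (with arbitrarily chosen values) show that both operators return logical values and also work for non-numeric objects:

```r
1 == 1          # equal numbers
#> [1] TRUE
1 != 1
#> [1] FALSE
1 == 2          # unequal numbers
#> [1] FALSE
1 != 2
#> [1] TRUE

"a" == "a"      # also works for character objects
#> [1] TRUE
"a" != "A"      # comparisons of text are case-sensitive
#> [1] TRUE
TRUE == FALSE   # and for logical objects
#> [1] FALSE
```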
A common typo of R novices and inattentive R developers is to use = instead of ==.
As = can be used as an alternative to the assignment operator <-, this often yields unexpected results or “assignment” errors.
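The following sketch contrasts the two operators. It re-uses the name x with arbitrary values and shows that a mistyped = silently changes the object instead of testing it:

```r
x <- 1

x == 5   # a test: returns a logical value and leaves x unchanged
#> [1] FALSE
x
#> [1] 1

x = 5    # an assignment: prints nothing, but overwrites x
x
#> [1] 5
x == 5   # the same test now yields a different result
#> [1] TRUE
```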
Additionally, the == operator often yields unexpected results when checking the equality of two numbers.
As computers store (real) numbers as approximations, x == y often evaluates to FALSE even when we mathematically know that x and y should be equal. For example:
x <- sqrt(2)
x^2 == 2 # should be TRUE, but:
#> [1] FALSE
# Reason:
x^2 - 2 # tiny numeric difference
#> [1] 4.440892e-16
When checking for the equality of numbers, we need to use functions that allow for minimal tolerances due to the way in which computers represent so-called floating point numbers. One such function is the all.equal() function.
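A minimal sketch of a tolerant comparison, continuing the sqrt(2) example. As all.equal() returns a description of the difference (rather than FALSE) when its arguments differ, it is commonly wrapped in isTRUE(). The final line shows a manual alternative with an arbitrarily chosen tolerance of 1e-9:

```r
x <- sqrt(2)

x^2 == 2                    # exact comparison fails
#> [1] FALSE
all.equal(x^2, 2)           # comparison with a small tolerance
#> [1] TRUE
isTRUE(all.equal(x^2, 2))   # safe to use as a logical test
#> [1] TRUE
isTRUE(all.equal(1, 1.1))   # clearly different numbers
#> [1] FALSE

abs(x^2 - 2) < 1e-9         # manual check against a chosen tolerance
#> [1] TRUE
```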
1.3.6 Logical operators
Beyond numeric comparisons, the logical operators & (and), | (or), and ! (not) allow for logical expressions and yield logical objects (i.e., TRUE or FALSE) as their result:
## Logical operators:
(2 > 1) & (1 > 2) # &: logical AND
#> [1] FALSE
(2 < 1) | (1 < 2) # |: logical OR
#> [1] TRUE
(1 < 1) | !(1 < 1) # !: logical negation
#> [1] TRUE
Practice
- Assignment vs. testing for numeric equality:
- Predict, evaluate, and explain the results of the following three expressions:
- How can we change the third expression to obtain the (mathematically correct) result TRUE?
- Logical expressions:
- Evaluate ?base::Logic to see and read the help page on logical operators.[22]
- Look up De Morgan’s Laws (e.g., on Wikipedia) and express them in R.
Hint: Verify their truth by evaluating them for two objects A and B that are assigned to arbitrary truth values.
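One possible way to approach the hint is sketched below. Instead of two scalars, it assumes two logical vectors A and B that jointly cover all four combinations of truth values (vectors are covered in Section 1.4). Note the outer parentheses: as ! has lower precedence than ==, an expression like !(A & B) == ... would otherwise negate the entire comparison.

```r
# All four combinations of two truth values:
A <- c(TRUE, TRUE, FALSE, FALSE)
B <- c(TRUE, FALSE, TRUE, FALSE)

# De Morgan's Laws (hypothetical object names law_1 and law_2):
law_1 <- (!(A & B)) == ((!A) | (!B))   # not (A and B) = (not A) or (not B)
law_2 <- (!(A | B)) == ((!A) & (!B))   # not (A or B) = (not A) and (not B)

law_1
#> [1] TRUE TRUE TRUE TRUE
law_2
#> [1] TRUE TRUE TRUE TRUE
```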
[20] The potential for confusion is aggravated by the fact that objects also belong to some “class”, but an object can have multiple classes assigned to it, which can easily be changed by the user.
[21] When programming in R, this statement will be qualified by the notion of variable scope. Variables with identical names can co-exist at multiple levels. Re-assigning an object overwrites any previous object at the current level.
[22] The Examples of ?base::Logic illustrate a somewhat problematic regularity of R: They are informative for experienced users, but are often difficult to understand for beginners. Nevertheless, copying and trying to understand examples is a good way to learn more about a particular topic.