Chapter 9 Useful tips
Following the principles detailed above should make you a better programmer, so we hope you will stick to them. To help you along the way that tiny bit more, here are a few additional tips.
9.1 Break it down
Maybe you find yourself struggling with a task and turn to StackOverflow for help. Maybe you manage to find a ready-made answer to your problem or maybe some helpful soul writes one just for you. And maybe it works but you don’t quite understand why.
These things happen and sometimes a line of code can appear quite obscure. Take for example:
for (i in 1:10) eval(parse(text = paste0(
"df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i ,", 0, ", i, "), ncol = i))")))
When faced with a bit of code like this, it is generally a good idea to try to reverse-engineer it. Let’s give it a go.
First, we can see that this is a for loop that repeats itself 10 times: it starts by assigning the value of 1 to the iterator object i, then executes the code, increments i by 1 and repeats until the code has been run with i == 10.
So to see what the code inside the loop does, we need to set i to some value (1 is a reasonable choice).
i <- 1
Now, let’s start by running the code from the innermost command outwards:
paste0("df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i ,", 0, ", i, "), ncol = i))")
## [1] "df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i))"
Right, so the first command created a character string that looks like a command. Let’s break it down even further:
rnorm(100 * 1, 0, 1)
## [1] -1.047907263 0.334647380 0.435923495 -0.480989597 1.702518182
## [6] -0.370730068 1.052210386 1.108929130 0.018052376 1.279100669
## [11] -1.504444115 0.424034485 -1.204517636 -0.021164629 1.211853449
## [16] 0.332288148 1.078427383 1.025987337 0.172459867 0.795208031
## [21] 0.440109993 -0.827218657 0.596764479 0.745033312 -3.025576114
## [26] -1.762091803 0.413914646 0.845674843 -0.694720934 -0.308303527
## [31] 1.195282698 -0.911274774 0.115036852 0.539872015 2.341801712
## [36] -0.190848300 0.461041703 -0.184094516 2.208509955 1.044633826
## [41] -0.786850302 -0.054910095 0.252311968 1.319205985 -0.364450048
## [46] 1.001338090 -1.540652875 -0.669952228 0.355237452 0.245582746
## [51] 0.815361338 -1.470188139 0.922409071 1.359885235 0.047637818
## [56] -0.139715380 0.536770964 0.714087608 -2.007776828 1.353881648
## [61] 1.388877526 1.640218197 -0.962296390 -1.649831027 -0.336591424
## [66] -0.524440172 -1.461580211 1.786026822 -0.081735195 0.550506575
## [71] -0.790132175 -0.778848364 -1.538555873 0.486646149 -1.480791963
## [76] 1.274834396 -0.381704736 -1.392629198 1.840397983 1.874917568
## [81] 0.445369658 -0.485186904 1.255907351 1.095966740 -0.054362077
## [86] 0.064135598 -0.832767137 1.154166817 -0.009764447 -1.420798097
## [91] -0.666841665 -0.260189689 -1.241116990 -0.312476256 0.585133232
## [96] 0.220116142 0.321966905 -0.125602859 0.144848682 -1.150526461
OK, this is easy. The first bit generates \(100 \times i\) random numbers from a normal distribution with a mean of zero and a standard deviation of i. Let’s move one layer out:
# printout truncated to first 10 lines
matrix(rnorm(100 * 1, 0, 1), ncol = i)
## [,1]
## [1,] 0.03877248
## [2,] -0.69695493
## [3,] 0.55118318
## [4,] -0.40594141
## [5,] -0.32389321
## [6,] -0.99335789
## [7,] -1.93083243
## [8,] -0.85998697
## [9,] 1.77928830
## [10,] 1.37858208
## [ reached getOption("max.print") -- omitted 90 rows ]
This command puts those numbers into a matrix with 100 rows and i columns. Next:
df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i))
This line converts the matrix into a data.frame and stores it in an object called “df_i” (here, df_1). Remember, i takes the values 1 to 10, increasing by one each time the loop is repeated.
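To check the claim about rows and columns for a value other than 1, the same nested call can be run by hand. The sketch below is only an illustration: the name k (standing in for i, so that the i used in the walkthrough is left untouched), the objects m and d, and the call to set.seed() are assumptions made for this example and are not part of the original loop.

```r
set.seed(1)  # assumption: a fixed seed, only so the example is reproducible
k <- 3       # hypothetical stand-in for i; any value from 1 to 10 would do

# 100 * k normal random numbers with mean 0 and standard deviation k,
# arranged into k columns, which leaves 100 rows
m <- matrix(rnorm(100 * k, 0, k), ncol = k)
dim(m)

# the same conversion the loop performs; columns get default names
d <- as.data.frame(m)
names(d)
```

With k set to 3, dim(m) reports 100 rows and 3 columns, and the data frame has one column per column of the matrix.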
All good thus far, but why is the command a character string (in “quotes”)? What is that good for? Well, it turns out that the parse() function can take a string containing valid R code and turn it into an expression:
parse(text = paste0("df_", i, " <- as.data.frame(matrix(rnorm(100 * ",
i ,", 0, ", i, "), ncol = i))"))
## expression(df_1 <- as.data.frame(matrix(rnorm(100 * 1, 0, 1), ncol = i)))
This expression can then be evaluated using the eval() function:
eval(parse(text = paste0(
"df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i , ", 0, ", i, "), ncol = i))")))
# printout truncated
df_1
## V1
## 1 -0.2964653
## 2 -1.1495345
## 3 1.4206396
## 4 0.2983326
## 5 1.8263633
## 6 -0.8628447
## 7 1.1240684
## 8 -2.1283718
## 9 -0.2896466
## 10 -0.5050959
## [ reached 'max' / getOption("max.print") -- omitted 90 rows ]
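The same parse() and eval() mechanism can be seen on a much smaller string. This is a minimal sketch for illustration only; the names code_text, expr and demo_value are made up for the example.

```r
# a character string that happens to contain valid R code
code_text <- "demo_value <- 6 * 7"

# parse() turns the string into an unevaluated expression
expr <- parse(text = code_text)
class(expr)

# nothing has been run yet, so demo_value has not been created
exists("demo_value")

# eval() runs the expression, which creates demo_value
eval(expr)
demo_value
```

The string itself does nothing; only the call to eval() carries out the assignment.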
So what the entire loop does is create 10 data frames named df_1 to df_10, each containing 100 rows and a different number of columns (1 for df_1, 6 for df_6, etc.) filled with random numbers. Moreover, each data.frame contains random numbers with a different standard deviation.
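These claims can be checked directly once the loop has run. The following is a rough sketch: mget() collects the ten objects into a list by name, and the helper names dfs, n_rows, n_cols and sds are chosen here for illustration. Because the numbers are random, the sample standard deviations will only be approximately 1 to 10.

```r
# the loop from the start of this section
for (i in 1:10) eval(parse(text = paste0(
  "df_", i, " <- as.data.frame(matrix(rnorm(100 * ", i, ", 0, ", i, "), ncol = i))")))

# gather df_1 ... df_10 into one named list
dfs <- mget(paste0("df_", 1:10))

# every data frame should have 100 rows and as many columns as its number
n_rows <- sapply(dfs, nrow)
n_cols <- sapply(dfs, ncol)
n_rows
n_cols

# sample standard deviation of all values in each data frame
sds <- sapply(dfs, function(d) sd(unlist(d)))
round(sds, 1)
```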
And so, just like that, with a single line of code we can create 10 (or more!) different R objects with different properties. Cool, isn’t it? We hope this example demonstrates how, using systematic reverse-engineering, you can come to understand even daunting-looking code with functions you haven’t seen before.
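For comparison, the same ten data frames can also be produced without building code as text. The sketch below is one possible alternative, not the approach used in the example above: it keeps the data frames in a single named list (called df_list here, a name chosen for illustration) instead of creating ten separate objects.

```r
# one data frame per value of i, stored together in a list
df_list <- lapply(1:10, function(i) {
  as.data.frame(matrix(rnorm(100 * i, 0, i), ncol = i))
})
names(df_list) <- paste0("df_", 1:10)

# individual data frames are then accessed by name
dim(df_list$df_6)
```

Keeping related objects in one list makes it straightforward to apply the same function to all of them, for example with lapply() or sapply().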
9.2 Handy functions that return logicals
Finally, here are some useful functions with which you might want to familiarise yourself. They will make cleaning your data much easier.
- ==, takes a vector, matrix, or a data.frame and compares every element thereof to a single value or, element by element, to a vector of the same length. Returns a logical vector with TRUE for elements that are equal to the compared value and FALSE otherwise. Comparing NA returns NA.
c(1:5, NA) == c(100, 2, 2, 8, 5, 9)
## [1] FALSE TRUE FALSE FALSE TRUE NA
- <, same as ==, but TRUE is returned if element is less than the compared value.
- >, same as ==, but TRUE is returned if element is greater than the compared value.
- <=, same as ==, but TRUE is returned if element is less than or equal to the compared value. In other words, it is a negation of (complementary operation to) >.
- >=, same as ==, but TRUE is returned if element is greater than or equal to the compared value. Negation of <.
- %in%, same as ==, but can take a vector on the right hand side. Each element of the vector/matrix/data.frame to the left is compared to each element of the vector to the right, and TRUE is returned if it matches any of them. Unlike ==, it returns FALSE rather than NA for a missing value that has no match. For example:
c(1:5, NA) %in% c(100, 4, 2, 8)
## [1] FALSE TRUE FALSE TRUE FALSE FALSE
- all functions that begin with ‘is’, e.g.:
  - is.na(), takes a vector, matrix, or a data.frame and returns a logical vector with TRUE if given element is an NA and FALSE otherwise.
  - is.numeric(), takes any object and returns TRUE if it is a numeric vector and FALSE otherwise.
  - is.factor(), is.matrix(), is.data.frame(), is.list(), same as is.numeric() but return TRUE if the object provided is a factor, matrix, data.frame, or list, respectively.
  - isTRUE(), returns a single TRUE if the expression provided evaluates to TRUE and a single FALSE otherwise. Only isTRUE(TRUE) returns TRUE. isTRUE(FALSE), isTRUE(c(TRUE, TRUE)) and anything else returns FALSE. Works with NAs so can be useful for combining with logical operators that return NA when comparing missing values. For example:
NA > 4
## [1] NA
isTRUE(NA > 4)
## [1] FALSE
- any(), takes a logical vector and returns TRUE if any of its elements equals TRUE, and FALSE otherwise, e.g., any(1:5 > 4) returns TRUE.
- all(), like any() but returns TRUE only if all of the elements of the vector provided are TRUE.
- all.equal(), takes two objects and returns TRUE if they are equal (numeric values are compared up to a small tolerance) and a character vector describing the discrepancies otherwise. Sensitive to attributes so all.equal(1:5, factor(1:5)) does not return TRUE. Good to use along with isTRUE()! In the example below, df and my_list stand for a data.frame and a list defined elsewhere.
all.equal(df, df)
## [1] TRUE
all.equal(df, my_list)
## [1] "Names: 3 string mismatches"
## [2] "Attributes: < names for target but not for current >"
## [3] "Attributes: < Length mismatch: comparison on first 0 components >"
## [4] "Length mismatch: comparison on first 3 components"
## [5] "Component 1: Lengths: 5, 20"
## [6] "Component 1: Attributes: < target is NULL, current is list >"
## [7] "Component 1: target is numeric, current is matrix"
## [8] "Component 2: Modes: numeric, character"
## [9] "Component 2: target is numeric, current is character"
## [10] "Component 3: Modes: numeric, list"
## [ reached getOption("max.print") -- omitted 4 entries ]
# use with isTRUE() if T/F desired
isTRUE(all.equal(1:5, factor(1:5)))
## [1] FALSE
- &, “AND” takes two Booleans and returns TRUE if both of them are TRUE, FALSE if either is FALSE, and NA otherwise (that is, when one is NA and the other is TRUE or NA). Can be applied over two logical vectors of the same length:
c(T, T, F) & c(T, T, T)
## [1] TRUE TRUE FALSE
- |, “OR” is the same as & but returns TRUE if either or both of the two compared elements is TRUE.
- xor(), “exclusive OR” is same as above but returns TRUE only if either the first or the second, but not both of the two compared elements, is TRUE.
xor(c(T, F, F), c(T, F, T))
## [1] FALSE FALSE TRUE
- && and ||, single-element versions of & and |, meant for conditions of length one such as those used in if statements. In older versions of R, they only compare the first element of both of the vectors provided (i.e., x[1] vs y[1]); since R 4.3.0, giving them a vector longer than one element is an error, so select the element explicitly:
c(T, F, F)[1] || c(T, F, T)[1]
## [1] TRUE
- all of the above can be negated using the ‘!’ operator, e.g.:
  - x != y
  - !x > y (read by R as !(x > y), because ‘!’ is applied after the comparison)
  - !is.na(x)
  - !any(is.na(x)) is equivalent to all(!is.na(x))
  - !(x & y) is equivalent to xor(x, y) | (!x & !y)
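As a rough illustration of how these functions combine when cleaning data, the sketch below works on a small made-up vector. The object names (score, in_range, valid, clean) and the values are hypothetical and chosen only for this example.

```r
# made-up data with one missing value and two impossible scores
score <- c(12, NA, 7, 250, -3, 18)

# comparisons involving NA give NA
in_range <- score >= 0 & score <= 100
in_range

# combine with is.na() so that missing scores are excluded explicitly
valid <- !is.na(score) & in_range
valid

# keep only the valid scores
clean <- score[valid]
clean

# quick checks on the result
any(is.na(clean))
all(clean %in% 0:100)

# all.equal() allows for tiny floating-point differences, == does not
0.1 + 0.2 == 0.3
isTRUE(all.equal(0.1 + 0.2, 0.3))
```

Wrapping all.equal() in isTRUE() gives a single TRUE or FALSE that is safe to use in an if statement.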