Chapter 12 Some Tips to make R code faster

“Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!”

– Joe Armstrong

In the IT sector, speed is very important. People rewrite tons of algorithms in C, C++, Go and Java just to gain a few milliseconds, or maybe microseconds, over one another. Compared to these languages, R is a very slow language. If you really need nanosecond-level optimization, you will not get it in R without going to Rcpp, which, by the way, is a very easy wrapper around C++ for R users. Still, R code can be optimized to production-level efficiency without too much trouble. And R is not slow compared to other interpreted languages like Python, JavaScript, Ruby, etc.

12.1 Use Latest version of R

Each release of R improves something to gain more speed with less memory. If you need more speed, it is always worth switching to the latest version of R to see whether you get any gains. In general, if you are using old versions of R, or old functions that are deprecated and no longer recommended, switching to a new version or newer methods will often give you a speed advantage.

Constant criticism that R is slow has pushed R to improve in this respect, and R keeps evolving with the needs of the time. There is not much to add here. If possible, use the latest versions of R, packages, and methods; they are usually faster.

12.2 Benchmark and profile the code

R is a rather obscure language, and there are no direct rules for speed gains. You might think you are making the code faster while in fact you are making it slower. The worst part about R is that you can write very, very slow code without realizing what you are missing. The same R code can run 1000 times faster when optimized. R is a very easy language to write slow code in. Keep this in mind while writing code.

This is why you should benchmark your options. A change may give you only a little speed improvement, or none at all, and you will only know by measuring. If you want to optimize R, you must learn to benchmark the options. I will not go into details, but microbenchmark is the best package for this task. Other packages make too many assumptions.
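As a rough sketch of what benchmarking looks like, the snippet below times two ways of squaring a vector with base R's system.time(), so it runs without extra packages. The names square_loop and square_vec are made up for this illustration; microbenchmark::microbenchmark() gives finer-grained numbers when it is installed.

```r
# Two hypothetical implementations of the same task (squaring a vector).
square_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[[i]] <- x[[i]]^2
  }
  out
}

square_vec <- function(x) x^2

x <- runif(1e5)

# Always confirm that the candidates agree before comparing their speed.
stopifnot(isTRUE(all.equal(square_loop(x), square_vec(x))))

# Repeat each call so that the timings rise above the timer resolution.
loop_time <- system.time(for (k in 1:20) square_loop(x))[["elapsed"]]
vec_time  <- system.time(for (k in 1:20) square_vec(x))[["elapsed"]]

c(loop = loop_time, vectorised = vec_time)
```

The exact numbers depend on your machine and R version, which is exactly why you should measure rather than guess.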

Sometimes you may assume your code is slow because you have not followed a best practice, while it is actually slow for an entirely different reason. To see how the parts of your code compare to one another, profiling works like a charm. You should use it especially before taking your code live. A profiler can save you a lot of CPU time, and it helps you decide whether you are okay with the current speed or actually want to sit for hours to save a few milliseconds. It mostly doesn't matter in an ETL script, but it matters in an API or a Shiny app. You have to decide what's okay, and profiling your code will help you with that.
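A minimal profiling sketch using base R's sampling profiler, Rprof(), might look like this. The functions slow_step, fast_step and workload are hypothetical stand-ins for your own code, and the sketch assumes your R build has profiling enabled, which is the usual default.

```r
# Hypothetical workload: one deliberately slow step and one cheap step.
slow_step <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, sqrt(i))  # grows a vector: slow
  out
}
fast_step <- function(n) sqrt(seq_len(n))

workload <- function(n) {
  a <- slow_step(n)
  b <- fast_step(n)
  all.equal(a, b)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start the sampling profiler
result <- workload(2e4)
Rprof(NULL)                         # stop profiling

# summaryRprof() errors when no samples were collected, so guard the call.
prof <- tryCatch(summaryRprof(prof_file)$by.total, error = function(e) NULL)
head(prof)
```

In the summary, slow_step should account for nearly all of the total time, which tells you where optimization effort is worth spending.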

12.3 Algorithm matters more than language

I see many people who write R for a single project and then, because they can't make it run fast, switch to other languages like Python, mostly because they have read a few blog posts written 5 to 10 years ago about how slow R is. In the IT sector speed matters most, and I agree that if you can save a few milliseconds just by following a few basic rules, you should. When you create a Shiny app or a Plumber API that many people hit at the same time, every millisecond counts. But don't get preoccupied with optimizing your code before it even works.

Let me give you a basic benchmark: if your API can handle 40-50 requests per second on a single core, you are running at very high speed. That means 20 to 25 ms per request. Usually network latency, disk caching, talking to the database, etc. take more time. Complex web apps mostly take 200 to 500 ms per request. R may not be the fastest language in the world, but it can surely reach this level with minimal effort. The rest is all about scaling your app.
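The per-request time budget follows from simple arithmetic on the throughput figures above, sketched here in R (the variable names are only illustrative):

```r
# Requests handled per second on one core, as quoted in the text.
requests_per_second <- c(40, 50)

# One second is 1000 ms, so the time available per request is:
ms_per_request <- 1000 / requests_per_second
ms_per_request
## [1] 25 20
```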

So before you think about switching languages or saying that R in general is slow, ask yourself whether you have optimized your code yet. If you don't optimize your code, it doesn't matter what language you write it in; it will always be slow. Let me beat C++ with R to show you what I mean.

Let's understand this through a very simple example. Let's start with one of the worst ways to code in any language: recursive functions. Mark my words: never use recursive functions. You are almost always better off without them. Let's try to find the good old Fibonacci numbers, the first 30 of them. We will write the same function in both R and C++.

recurse_fib_r <- function(fnum){
  if(fnum <= 1) {
    return(fnum)
  } else {
    return(
      recurse_fib_r(fnum-1) + recurse_fib_r(fnum-2)
      )
  }
}

#include <Rcpp.h>

//[[Rcpp::export]]
int recurse_fib_rcpp(int fnum){
  if(fnum <= 1) {
    return(fnum) ;
  } else {
    return recurse_fib_rcpp( fnum - 1 ) + recurse_fib_rcpp( fnum - 2 ) ;
  }
}

Let's compare both functions now.

microbenchmark::microbenchmark(
  mapply( recurse_fib_rcpp, 1:30 ),
  mapply( recurse_fib_r, 1:30 ),
  times = 10
)
## Unit: milliseconds
##                            expr       min        lq       mean     median
##  mapply(recurse_fib_rcpp, 1:30)    6.5913    7.0788    7.42341    7.49035
##     mapply(recurse_fib_r, 1:30) 2854.4090 2898.0144 2918.62795 2905.93615
##         uq       max neval cld
##     7.7385    7.9827    10  a 
##  2939.1185 3011.4093    10   b

While C++ is still at milliseconds, R has reached seconds, and that for only 30 Fibonacci numbers. This is not acceptable at any level you work on. Even for basic scripts, code like this should not be sitting on your computer at all.

Let's try to save computation by using the memoise package to cache intermediate results.

mem_fib_r <- function(fnum){
 if(fnum <= 1) {
    return(fnum)
  } else {
    return(
      memoised_fib_r(fnum - 1) + memoised_fib_r( fnum - 2)
      )
  }
}

memoised_fib_r <- memoise::memoise(mem_fib_r)
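To make clear what memoisation does, here is a minimal hand-rolled sketch that caches results in an environment. It is not how the memoise package is implemented internally; the names fib_cache and cached_fib_r are made up for this illustration.

```r
# Cache keyed by the input number (stored as a character string).
fib_cache <- new.env()

cached_fib_r <- function(fnum) {
  key <- as.character(fnum)
  hit <- fib_cache[[key]]
  if (!is.null(hit)) {
    return(hit)            # already computed: skip the recursion
  }
  value <- if (fnum <= 1) {
    fnum
  } else {
    cached_fib_r(fnum - 1) + cached_fib_r(fnum - 2)
  }
  assign(key, value, envir = fib_cache)
  value
}

cached_fib_r(30)
## [1] 832040
```

Each Fibonacci number is now computed once and then looked up, so the number of calls grows linearly with fnum instead of exponentially.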

Let's compare the memoised version, memoised_fib_r, with C++.

microbenchmark::microbenchmark(
  mapply( recurse_fib_rcpp, 1:30 ),
  mapply( memoised_fib_r, 1:30 ),
  times = 10
)
## Unit: milliseconds
##                            expr    min     lq    mean  median     uq     max
##  mapply(recurse_fib_rcpp, 1:30) 6.7944 7.0323 7.55544 7.71975 7.7898  8.1018
##    mapply(memoised_fib_r, 1:30) 1.7982 1.8331 3.34254 2.10280 2.3636 14.7649
##  neval cld
##     10   b
##     10  a

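We have beaten C++ with a very simple optimization. Keep in mind that the cache persists between benchmark runs, so after the first run most calls are plain lookups. But if we write a simple function that doesn't use recursion, we can get even better performance. Let's write a better algorithm using a loop.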

save_fib_r <- function(fnum){
  fnum <- fnum + 1
  vec <- integer(fnum)
  vec[[2]] <- 1
  if(fnum > 2){
    for(i in 3:fnum){
      vec[[i]] <- vec[[ i - 1]] + vec[[ i - 2]]
    }
  }
  
  return(vec[[fnum]])
}

Let's compare the results.

microbenchmark::microbenchmark(
  mapply( recurse_fib_rcpp, 1:30 ),
  mapply( save_fib_r, 1:30 ),
  times = 10
)
## Unit: microseconds
##                            expr    min     lq    mean  median     uq    max
##  mapply(recurse_fib_rcpp, 1:30) 6494.0 6774.3 7111.51 7140.05 7418.8 7616.5
##        mapply(save_fib_r, 1:30)  103.2  107.4  660.84  123.75  141.2 5515.2
##  neval cld
##     10   b
##     10  a

Now we are beating it by around 50x on the median. But I think we can do better. This function builds the whole sequence internally but returns only a single number, so calling it inside mapply repeats the same calculations many times. If instead of using mapply I return the entire vec directly, I will save computation.

save_vec_fib_r <- function(fnum){
  vec <- integer(fnum)
  vec[[2]] <- 1
  if(fnum > 2){
    for(i in 3:fnum){
      vec[[i]] <- vec[[ i - 1]] + vec[[ i - 2]]
    }
  }
  
  return(vec)
}

Now let’s compare the differences

microbenchmark::microbenchmark(
  mapply( save_fib_r, 1:1e3 ),
  save_vec_fib_r(1e3),
  times = 10
)
## Unit: microseconds
##                        expr     min      lq     mean   median      uq     max
##  mapply(save_fib_r, 1:1000) 58294.4 58331.8 60174.62 59327.55 61162.5 65599.0
##        save_vec_fib_r(1000)   122.5   126.4   677.91   132.30   133.4  5612.7
##  neval cld
##     10   b
##     10  a

Now the difference is not only huge, but we are also calculating 1000 Fibonacci numbers instead of just the 30 we were working with previously. But I agree that I didn't give C++ a fair chance. Languages have come and gone, and C++ has stood the test of time. It is one of the fastest languages there is, and R is nowhere close to it. I was just comparing an optimized version with an unoptimized one.

Let’s rewrite this same function in Rcpp just to see how far we are from the fastest programming language.


#include <Rcpp.h>

using namespace Rcpp;

//[[Rcpp::export]]
IntegerVector fib_rcpp(int fnum){

  IntegerVector vec(fnum);
  // Guard the seed values so that fnum < 2 never writes out of bounds.
  if(fnum > 0) vec[0] = 0;
  if(fnum > 1) vec[1] = 1;
  for(int i = 2; i < fnum; ++i) {
    vec[i] = vec[ i - 1] + vec[ i - 2];
  }
  
  return vec;
}

microbenchmark::microbenchmark(
  fib_rcpp(1e5),
  save_vec_fib_r(1e5)
)
## Unit: microseconds
##                   expr     min       lq      mean   median      uq     max
##        fib_rcpp(1e+05)   171.6   209.15   244.169   226.55   264.8  1139.7
##  save_vec_fib_r(1e+05) 10697.3 11870.90 12386.093 12426.90 12746.7 21328.8
##  neval cld
##    100  a 
##    100   b

So optimized R is roughly 50 times slower than the optimized Rcpp code on the median, which is still a very good spot to be in. And to top it off, we are now working on 1e5 numbers, and that within milliseconds in R. (The values themselves overflow long before 1e5 terms in both versions, so this benchmark measures loop speed only.) I wouldn't lose sleep over it.

So always try to optimize your code before going anywhere else. R is the easiest language to write slow code in, but the code can often be made 1000x faster with a few tricks like the ones I just used.

12.4 Read the function

You may assume that a base function is optimized to the core and is therefore the fastest solution out there. That is far from the truth: base R functions are sometimes weighed down by checks for a few basic assumptions. You should get into the habit of reading the code. It's beneficial for debugging as well as for optimization.

Let's start small. R has a built-in function named replace, and it does exactly what its name suggests: it replaces the values at the given indices of a vector. But let's read it.

replace
## function (x, list, values) 
## {
##     x[list] <- values
##     x
## }
## <bytecode: 0x000000001d1dc0f8>
## <environment: namespace:base>

It's no more than a basic function you could write yourself. Let's check another one of my favorite functions.

stopifnot
## function (..., exprs, exprObject, local = TRUE) 
## {
##     n <- ...length()
##     if ((has.e <- !missing(exprs)) || !missing(exprObject)) {
##         if (n || (has.e && !missing(exprObject))) 
##             stop("Only one of 'exprs', 'exprObject' or expressions, not more")
##         envir <- if (isTRUE(local)) 
##             parent.frame()
##         else if (isFALSE(local)) 
##             .GlobalEnv
##         else if (is.environment(local)) 
##             local
##         else stop("'local' must be TRUE, FALSE or an environment")
##         E1 <- if (has.e && is.call(exprs <- substitute(exprs))) 
##             exprs[[1]]
##         cl <- if (is.symbol(E1) && E1 == quote(`{`)) {
##             exprs[[1]] <- quote(stopifnot)
##             exprs
##         }
##         else as.call(c(quote(stopifnot), if (!has.e) exprObject else as.expression(exprs)))
##         names(cl) <- NULL
##         return(eval(cl, envir = envir))
##     }
##     Dparse <- function(call, cutoff = 60L) {
##         ch <- deparse(call, width.cutoff = cutoff)
##         if (length(ch) > 1L) 
##             paste(ch[1L], "....")
##         else ch
##     }
##     head <- function(x, n = 6L) x[seq_len(if (n < 0L) max(length(x) + 
##         n, 0L) else min(n, length(x)))]
##     abbrev <- function(ae, n = 3L) paste(c(head(ae, n), if (length(ae) > 
##         n) "...."), collapse = "\n  ")
##     for (i in seq_len(n)) {
##         r <- ...elt(i)
##         if (!(is.logical(r) && !anyNA(r) && all(r))) {
##             dots <- match.call()[-1L]
##             if (is.null(msg <- names(dots)) || !nzchar(msg <- msg[i])) {
##                 cl.i <- dots[[i]]
##                 msg <- if (is.call(cl.i) && identical(cl.i[[1]], 
##                   quote(all.equal)) && (is.null(ni <- names(cl.i)) || 
##                   length(cl.i) == 3L || length(cl.i <- cl.i[!nzchar(ni)]) == 
##                   3L)) 
##                   sprintf(gettext("%s and %s are not equal:\n  %s"), 
##                     Dparse(cl.i[[2]]), Dparse(cl.i[[3]]), abbrev(r))
##                 else sprintf(ngettext(length(r), "%s is not TRUE", 
##                   "%s are not all TRUE"), Dparse(cl.i))
##             }
##             stop(simpleError(msg, call = if (p <- sys.parent(1L)) 
##                 sys.call(p)))
##         }
##     }
##     invisible()
## }
## <bytecode: 0x00000000155c1860>
## <environment: namespace:base>

Again, it's an ordinary R function, a huge one though. I wouldn't recommend rewriting it, but if you just need to stop on a basic condition, you will write faster code with a simple if and a call to stop.
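As a small sketch of that trade-off, the two hypothetical validators below reject a negative input in both styles; the function names and the error message are made up for illustration.

```r
# Hypothetical validators: both reject a negative input.
check_with_stopifnot <- function(x) {
  stopifnot(x >= 0)
  sqrt(x)
}

check_with_if <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}

check_with_if(16)
## [1] 4

# Rough timing of many calls; the exact numbers depend on your machine.
system.time(for (i in 1:1e5) check_with_stopifnot(16))
system.time(for (i in 1:1e5) check_with_if(16))
```

The if version does less work per call, and it also lets you write your own error message.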

Let's look at another base function.

ifelse
## function (test, yes, no) 
## {
##     if (is.atomic(test)) {
##         if (typeof(test) != "logical") 
##             storage.mode(test) <- "logical"
##         if (length(test) == 1 && is.null(attributes(test))) {
##             if (is.na(test)) 
##                 return(NA)
##             else if (test) {
##                 if (length(yes) == 1) {
##                   yat <- attributes(yes)
##                   if (is.null(yat) || (is.function(yes) && identical(names(yat), 
##                     "srcref"))) 
##                     return(yes)
##                 }
##             }
##             else if (length(no) == 1) {
##                 nat <- attributes(no)
##                 if (is.null(nat) || (is.function(no) && identical(names(nat), 
##                   "srcref"))) 
##                   return(no)
##             }
##         }
##     }
##     else test <- if (isS4(test)) 
##         methods::as(test, "logical")
##     else as.logical(test)
##     ans <- test
##     len <- length(ans)
##     ypos <- which(test)
##     npos <- which(!test)
##     if (length(ypos) > 0L) 
##         ans[ypos] <- rep(yes, length.out = len)[ypos]
##     if (length(npos) > 0L) 
##         ans[npos] <- rep(no, length.out = len)[npos]
##     ans
## }
## <bytecode: 0x0000000012d40950>
## <environment: namespace:base>

You might think ifelse is an optimized function in base R, made fast at the compiler or interpreter level. But if you read the function carefully, you will realize that it spends time checking whether you passed an atomic vector and handling special cases, and you are better off using just the last 5-6 lines of the function for a faster result.
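A stripped-down version along those lines could look like the sketch below. The name simple_ifelse is hypothetical, and the function assumes that test is a plain logical vector without NAs or attributes you care about, so it is not a drop-in replacement for ifelse.

```r
# Hypothetical stripped-down ifelse(): assumes `test` is a plain logical
# vector without NAs (an NA in `test` would silently pick `no` here).
simple_ifelse <- function(test, yes, no) {
  len <- length(test)
  ans <- rep(no, length.out = len)       # start with the "no" values
  ypos <- which(test)                    # positions where test is TRUE
  if (length(ypos) > 0L) {
    ans[ypos] <- rep(yes, length.out = len)[ypos]
  }
  ans
}

x <- c(-2, -1, 0, 1, 2)
simple_ifelse(x > 0, "positive", "not positive")
```

Benchmark it against ifelse on your own data before adopting it; the gain depends on the size and type of the inputs.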

These are just basic examples off the top of my head; there are tons of such cases where you can optimize a function just by reading it and realizing you might not need so much hassle in the first place.

You can avoid the meta-programming or non-standard evaluation in these functions by rewriting some of their parts yourself. Reading a function gives you insight into what it is trying to do, whether it is fast enough for your use case, and whether you can optimize it yourself. This applies to package code as well. And sometimes, though not always, it's good to write a custom solution for yourself.

12.5 Use Conditionals to break computations

Somebody told me that

the key to going BIG is doing as LITTLE as possible.

R understands it and does that by default. Let’s check an example where this is true.

foo <- function(x){
  if( x == 10){
    bar()
  }
  
  print(x)

}

foo(1)
## [1] 1

This function works perfectly fine even though we haven't created any function named bar(). R never evaluated that expression at all. In most compiled languages this is not possible; we would get an error during compilation of the function. But R lets you get away with this. And there are other examples like this:

if( TRUE || stop()) print("TRUE")
## [1] "TRUE"

Here || uses short-circuit evaluation: it doesn't evaluate its right-hand side unless the left-hand side is FALSE. You save time this way, especially with complex conditions.
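The difference between the short-circuiting || and the element-wise | can be seen with a small counter. expensive_check is a hypothetical stand-in for any costly condition.

```r
# Hypothetical expensive check that records every time it is called.
calls <- 0
expensive_check <- function() {
  calls <<- calls + 1
  Sys.sleep(0.01)
  TRUE
}

cheap_flag <- TRUE

# `||` stops at the first TRUE, so expensive_check() never runs here.
if (cheap_flag || expensive_check()) message("short-circuit")
calls_after_short_circuit <- calls

# `|` evaluates both sides, so expensive_check() runs once here.
if (cheap_flag | expensive_check()) message("no short-circuit")
calls_after_elementwise <- calls

c(calls_after_short_circuit, calls_after_elementwise)
## [1] 0 1
```

Putting the cheapest condition first lets || and && skip the expensive ones whenever possible.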

lazy_return <- function(x){
  
  y <- 1:x
  
  return( y )
}

eager_return <- function(x){
  
  y <- integer(x)
  for(i in 1:x){
    y[[i]] <- i
  }

  return(y)
}

microbenchmark::microbenchmark(
  lazy_return(1e5),
  eager_return(1e5)
)
## Unit: nanoseconds
##                 expr     min      lq    mean  median      uq     max neval cld
##   lazy_return(1e+05)     200     400   15428    1300    3350 1316400   100  a 
##  eager_return(1e+05) 3615500 4050150 4497646 4220500 4463650 7052300   100   b

ALTREP-based vectors have a similar effect. R hasn't materialized the values of y in lazy_return, while it computed and stored every one of them in eager_return.

x <- lazy_return(1e3)
y <- eager_return(1e3)
 
.Internal(inspect(x))
## @0x00000000216e2bd8 13 INTSXP g0c0 [REF(65535)]  1 : 1000 (compact)
.Internal(inspect(y))
## @0x0000000021ae88d0 13 INTSXP g0c7 [REF(2)] (len=1000, tl=0) 1,2,3,4,5,...

People normally assume that eager_return is slow because for loops in R are very slow. But when you inspect the internal structure of both x and y, you will see that x is a compact representation of the sequence 1:1000, so it uses almost no memory and its elements haven't even been materialized yet. y, on the other hand, is a full-fledged vector with all the numbers from 1 to 1000 stored in it. It consumes memory and time.

All other calculations on these vectors work exactly the same, and they might not take as much time as you would assume.

microbenchmark::microbenchmark(
  altrep = x + x,
  full_vector = y + y
)
## Unit: microseconds
##         expr min lq  mean median   uq  max neval cld
##       altrep 1.8  2 2.555    2.2 2.80  6.2   100   a
##  full_vector 1.8  2 2.773    2.2 2.95 10.2   100   a

Another such example is not evaluating a promise until it's needed. In very simple terms, R wraps function arguments in promises that are evaluated at a later stage, only when they are needed; otherwise they might not be evaluated at all.

func <- function(x){
  function(){
    eager_return(x)
  }
}
x <- 10
a <- func(x)
x <- 12
a()
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12

Instead of the 10 we used when we created a, it actually produced 12 values, because R didn't evaluate x until it actually had to, when we called the function.
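If you want the value fixed at creation time instead, base R's force() evaluates the promise immediately. The sketch below contrasts the two behaviours; make_counter_lazy and make_counter_forced are hypothetical names.

```r
make_counter_lazy <- function(n) {
  function() seq_len(n)        # n is still an unevaluated promise here
}

make_counter_forced <- function(n) {
  force(n)                     # evaluate the promise right now
  function() seq_len(n)
}

n_val <- 3
lazy   <- make_counter_lazy(n_val)
forced <- make_counter_forced(n_val)
n_val <- 5

lazy()
## [1] 1 2 3 4 5
forced()
## [1] 1 2 3
```

Lazy evaluation saves work when an argument is never used, but force() is the safer choice when a closure must remember the value it was created with.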

So you can use this advice in multiple ways, but try to break your code into multiple chunks and evaluate only what you need at the moment. It helps speed up the code a lot by delaying calculations, and it saves memory too.

12.6 Use Faster packages

R is not among the fast languages. It's a well-known fact. It's meant for writing scripts for data science, statistics, analytics, etc. This is why R relies on other languages for faster computation. You can use C++, C, Java and Rust directly from R. And because R has been around for almost 30 years, somebody somewhere has probably created a package you can use to solve your problem. In this regard, knowing the existing packages that solve the problem at hand is a real advantage. This is something you can learn through experience or through guru Google.

There are multiple such packages that you can use to speed up your workflow.

  1. data.table
  2. xts
  3. Matrix
  4. collapse
  5. Rfast

Off the top of my head I can only think of these few general-purpose libraries, which are very fast and can actually save you a lot of time. But some other libraries have equally fast alternatives too, like tidytext vs quanteda. quanteda uses memory more efficiently through sparse matrices and uses C++ functions under the hood, while tidytext mostly uses tibbles, which explode in size very quickly. There might be other packages with a faster alternative as well. For example, I have been using the qs package a lot for storing and retrieving data from disk; it's actually faster than base R.

12.7 Some pointers

12.7.1 use [[ instead of [ when you can

x <- data.frame(y = 1:1e4)

microbenchmark::microbenchmark(
  "[[" = for(i in 1:1e4){
    x$y[[i]] <- i * 2
  },
  "[" = for(i in 1:1e4){
    x$y[i] <- i * 2
  },
  times = 10
)
## Unit: milliseconds
##  expr      min       lq     mean   median       uq      max neval cld
##    [[ 101.4368 101.6790 116.5352 103.7953 107.1318 224.8473    10   a
##     [ 101.5530 103.9401 106.1991 104.9040 108.3435 114.7021    10   a

In this benchmark the difference is only about a millisecond on the median, which is within run-to-run noise, so don't expect miracles. Still, [[ makes it explicit that you want a single element, and knowing that it can help at times costs you nothing.

The best way to navigate a nested list is to pass a character vector to the [[ function. Take this as an example:

x <- list(
  y = list(
    z = list(
      a = 1
    )
  )
)

x[[c("y", "z", "a")]]
## [1] 1

or if you want to extract just z then

x[[c("y", "z")]]
## $a
## [1] 1

It's pretty helpful when you are working with JSON objects. Take this as an example.

x <- list(
  y = list(
    list(
      z = 1
    ),
    list(
      z = 2
    ),
    list(
      z = 3
    ),
    list(
      z = 4
    ),
    list(
      z = 5
    )
  )
)

lapply(x$y,`[[`, "z")
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5
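When every extracted element is a single number, the same idea works with vapply(), which returns a plain vector and checks the type of each result. This is a small sketch on the same kind of data; the object names records and z_values are made up.

```r
records <- list(
  y = list(list(z = 1), list(z = 2), list(z = 3), list(z = 4), list(z = 5))
)

# vapply() checks that every extracted element is a single number.
z_values <- vapply(records$y, `[[`, numeric(1), "z")
z_values
## [1] 1 2 3 4 5
```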

These tricks will help you get some juice out of your machine.

12.7.2 R calculates everything

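R is a scripting language, which means everything you write is evaluated when the code runs. You can't rely on the kind of compiler optimizations you get in compiled languages: R's byte-code compiler can fold constant sub-expressions, but it cannot do so here because the chain of multiplications starts with x. Take an example.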

year_2_sec <- function(x){
  y <- x * 365 * 24 * 60 * 60
  return(y)
}

year_2_sec_opti <- function(x){
  y <- x * 31536000
  return(y)
}

microbenchmark::microbenchmark(
  norm = year_2_sec(1:2e5),
  optimized = year_2_sec_opti(1:2e5)
)
## Unit: microseconds
##       expr   min     lq    mean median     uq    max neval cld
##       norm 500.0 674.45 850.391 742.70 792.35 8467.1   100   a
##  optimized 292.2 477.40 760.201 525.85 575.85 8424.4   100   a

Even though the difference is very small, there is a difference nonetheless.
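If the bare number 31536000 feels too cryptic, one option is to compute the constant once, outside the function, and keep the readable arithmetic. This is only a sketch; SECONDS_PER_YEAR and year_2_sec_const are hypothetical names.

```r
# Computed once when the script is loaded, not on every call.
SECONDS_PER_YEAR <- 365 * 24 * 60 * 60

year_2_sec_const <- function(x) {
  x * SECONDS_PER_YEAR
}

year_2_sec_const(2)
## [1] 63072000
```

This keeps a single multiplication per call while documenting where the number comes from.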

The same goes for ( parentheses and { braces too. In R these are functions, and they are actually evaluated before a result is returned. Let's reuse the example above, but with many parentheses.

microbenchmark::microbenchmark(
  without_braces = year_2_sec_opti(1:2e5),
  with_braces = year_2_sec_opti({
    (((((((((((((((((((1:2e5)))))))))))))))))))
    })
)
## Unit: microseconds
##            expr   min     lq    mean median    uq    max neval cld
##  without_braces 290.8 466.35 648.726 497.85 550.3 8362.1   100   a
##     with_braces 289.2 438.60 637.460 499.50 541.1 8353.7   100   a

R is not such a slow language anymore, so these differences are too small to notice; in the benchmark above the two versions are indistinguishable. Still, you can make it a habit to write only the code that is needed. This helps.

Fun fact: you can only nest function calls about 50 levels deep; after that R breaks. I have tried it before and counted. Don't ask me why…

12.7.3 .Internal functions

R has some functions that are internal and can be accessed directly. R mostly provides you with wrappers around those. Let's take an example:

integer
## function (length = 0L) 
## .Internal(vector("integer", length))
## <bytecode: 0x0000000012d1a238>
## <environment: namespace:base>

Here, as you can read, R is calling an internal method named vector and passing arguments to it. We can use this function directly as well.

x <- .Internal(vector("integer", 10))
y <- integer(10)

all.equal(x, y)
## [1] TRUE

This can come in handy, especially in cases where base R performs multiple checks. I could give you examples of this, but I recommend not using .Internal functions directly unless you are very sure what you are doing. In most cases you can find a faster function in some other package instead.

12.7.4 Don’t Compile

R has a package called compiler that can compile any R function to byte code. In old blogs you will still see examples of compilation making your code a little faster. Since R 3.4, just-in-time byte-code compilation is on by default, and since R 3.5 packages are byte-compiled at installation. Functions you create are now compiled automatically on their first or second use, so running the compiler on them by hand will not give you any additional speed.
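You can check this on your own installation with the compiler package that ships with R. The sketch below reads the current JIT level and compares a function with an explicitly compiled copy; sum_squares is a hypothetical example function, and the timings will vary by machine.

```r
# A negative argument only reports the current JIT level (0 = off, 3 = default).
jit_level <- compiler::enableJIT(-1)
jit_level

# Hypothetical function used only to compare JIT-compiled code
# against an explicitly compiled copy.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}
sum_squares_cmp <- compiler::cmpfun(sum_squares)

system.time(for (k in 1:20) sum_squares(1e5))
system.time(for (k in 1:20) sum_squares_cmp(1e5))
```

With the JIT enabled, both timings should come out close to each other, since sum_squares is compiled automatically after its first calls.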

However, R's compiler can still be improved, and there are people working on it. Hopefully we will get a better and faster R within a few more years. You can watch a video on it if you want to know more.

12.7.5 use direct method.object structure

R spends a little time figuring out which method to call based on the class of the object. Like here:

methods(generic.function = "as.Date")
## [1] as.Date.character   as.Date.default     as.Date.factor     
## [4] as.Date.IDate*      as.Date.numeric     as.Date.POSIXct    
## [7] as.Date.POSIXlt     as.Date.vctrs_sclr* as.Date.vctrs_vctr*
## see '?methods' for accessing help and source code

As you can see, as.Date decides which method to call based on the type of object you supply. We can speed up our code a bit by calling the method for our object type directly.

microbenchmark::microbenchmark(
  oops = as.Date.numeric(10000, origin = as.Date.character("1970-01-01")),
  norm = as.Date(10000, origin = as.Date("1970-01-01"))
)
## Unit: microseconds
##  expr  min    lq   mean median    uq   max neval cld
##  oops 32.8 34.35 37.720  36.35 37.85  90.9   100  a 
##  norm 36.5 38.50 43.954  40.70 42.55 266.7   100   b

It might not matter much in a single call, but in a loop we can definitely use all these tricks to speed up our code.

bad_code <- function(x){
  
  y <- numeric(x)
  
  for(i in 1:x){
    y[i] <- as.Date(
      x = (10000 + (i)), 
      origin = (as.Date(x = ("1970-01-01")))
    )
  }
  
  return(y)
}

faster_code <- function(x){

  y <- numeric(x)
  
  for(i in 1:x){
    y[[i]] <- as.Date.numeric(
      x = 10000 + i, 
      origin = as.Date.character(x = "1970-01-01")
    )
  }
  
  return(y)
  
}
microbenchmark::microbenchmark(
  bad_code(1e4),
  faster_code(1e4),
  times = 10
)
## Unit: milliseconds
##                expr      min       lq     mean   median       uq      max neval
##     bad_code(10000) 445.1615 449.5043 470.4116 457.1198 464.9821 590.3621    10
##  faster_code(10000) 398.3089 401.3773 403.2963 402.2267 405.5097 411.1452    10
##  cld
##    b
##   a

As you can see, we can squeeze a few more drops out of our CPU with these small techniques. It might not be much, but it's not much to remember either. And if you use them carefully, you can shave seconds off an entire app by saving milliseconds on each operation.
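In keeping with the earlier point that the algorithm matters more than micro-tweaks, a bigger win for this particular loop is probably to build the origin once and let as.Date() work on the whole vector. The sketch below is one way to do that; vectorised_code is a hypothetical name, and it returns the same numeric day counts as the loops above.

```r
vectorised_code <- function(x) {
  # Build the origin once instead of once per iteration.
  origin <- as.Date("1970-01-01")
  # as.Date() accepts a whole numeric vector, so no loop is needed.
  as.numeric(as.Date(10000 + seq_len(x), origin = origin))
}

vectorised_code(3)
## [1] 10001 10002 10003
```

Benchmark it against faster_code on your own machine to see how much the loop itself was costing.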

12.8 Export Other languages

There is a limit to how fast R can go. R is not the fastest language in the world; it might well be one of the slowest. But we use it because of the ecosystem it provides. The ability to download packages from CRAN, and the assurance that goes along with it, is exceptional. No other programming language comes even close. Almost anything you want to do with your data can be done from R directly: statistics, analysis, visualization, big data, ML and DL; there are packages to deal with all of it. We mostly use R because of its ecosystem, not because of its speed.

But when you have tried most of these techniques and it still doesn't work out, you can use Java, C++, C, Fortran, Python or Julia and import those functions directly into R. R has interfaces for calling all of the languages mentioned. The simplest one among them is Rcpp, which makes C++ feel like R code, with vectorized operations included, so rewriting an R function in C++ becomes very easy.

C++ is a huge language, and it takes years of experience to write C++ well. Basic Rcpp, however, can be learned in a short time and used effectively. My advice would be to learn basic C++ from any YouTube video in 2 to 3 hours and then read this book.

https://teuder.github.io/rcpp4everyone_en/

12.9 Conclusion

In this chapter we discussed how to speed up R code with basic tips that you can easily remember while writing code.

  1. R improves with every release; use the latest R
  2. Profile your code to find the slow parts
  3. Benchmark your solutions to check their speed
  4. The algorithm matters more than the language
  5. Always read the function to see what it is doing and whether you need all of it
  6. Use if statements to do as little computation as possible
  7. There is often a faster function available in a package; search for it
  8. Use [[ when you can
  9. Don't write extra calls like ( or { that you don't need
  10. .Internal functions can be used, but avoid them; you can find faster versions in other packages
  11. R compiles its code by default; the compiler package will not help you anymore
  12. If you know the exact implementation, use the method.object syntax
  13. Write Rcpp if necessary