Chapter 15 Multithreading

Multithreading is the last resort when you have already optimized all the nitty-gritty details available to you. Most of the time you are okay using single threaded calculation. Only at times when your calculation exceeds a certain time limit; say more than 60 seconds or 6 seconds depending on the situation, then you can use it for your advantage for normal calculations it’s just not worth the complexity. Because there are cases where multithreading will slow your computation because of the overhead it brings along with it.

15.1 Multi Threading has an overhead

There is an overhead of managing multiple processes which doesn’t allow it to achieve the theoritical maximum that we mostly read. With each subsequent thread you throw at the problem the marginal speed reduces and it could be possible that you make it more slower. Let’s check it with the code of data.table package which is written in optimized C and thus we can be rest assured of the results. Let take the flights dataset.

nyc <- data.table(
  flights
  )

# replicate(
#   n = 10,
#   expr = nyc,
#   ) |>
#   rbindlist()

Let’s take a fairly okay computation and see how much speed we gain with the single thread.

setDTthreads(1L)

microbenchmark::microbenchmark(
  single = nyc[,.(count = .N, total_dist =  sum(distance)),.(year,month, day)]
)
## Unit: milliseconds
##    expr      min       lq     mean median      uq     max neval
##  single 9.797101 10.51715 11.99587 11.064 11.4807 25.6034   100
setDTthreads(2L)

microbenchmark::microbenchmark(
  double = nyc[,.(count = .N, total_dist =  sum(distance)),.(year,month, day)]
)
## Unit: milliseconds
##    expr      min       lq     mean   median       uq     max neval
##  double 7.263401 7.733701 9.215801 8.299901 8.796251 19.6011   100
setDTthreads(4L)

microbenchmark::microbenchmark(
  double = nyc[,.(count = .N, total_dist =  sum(distance)),.(year,month, day)]
)
## Unit: milliseconds
##    expr      min       lq     mean   median      uq     max neval
##  double 6.776101 7.322401 8.675783 7.705201 7.97685 20.0717   100

15.2 Be Cautious with Database

15.3 Use Future Package

15.4 Send Only Bigger calculation