Chapter 15 Multithreading
Multithreading is a last resort, to be used only after you have already optimized all the nitty-gritty details available to you. Most of the time a single-threaded calculation is perfectly fine. Only when a calculation exceeds some time budget, say 60 seconds, or even 6 seconds depending on the situation, is it worth reaching for multiple threads; for routine calculations the added complexity is simply not worth it. In some cases multithreading will actually slow your computation down because of the overhead it brings along with it.
15.1 Multithreading has an overhead
Managing multiple threads carries an overhead, which is why you rarely achieve the theoretical maximum speedup we mostly read about. Each subsequent thread you throw at a problem yields a smaller marginal gain, and past a point it can even make the computation slower. Let's verify this with the data.table package, whose internals are written in optimized C, so we can be reasonably confident in the results. Let's take the flights dataset.
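One way to see why the marginal gain shrinks is Amdahl's law: if only a fraction p of the work can be parallelized, the rest stays serial no matter how many threads you add. The sketch below is illustrative only; the helper function and the assumed parallel fraction p = 0.8 are mine, not taken from the benchmarks that follow.

```r
# Amdahl's law: if a fraction p of the work is parallelizable, the best
# possible speedup on n threads is 1 / ((1 - p) + p / n).
# Hypothetical helper for illustration; p = 0.8 is an assumed value.
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

amdahl_speedup(0.8, 2)    # ~1.67x, not 2x
amdahl_speedup(0.8, 4)    # ~2.5x, not 4x
amdahl_speedup(0.8, Inf)  # capped at 5x, no matter how many threads
```

Even with 80% of the work parallelizable, the serial 20% caps the speedup at 5x, and each extra thread buys less than the one before it.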
nyc <- data.table(flights)
# replicate(
#   n = 10,
#   expr = nyc,
# ) |>
#   rbindlist()
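Before pinning the thread count, it is worth checking how many threads data.table is currently set to use. The getDTthreads() function is part of data.table's API; the exact number printed will depend on your machine.

```r
library(data.table)

# Number of threads data.table will use for its parallelized operations
getDTthreads()

# verbose = TRUE additionally prints details such as the OpenMP maximum
getDTthreads(verbose = TRUE)
```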
Let’s take a reasonably representative computation and see how fast it runs on a single thread.
setDTthreads(1L)
microbenchmark::microbenchmark(
  single = nyc[, .(count = .N, total_dist = sum(distance)), .(year, month, day)]
)
## Unit: milliseconds
## expr min lq mean median uq max neval
## single 9.797101 10.51715 11.99587 11.064 11.4807 25.6034 100
setDTthreads(2L)
microbenchmark::microbenchmark(
  double = nyc[, .(count = .N, total_dist = sum(distance)), .(year, month, day)]
)
## Unit: milliseconds
## expr min lq mean median uq max neval
## double 7.263401 7.733701 9.215801 8.299901 8.796251 19.6011 100
setDTthreads(4L)
microbenchmark::microbenchmark(
  double = nyc[, .(count = .N, total_dist = sum(distance)), .(year, month, day)]
)
## Unit: milliseconds
## expr min lq mean median uq max neval
## double 6.776101 7.322401 8.675783 7.705201 7.97685 20.0717 100
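From the median timings above we can compute the observed speedups. The numbers below are copied from the three benchmark runs, so treat the exact ratios as machine-specific.

```r
# Median timings in milliseconds, copied from the three runs above
medians <- c(single = 11.064000, double = 8.299901, quad = 7.705201)

# Speedup relative to the single-threaded run
round(medians[["single"]] / medians, 2)
#> single double   quad
#>   1.00   1.33   1.44
```

Two threads buy roughly a 33% speedup and four threads roughly 44%, nowhere near the 2x and 4x a naive model would predict. That gap is exactly the thread-management overhead this section is about.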