2.4 The pool Package


The pool package opens and closes connections automatically. R users establish a connection to a pool. From then on pool gives the query an idle connection or opens a new connection. Besides simplifying client code, pooled connections can also provide a performance boost. To illustrate, I wrote a sample workload consisting of three queries. Next, I wrote a function to execute the queries using dbConnect/dbDisconnect, and a function to execute the queries using a pool.

library(pool)
library(microbenchmark)
library(ggplot2)
queries <- c("SELECT Getdate()","Select Getdate()","Select Getdate()","Select Getdate()")

dbconnectworkload <- function() {
 for(i in 1:length(queries)){
   con <- dbConnect(odbc::odbc(),Driver= myDriver,Server = myServer
                   ,Database = myDatabase,Trusted_Connection='yes')
  dbGetQuery(con,queries[i])
  dbDisconnect(con)
 }
}  

poolcon <- dbPool(odbc::odbc(),Driver= myDriver,Server = myServer
                  ,Database = myDatabase,Trusted_Connection='yes')

dbpoolworkload <- function() {
 for(i in 1:length(queries)){dbGetQuery(poolcon,queries[i])}
}

rs <- microbenchmark(dbconnectworkload() 
                    ,dbpoolworkload(),times = 100)

rs<-as.data.frame(rs)
rs$time <- rs$time/1000000
names(rs) <- c("Connection_Type","Milliseconds")
ggplot(rs, aes(x=Connection_Type, y=Milliseconds , fill=Connection_Type)) +
 geom_boxplot(alpha=0.4) +
    theme(text = element_text(size=16))

The pooled function connection appears to be faster. Profvis shows that most of the time spent in dbconnectworkload is on opening connections.

library(profvis)

profvis({dbconnectworkload ()})