Chapter 9 Debugging

Debugging is a part of being a programmer. You just can’t escape it. It’s a huge topic, and it takes years of experience to master. Understanding the limitations of a language and the errors it throws takes a lot of study and practice. But there are certain things you can do to make your life easier while debugging.

These are the tips I think everyone should follow for a better debugging experience.

9.1 Write Unit Tests

When you are writing a function and you have enough time to document its expected results, it’s good practice to write down your expectations as code. In R there is an excellent package called testthat for doing this. You don’t need to create a package to test your code. You can simply source the file that defines your functions and then run your tests against them.

The sky is the limit when you want to test your code. You have to decide how much time you have and how many tests you are willing to write. Set a minimum number of tests for each sprint. That keeps you on track.

Let’s start with a very simple example.

library(testthat)
## Warning: package 'testthat' was built under R version 4.1.3
## 
## Attaching package: 'testthat'
## The following object is masked from 'package:dplyr':
## 
##     matches
## The following objects are masked from 'package:magrittr':
## 
##     equals, is_less_than, not
foo <- function(name){
  if(length(name) == 0){
    stop("please provide a character")
  }
  if(! is.na(name)){
    if( !is.character(name) ){
      stop("please provide a character")
    }
    x <- sprintf("Hello %s, good morning", name)
    return(x)
  } else{
    return("please provide a name")
  }
}

testthat::test_that(
  desc = "testing foo",
  code = {
    ## check original result
    expect_equal(
      object = foo("vikram"),
      expected = "Hello vikram, good morning"
    )
    ## check results of NA
    expect_equal(
      object = foo(NA),
      expected = "please provide a name"
    )
    ## check length of actual content
    expect_length(
      object = foo("vikram"),
      n = 1
    )
    ## check Error on passing numeric code
    expect_error(
      object = foo(1)
    )
    ## check Error on passing Date code
    expect_error(
      object = foo(Sys.Date())
    )
    ## check the error on NULL values
    expect_error(
      object = foo(NULL)
    )
  }
)
## Test passed

This is how you write tests for a simple hello world example. To tell you the truth, I started with just a basic equality test, but then I asked myself:

  • what if someone passes a number or a date

  • what if someone passes a NULL value

  • what if someone passes an NA

And so on. I kept modifying the function as each case came up, and this is how you will too. This makes your function more robust, and you will see fewer crashes. That is why I begin this chapter by emphasizing that you should test your code.

9.2 browser() and print() are your friends

To me the browser() function feels like a scene from the movie The Matrix, where you can stop everything around you and work out what is going wrong in the world. When you want to debug a function, or you suspect that the error lies at a particular point, put browser() inside that function or script. Execution pauses there and you can inspect every object at that moment. Don’t be scared of using browser(); it makes you familiar with your code and with R itself.

browser() is useful only when you know which function to look at. print() is your friend when you want to narrow down the candidates that are causing the error. Printing objects to the console will help you understand what is happening inside a given function at that moment. It’s very useful for interactive web apps, where reproducing the error is a little tricky.

A combination of print() and browser() can save you a lot of time. And yes, make sure you delete the browser() calls from your files when you are done. In RStudio you can use Ctrl + Shift + F to search every file in the project for anything, including leftover browser() calls.
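As a minimal sketch of combining the two, the function below prints what it receives and pauses with browser() only in the suspicious case. The function average_score() and its data are hypothetical, made up for this illustration. Wrapping browser() in interactive() is one possible safeguard, an assumption on my part rather than a rule, so that a forgotten call does not pause a script or a deployed app.

```r
# Hypothetical function, used only to illustrate print() and browser()
average_score <- function(scores) {
  # print() narrows down where things go wrong
  print(paste("average_score() received", length(scores), "values"))

  cleaned <- scores[!is.na(scores)]
  print(paste("values left after dropping NA:", length(cleaned)))

  # Pause only in an interactive session and only in the suspicious case.
  # Delete this block once the bug is found.
  if (interactive() && length(cleaned) == 0) {
    browser()
  }

  mean(cleaned)
}

result <- average_score(c(10, NA, 20))
```

Once the print() output tells you which function misbehaves, browser() lets you step through that function line by line.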

9.3 Read the functions

R lets you read the source of a function, which comes in very handy when you are debugging a function you did not write. You can view any function definition by running its name without the round brackets (), like this:

quanteda::tokens
## function (x, what = "word", remove_punct = FALSE, remove_symbols = FALSE, 
##     remove_numbers = FALSE, remove_url = FALSE, remove_separators = TRUE, 
##     split_hyphens = FALSE, split_tags = FALSE, include_docvars = TRUE, 
##     padding = FALSE, verbose = quanteda_options("verbose"), ...) 
## {
##     tokens_env$START_TIME <- proc.time()
##     object_class <- class(x)[1]
##     if (verbose) 
##         catm("Creating a tokens object from a", object_class, 
##             "input...\n")
##     UseMethod("tokens")
## }
## <bytecode: 0x0000000026a22638>
## <environment: namespace:quanteda>

And you can check which methods are available for which classes by using

methods(class = "dfm")
##  [1] -             !             $             $<-           %%           
##  [6] %*%           %/%           &             *             /            
## [11] [             [[            [<-           ^             +            
## [16] all           any           anyNA         Arith         as.data.frame
## [21] as.dfm        as.logical    as.matrix     as.numeric    bootstrap_dfm
## [26] cbind         cbind2        coerce        coerce<-      colMeans     
## [31] colSums       Compare       convert       dfm           dfm_compress 
## [36] dfm_group     dfm_lookup    dfm_match     dfm_replace   dfm_sample   
## [41] dfm_select    dfm_smooth    dfm_sort      dfm_subset    dfm_tfidf    
## [46] dfm_tolower   dfm_toupper   dfm_trim      dfm_weight    dfm_wordstem 
## [51] dim           dim<-         dimnames      dimnames<-    docfreq      
## [56] docid         docnames      docnames<-    docvars       docvars<-    
## [61] fcm           featfreq      featnames     head          initialize   
## [66] is.finite     is.infinite   is.na         kronecker     length       
## [71] log           Logic         Math          Math2         meta         
## [76] meta<-        ndoc          nfeat         ntoken        ntype        
## [81] Ops           print         rbind         rbind2        rep          
## [86] rowMeans      rownames<-    rowSums       show          sparsity     
## [91] Summary       t             tail          topfeatures  
## see '?methods' for accessing help and source code

Or you can check which classes have a method for a given generic with:

methods(generic.function = "print")[1:20]
##  [1] "print,ANY-method"             "print,dfm-method"            
##  [3] "print,diagonalMatrix-method"  "print,dictionary2-method"    
##  [5] "print,fcm-method"             "print,sparseMatrix-method"   
##  [7] "print.acf"                    "print.AES"                   
##  [9] "print.all_vars"               "print.anova"                 
## [11] "print.any_vars"               "print.aov"                   
## [13] "print.aovlist"                "print.ar"                    
## [15] "print.Arima"                  "print.arima0"                
## [17] "print.AsIs"                   "print.aspell"                
## [19] "print.aspell_inspect_context" "print.bibentry"

Most of these methods are not exported, so you cannot view them by typing their name.

# textstat_lexdiv.dfm 
# will not work will produce an error
# Error: object 'textstat_lexdiv.dfm' not found

quanteda.textstats::textstat_lexdiv
## function (x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "K", 
##     "I", "D", "Vm", "Maas", "MATTR", "MSTTR", "all"), remove_numbers = TRUE, 
##     remove_punct = TRUE, remove_symbols = TRUE, remove_hyphens = FALSE, 
##     log.base = 10, MATTR_window = 100L, MSTTR_segment = 100L, 
##     ...) 
## {
##     measure <- match.arg(measure, c("TTR", "C", "R", "CTTR", 
##         "U", "S", "K", "I", "D", "Vm", "Maas", "MATTR", "MSTTR", 
##         "all"), several.ok = TRUE)
##     UseMethod("textstat_lexdiv")
## }
## <bytecode: 0x0000000022c0b500>
## <environment: namespace:quanteda.textstats>
# works, but the implementation is still hidden because the method is chosen from the class of the object at the moment of the call

If you still want to see the definition of a method for a particular class, use this code.

getAnywhere("textstat_lexdiv.dfm")
## A single object matching 'textstat_lexdiv.dfm' was found
## It was found in the following places
##   namespace:quanteda.textstats
## with value
## 
## function (x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "K", 
##     "I", "D", "Vm", "Maas", "all"), remove_numbers = TRUE, remove_punct = TRUE, 
##     remove_symbols = TRUE, remove_hyphens = FALSE, log.base = 10, 
##     ...) 
## {
##     tokens_only_measures <- c("MATTR", "MSTTR")
##     x <- as.dfm(x)
##     if (!sum(x)) 
##         stop(message_error("dfm_empty"))
##     if (remove_hyphens) 
##         x <- dfm_split_hyphenated_features(x)
##     removals <- removals_regex(separators = FALSE, punct = remove_punct, 
##         symbols = remove_symbols, numbers = remove_numbers, url = TRUE)
##     if (length(removals)) {
##         x <- dfm_remove(x, paste(unlist(removals), collapse = "|"), 
##             valuetype = "regex")
##     }
##     if (!sum(x)) 
##         stop(message_error("dfm_empty after removal of numbers, symbols, punctuations, hyphens"))
##     if (any(tokens_only_measures %in% measure)) 
##         stop("average-based measures are only available for tokens inputs")
##     available_measures <- as.character(formals()$measure)[-1]
##     measure <- match.arg(measure, choices = available_measures, 
##         several.ok = !missing(measure))
##     if ("all" %in% measure) 
##         measure <- available_measures[!available_measures %in% 
##             "all"]
##     compute_lexdiv_dfm_stats(x, measure = measure, log.base = log.base)
## }
## <bytecode: 0x000000002a1a1cf8>
## <environment: namespace:quanteda.textstats>

If you want to understand more of this, learn about object-oriented programming (OOP) in R. R has multiple object-oriented systems and is a highly object-oriented language, but its style is different from other languages. This book is all about best practices in R, so we are not going to go deep into the fundamentals of R programming here, but this trick is worth knowing.

These tricks will help you read code that is loaded in your environment but that you did not write. Reading someone else’s code makes you a better coder, and it helps you understand why the code is breaking.
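If you don’t have quanteda installed, the same idea can be tried with a small sketch in plain R. The generic describe() and its methods are hypothetical, defined only for this illustration. getS3method() from the utils package is another way to fetch the method for a given generic and class.

```r
# Hypothetical generic: it only dispatches, just like the generics printed above
describe <- function(x, ...) {
  UseMethod("describe")
}

# Hypothetical methods: the real work lives here
describe.default <- function(x, ...) {
  "an object"
}

describe.data.frame <- function(x, ...) {
  sprintf("a data frame with %d rows", nrow(x))
}

# The method is picked from the class of the argument at call time
out_df  <- describe(data.frame(a = 1:3))
out_num <- describe(42)

# Fetch one specific method by generic name and class name
method_df <- getS3method("describe", "data.frame")
```

Printing describe shows only the UseMethod() call, while printing method_df shows the code that actually runs for data frames.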

9.4 Version Control System

Use a version control system. For those who don’t know, it means you can commit changes to a repository and compare them at any time. Git is the most widely used version control system, and GitHub and Bitbucket are popular services for hosting Git repositories.

GitHub offers a free account to every individual. Even for personal projects I would recommend using GitHub or some other version control service. For bigger projects, use the one your organization recommends. This will help you compare the changes you commit and go back to an older version that was up and running.

It sounds simple, but the power to compare what you changed in the code can help you pinpoint the error as quickly as possible.

9.5 Make small commits

You should always make small commits. I have seen people who keep the code to themselves for days and change a thousand things before pushing it to GitHub. I too am one of those people.

Make a small change to your code, check that it works, and then commit it. The smaller the commits, the better your debugging experience. It’s easier to roll back the changes, and it’s easier to read the code to understand what might have caused an error.

9.6 Use curly brackets

R lets you write code without {}, but that makes your code harder to read and makes it harder to see where each block starts and ends. I have seen people write code like this:

# if statements
if( TRUE ) print(TRUE) else print(FALSE)
## [1] TRUE
# loops
for(i in 1:10) print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# functions
function(x) print(x)
## function(x) print(x)

It sure makes your code look concise, but only when it’s as small as what I wrote. Even then I would advise you to use curly brackets in all possible scenarios. They help especially when the code gets bigger or when you combine several of these statements. Take this code, for example.

function(x)
  for(i in 1:10)
    if( i %% 2 == 0 ) print(TRUE) else print(FALSE)
## function(x)
##   for(i in 1:10)
##     if( i %% 2 == 0 ) print(TRUE) else print(FALSE)

In big apps you are never sure how many lines you will need inside a function, a loop or a conditional, and you have to update your code frequently. Without curly brackets it gets harder and harder to pinpoint the block that is causing the error.

So in short, using curly brackets helps you understand the logic a little better and makes it easier to pinpoint the block that’s causing the error.
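For comparison, here is one way the nested example could look with curly brackets everywhere. The name print_even_flags and the argument n are hypothetical. The original function was anonymous and did not use its argument.

```r
# Hypothetical name: prints TRUE for even numbers and FALSE for odd ones
print_even_flags <- function(n = 10) {
  for (i in seq_len(n)) {
    if (i %% 2 == 0) {
      print(TRUE)
    } else {
      print(FALSE)
    }
  }
}

print_even_flags(4)
```

Each block now has a visible start and end, so adding a line to the loop or to either branch cannot silently change which statements belong to it.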

9.7 Always use named parameters

Let’s compare the two pieces of code in the chunk below.

# Code with named parameters
#
# call_cognitive_endpoint(
#   endpoint = speech$get_endpoint(),
#   operation = "models/base",
#   body = list(),
#   options = list(),
#   headers = list(`content-type` = 'audio/wav'),
#   http_verb = "POST"
# )

# -----------------

# Code without named parameters
#
# call_cognitive_endpoint(  
#   speech$get_endpoint(),
#   "models/base",
#   list(),
#   list(),
#   list(`content-type` = 'audio/wav'),
#   "POST"
# )

They are commented out because we are focusing only on the structure of the call, not on how the function works. Which of these looks more readable to you?

In the second call, can you be sure that you have provided the arguments in the right order and that you are using the function exactly as it is meant to be used? This is why named parameters save time during debugging compared to unnamed ones. They also make code transferable, which means that any new person on the team can quickly pick up where you left off. Maybe you are so familiar with a function that you assume naming the parameters is not required at all, but someone else might not be as familiar with it.

In big organizations, where people come and go and anybody can be reassigned to the same code, it helps to make the code easy to read. And it will help you in the long run, when you read your own code after, say, one or two years.
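Readability is not the only benefit. A small sketch shows how a positional call can go wrong without any error. The function compound() and its arguments are hypothetical, made up for this illustration.

```r
# Hypothetical function: value of an investment after some years
compound <- function(principal, rate, years) {
  principal * (1 + rate)^years
}

# Positional call with rate and years swapped by mistake:
# R raises no error, the result is simply wrong
wrong <- compound(1000, 2, 0.05)

# Named call: the order no longer matters and the intent is visible
right <- compound(principal = 1000, years = 2, rate = 0.05)
```

Both calls run, so nothing warns you about the first one. With names, a swapped order is harmless and a misspelled argument name fails immediately.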

9.8 Log the errors

This is old advice from the functions chapter, where I asked you to use tryCatch in all your functions so that they don’t break in production. To extend that, I would also argue for logging, especially of errors, to a JSON file or a database table.

When you run your code on your own computer you might get only 1000 bugs, and you are prepared to handle those bugs in production. But suppose you create a Shiny app that is used by 1000 more people. In those circumstances you will encounter bugs that you might not be able to reproduce easily, and nobody will tell you which bugs your app still has. Logging errors is standard practice in programming, and it’s necessary for production-grade apps, be it a Shiny app or a REST API.

There are many packages available in R for logging, and I don’t have a preference for any of them. A database is a better place for logs than a JSON file.
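To make the idea concrete without committing to a logging package, here is a minimal sketch in base R. A plain text file in a temporary location stands in for the JSON file or database table, and the helpers log_error() and safe_divide() are hypothetical names used only for this illustration.

```r
# Hypothetical log location; a real app would use a fixed path or a database table
log_file <- tempfile(fileext = ".log")

# Hypothetical helper: append one timestamped line per error
log_error <- function(err, context) {
  line <- sprintf(
    "%s | %s | %s",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    context,
    conditionMessage(err)
  )
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
}

# Hypothetical function: tryCatch keeps it from crashing, the log keeps the evidence
safe_divide <- function(x, y) {
  tryCatch(
    {
      if (y == 0) {
        stop("division by zero")
      }
      x / y
    },
    error = function(err) {
      log_error(err, context = "safe_divide")
      NA_real_
    }
  )
}

ok <- safe_divide(10, 2)
failed <- safe_divide(1, 0)
logged <- readLines(log_file)
```

The caller gets NA instead of a crash, and the log line records when and where the error happened, which is what you need when a user hits a bug you cannot reproduce.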

9.9 Don’t Use already used names

R allows you to overwrite variable and function names that already exist. But this is something you shouldn’t do. Not even once. I understand it happening when you don’t know that a name collides with something, but once you do know, you should avoid those names at all costs.

Take, for example, T and F. They are just variables that have the values TRUE and FALSE stored in them. They are not a replacement for the boolean values.

T <- FALSE
myvar <- TRUE
if( myvar == T ) print(TRUE) else print(FALSE)
## [1] FALSE

Here all your logic is gone because someone decided to name a variable T. The most common error of this kind happens when naming data objects. Remember that these are all valid functions that ship with R.

  1. dt
  2. df
  3. data

These are just a few examples where you can mess up your code very easily without realizing that you are doing something wrong. Some other programming languages throw an error if you try to reuse an already defined name. You should treat R the same way, even though it will not throw an error and you might be lucky enough never to hit a problem. Get into the habit of not reusing function and object names in R.
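One way the collision shows up in practice: if you believe you created a data frame called df but never did, R silently finds the function stats::df() instead, and the error you get says nothing about a missing object. The sketch below assumes a fresh session where df has not been assigned. The name sales_df and its data are hypothetical.

```r
# df was never assigned here, so R finds stats::df(), the F density function
found_function <- is.function(df)

# Treating it like a data frame fails with a message about the function,
# not about a missing object
msg <- tryCatch(
  df$price,
  error = function(e) conditionMessage(e)
)

# A distinct, descriptive name avoids the collision (hypothetical data)
sales_df <- data.frame(price = c(10, 20))
total <- sum(sales_df$price)
```

With a name like sales_df, forgetting to create the object gives a plain "object not found" error, which points straight at the real problem.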

9.10 Use Simple code

R gives you a lot of flexibility in coding style. You can write very succinct code in R, even for highly complex methods. But try to spread your code over a decent number of lines so that you can read it later on. Let me give you a very basic example.

x <- y <- z <- 1:10

## or

x <- 3; y <- 5; z <- 8

## or

foo <- function(x){
  y <<- x
}
## <<- is permissible only in very very very rare scenarios

This is doable in R, but that doesn’t mean you should do it. The first two snippets could easily have been split into 6 lines, which would increase the readability of your code. People with a background in maths, finance, science and so on love to write complicated equations, and they carry the same attitude into their coding style. However, coding is more about maintaining code than about writing it, and you can’t expect the next person to have the same abilities that you do.

Writing simple and beautiful code is the best advice I can give in this entire chapter. It makes life easier for you and for the people working with you.
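As a sketch of what the simpler version of the earlier snippets might look like, each assignment gets its own line and the function returns its value instead of writing to the parent scope with <<-. The names a, b, c_val and keep_value are hypothetical, chosen only so that the two rewrites do not overwrite each other.

```r
# One assignment per line instead of x <- y <- z <- 1:10
x <- 1:10
y <- 1:10
z <- 1:10

# One statement per line instead of separating statements with ;
a <- 3
b <- 5
c_val <- 8

# Return the value and let the caller decide where to store it,
# instead of assigning with <<- inside the function
keep_value <- function(value) {
  value
}
stored <- keep_value(3)
```

It takes more lines, but every assignment is visible at the place where it happens.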

9.11 Conclusion

In this chapter we discussed multiple strategies for dealing with and avoiding debugging complexities. I hope that if you follow most of these tricks you will become a better debugger than you were before and save a lot of time in the process. To recap what we have learned:

  1. Write as many tests as possible
  2. Use print() to narrow down where your code fails
  3. Use browser() to inspect the code
  4. Always delete browser() calls when you are done
  5. Read the functions you call
  6. You can even read hidden methods in R
  7. Use a version control system
  8. Make small commits
  9. Use curly brackets in all your code
  10. Pass all arguments to a function by name, not by position
  11. Log the errors
  12. Avoid already used function or object names
  13. Write simple code
  14. Avoid using T, F, <<- and ;