Chapter 9 Debugging
Debugging is part of being a programmer. You just can’t escape it. It’s a huge topic, and understanding the limitations of a language and the errors it throws takes years of study and experience. But there are certain things you can do to make your life easier while debugging.
These are the tips I think everyone should follow for a better debugging experience.
9.1 Write Unit Tests
When you are writing a function and you have enough time to document its expected results, it’s good practice to write down your expectations as code. In R there is an excellent package called testthat for doing this. You don’t need to create a package to test your code. You can simply test your functions by sourcing the entire file.
The sky is the limit when you want to test your code. You have to decide how much time you have and how much you are willing to write for testing. Set a minimum number of tests for each sprint. That keeps you on track.
Let’s start with a very simple example.
library(testthat)
## Warning: package 'testthat' was built under R version 4.1.3
##
## Attaching package: 'testthat'
## The following object is masked from 'package:dplyr':
##
## matches
## The following objects are masked from 'package:magrittr':
##
## equals, is_less_than, not
foo <- function(name){
  if(length(name) == 0){
    stop("please provide a character")
  }
  if(! is.na(name)){
    if( !is.character(name) ){
      stop("please provide a character")
    }
    x <- sprintf("Hello %s, good morning", name)
    return(x)
  }else{
    return("please provide a name")
  }
}
testthat::test_that(
  desc = "testing foo",
  code = {
    ## check original result
    expect_equal(
      object = foo("vikram"),
      expected = "Hello vikram, good morning"
    )
    ## check results of NA
    expect_equal(
      object = foo(NA),
      expected = "please provide a name"
    )
    ## check length of actual content
    expect_length(
      object = foo("vikram"),
      n = 1
    )
    ## check Error on passing numeric code
    expect_error(
      object = foo(1)
    )
    ## check Error on passing Date code
    expect_error(
      object = foo(Sys.Date())
    )
    ## check the error on NULL values
    expect_error(
      object = foo(NULL)
    )
  }
)
## Test passed
This is how you write tests for a simple hello world example. To tell you the truth, I started with just a basic equality test, but then I asked myself:
- what if someone passes a number or a date?
- what if someone passes a NULL value?
- what if someone passes an NA?
And so on. I kept modifying the function, and you will too. This will make your function more robust and you will see fewer crashes. This is why I begin this chapter by emphasizing that you should test your code.
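As a rough sketch of testing without building a package, the snippet below writes a small function to a temporary file, sources that file, and then runs expectations against it. The function add_one and the temporary file are hypothetical and exist only for this illustration. In a real project the file would already be sitting in your project folder.

```r
library(testthat)

# Write a hypothetical function file to a temporary location
fun_file <- tempfile(fileext = ".R")
writeLines(
  text = c(
    "add_one <- function(x){",
    "  if( !is.numeric(x) ){",
    "    stop(\"please provide a number\")",
    "  }",
    "  return(x + 1)",
    "}"
  ),
  con = fun_file
)

# Source the file so that its functions are available in the session
source(file = fun_file)

# Test the sourced function exactly as you would test a package function
testthat::test_that(
  desc = "testing add_one",
  code = {
    expect_equal(object = add_one(1), expected = 2)
    expect_error(object = add_one("a"))
  }
)
```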
9.2 browser() and print() are your friends
To me the browser() function feels like a scene from the movie The Matrix, where you can stop everything around you and work out what is going wrong in the world. When you want to debug a function, or a point in a script where you suspect the error lies, put browser() inside that function or script. Don’t be scared of using browser(). It makes you familiar with your code and with R itself.
browser() is useful only when you know which function to look at. print() is your friend when you want to narrow down the candidates that are causing the error. Printing objects to the console will help you understand what is happening inside a given function at that moment. It’s very useful for interactive web apps, where reproducing the error is a little tricky.
A combination of print() and browser() can save a lot of your time. And yes, make sure you delete the browser() calls from your files afterwards. In RStudio you can use ctrl + shift + f to search every file in the project for anything, including browser() calls.
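A minimal sketch of the two tools working together is shown below. The function average_score and its debug argument are hypothetical, made up only for this illustration. The browser() call is guarded with interactive() so that the sketch also runs in a non-interactive script. In day-to-day debugging you would usually just drop a bare browser() on the suspicious line.

```r
# Hypothetical function, used only to illustrate the workflow
average_score <- function(scores, debug = FALSE){
  # print() shows intermediate values and narrows down where things go wrong
  print(class(scores))
  cleaned <- scores[!is.na(scores)]
  print(length(cleaned))

  # browser() pauses execution here so you can inspect scores and cleaned;
  # the guard makes it trigger only in an interactive session, on request
  if( debug && interactive() ){
    browser()
  }
  return(mean(cleaned))
}

result <- average_score(scores = c(10, NA, 20))
```

Once the browser prompt opens, typing n runs the next line, c continues to the end of the function and Q quits the browser.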
9.3 Read the functions
R lets you read the source of a function, which comes in very handy when debugging a function you have not written. You can view any function definition by running its name without the () round brackets, like this:
quanteda::tokens
## function (x, what = "word", remove_punct = FALSE, remove_symbols = FALSE,
## remove_numbers = FALSE, remove_url = FALSE, remove_separators = TRUE,
## split_hyphens = FALSE, split_tags = FALSE, include_docvars = TRUE,
## padding = FALSE, verbose = quanteda_options("verbose"), ...)
## {
## tokens_env$START_TIME <- proc.time()
## object_class <- class(x)[1]
## if (verbose)
## catm("Creating a tokens object from a", object_class,
## "input...\n")
## UseMethod("tokens")
## }
## <bytecode: 0x0000000026a22638>
## <environment: namespace:quanteda>
You can also check which methods are available for a class by using
methods(class = "dfm")
## [1] - ! $ $<- %%
## [6] %*% %/% & * /
## [11] [ [[ [<- ^ +
## [16] all any anyNA Arith as.data.frame
## [21] as.dfm as.logical as.matrix as.numeric bootstrap_dfm
## [26] cbind cbind2 coerce coerce<- colMeans
## [31] colSums Compare convert dfm dfm_compress
## [36] dfm_group dfm_lookup dfm_match dfm_replace dfm_sample
## [41] dfm_select dfm_smooth dfm_sort dfm_subset dfm_tfidf
## [46] dfm_tolower dfm_toupper dfm_trim dfm_weight dfm_wordstem
## [51] dim dim<- dimnames dimnames<- docfreq
## [56] docid docnames docnames<- docvars docvars<-
## [61] fcm featfreq featnames head initialize
## [66] is.finite is.infinite is.na kronecker length
## [71] log Logic Math Math2 meta
## [76] meta<- ndoc nfeat ntoken ntype
## [81] Ops print rbind rbind2 rep
## [86] rowMeans rownames<- rowSums show sparsity
## [91] Summary t tail topfeatures
## see '?methods' for accessing help and source code
Or you can check which classes have a method for a given generic by using
methods(generic.function = "print")[1:20]
## [1] "print,ANY-method" "print,dfm-method"
## [3] "print,diagonalMatrix-method" "print,dictionary2-method"
## [5] "print,fcm-method" "print,sparseMatrix-method"
## [7] "print.acf" "print.AES"
## [9] "print.all_vars" "print.anova"
## [11] "print.any_vars" "print.aov"
## [13] "print.aovlist" "print.ar"
## [15] "print.Arima" "print.arima0"
## [17] "print.AsIs" "print.aspell"
## [19] "print.aspell_inspect_context" "print.bibentry"
Most of these methods are hidden from general usage, so you might not be able to view them directly.
# textstat_lexdiv.dfm
# will not work and will produce an error
# Error: object 'textstat_lexdiv.dfm' not found
quanteda.textstats::textstat_lexdiv
## function (x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "K",
## "I", "D", "Vm", "Maas", "MATTR", "MSTTR", "all"), remove_numbers = TRUE,
## remove_punct = TRUE, remove_symbols = TRUE, remove_hyphens = FALSE,
## log.base = 10, MATTR_window = 100L, MSTTR_segment = 100L,
## ...)
## {
## measure <- match.arg(measure, c("TTR", "C", "R", "CTTR",
## "U", "S", "K", "I", "D", "Vm", "Maas", "MATTR", "MSTTR",
## "all"), several.ok = TRUE)
## UseMethod("textstat_lexdiv")
## }
## <bytecode: 0x0000000022c0b500>
## <environment: namespace:quanteda.textstats>
# works, but the implementation is still hidden because the method is chosen based on the class of the object passed in at the moment of the call
But if you still want to see the definition of a method for a particular class, use this code.
getAnywhere("textstat_lexdiv.dfm")
## A single object matching 'textstat_lexdiv.dfm' was found
## It was found in the following places
## namespace:quanteda.textstats
## with value
##
## function (x, measure = c("TTR", "C", "R", "CTTR", "U", "S", "K",
## "I", "D", "Vm", "Maas", "all"), remove_numbers = TRUE, remove_punct = TRUE,
## remove_symbols = TRUE, remove_hyphens = FALSE, log.base = 10,
## ...)
## {
## tokens_only_measures <- c("MATTR", "MSTTR")
## x <- as.dfm(x)
## if (!sum(x))
## stop(message_error("dfm_empty"))
## if (remove_hyphens)
## x <- dfm_split_hyphenated_features(x)
## removals <- removals_regex(separators = FALSE, punct = remove_punct,
## symbols = remove_symbols, numbers = remove_numbers, url = TRUE)
## if (length(removals)) {
## x <- dfm_remove(x, paste(unlist(removals), collapse = "|"),
## valuetype = "regex")
## }
## if (!sum(x))
## stop(message_error("dfm_empty after removal of numbers, symbols, punctuations, hyphens"))
## if (any(tokens_only_measures %in% measure))
## stop("average-based measures are only available for tokens inputs")
## available_measures <- as.character(formals()$measure)[-1]
## measure <- match.arg(measure, choices = available_measures,
## several.ok = !missing(measure))
## if ("all" %in% measure)
## measure <- available_measures[!available_measures %in%
## "all"]
## compute_lexdiv_dfm_stats(x, measure = measure, log.base = log.base)
## }
## <bytecode: 0x000000002a1a1cf8>
## <environment: namespace:quanteda.textstats>
If you want to understand more of this, learn object-oriented programming (OOP) in R. R has multiple object-oriented systems and is a highly object-oriented language, but its style is different from other languages. This book is about best practices in R, so we are not going to go deep into the fundamentals of R programming here, but this trick is worth knowing.
These tricks will help you read code that is loaded in your environment but that you have not written. Reading someone else’s code makes you a better coder, and it helps you understand why the code is breaking.
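To see why printing a generic shows only a UseMethod() call, it can help to build a tiny S3 generic yourself. The sketch below uses a hypothetical generic called describe with two methods. Unlike the quanteda methods above, these live in the global environment rather than hidden inside a package namespace, so this is only an illustration of the dispatch mechanism. getS3method() from the utils package fetches the method for one specific class.

```r
# Hypothetical generic with two methods, used only for illustration
describe <- function(x, ...){
  UseMethod("describe")
}
describe.numeric <- function(x, ...){
  sprintf("a numeric vector of length %d", length(x))
}
describe.default <- function(x, ...){
  sprintf("an object of class %s", class(x)[1])
}

# Printing the generic shows only the UseMethod() call
print(describe)

# Fetch and print the implementation that is used for numeric input
num_method <- getS3method(f = "describe", class = "numeric")
print(num_method)

# The method is picked from the class of the object at call time
res_num <- describe(c(1.5, 2, 3))
res_chr <- describe("a")
```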
9.4 Version Control System
Use a version control system. For those who don’t know, it means you can commit changes to a repository and compare the changes at any time. Git is the most widely used system, and GitHub and Bitbucket are among the most popular services for hosting Git repositories.
GitHub offers a free account to every individual. Even for personal projects I would recommend using GitHub or some other version control service. For bigger projects use the one your organization recommends. This will help you compare the changes you commit and go back to an old version that is up and running.
It sounds simple, but the power to compare what you changed in the code can help you pinpoint the error as quickly as possible.
9.5 Make small commits
You should always make small commits. I have seen people who keep code to themselves for days and change a thousand things before pushing it to GitHub. I too am one of those people.
Make a small change to your code, check that it works, and then commit it. The smaller the commits, the better your debugging experience. It’s easier to roll back the changes and it’s easier to read the code to understand what might have caused an error.
9.6 Use curly brackets
R gives you the ability to write code without {} but it makes your code harder to read and makes it harder to tell the blocks apart. I have seen people write code like this
# if statements
if( TRUE ) print(TRUE) else print(FALSE)
## [1] TRUE
# loops
for(i in 1:10) print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
# functions
function(x) print(x)
## function(x) print(x)
It sure makes your code look concise, but only when it’s as small as what I wrote. Even then I would advise you to use curly brackets in all possible scenarios. This helps especially when the code gets bigger or when you are using several of these statements together. Let’s take this code for example.
function(x)
for(i in 1:10)
if( i %% 2 == 0 ) print(TRUE) else print(FALSE)
## function(x)
## for(i in 1:10)
## if( i %% 2 == 0 ) print(TRUE) else print(FALSE)
In big apps you are never sure how many lines you will need to write inside a function, a loop or a conditional, and you have to update your code frequently. Without curly brackets it gets harder and harder to pinpoint the block that is causing the error.
So in short, using curly brackets helps you understand the logic a little better and makes it easier to pinpoint the block that’s causing the error.
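For comparison, here is one way the nested example above could look with curly brackets everywhere. The function name print_even_flags and its argument n are hypothetical. The original function was anonymous and looped over 1:10 directly.

```r
# Hypothetical named version of the nested example, with every block in braces
print_even_flags <- function(n){
  for(i in seq_len(n)){
    if( i %% 2 == 0 ){
      print(TRUE)
    }else{
      print(FALSE)
    }
  }
}

# Capture what the function prints so that it can be inspected
printed <- capture.output(print_even_flags(n = 4))
```

Each block now has a visible start and end, so adding another line to the loop or to either branch does not change which statements belong to which block.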
9.7 Always use named parameters
Let’s compare the two pieces of code in the chunk below.
# Code with named parameters
#
# call_cognitive_endpoint(
# endpoint = speech$get_endpoint(),
# operation = "models/base",
# body = list(),
# options = list(),
# headers = list(`content-type` = 'audio/wav'),
# http_verb = "POST"
# )
# -----------------
# Code without named parameters
#
# call_cognitive_endpoint(
# speech$get_endpoint(),
# "models/base",
# list(),
# list(),
# list(`content-type` = 'audio/wav'),
# "POST"
# )
They are commented out because we are only focusing on the structure of the function call, not on what the function does. Which of these looks more readable to you?
In the second call, can you be sure that you have provided the arguments in the right order and that you are using the function exactly as it is meant to be used? This is why named parameters save time during debugging compared to unnamed ones. They also make code transferable, which means that any new person in the team can quickly pick up where you left off. You may be so familiar with a function that you assume naming the parameters is not required at all, but someone else might not be as familiar with it.
In big organizations, where people come and go and anybody can be reassigned to the same code, it helps to make the code easy to read. And it will help you in the long run, when you read your own code after, say, 1 to 2 years.
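A small runnable sketch of the problem is shown below. The function subtract is hypothetical and exists only to show how R matches arguments. Swapping two positional arguments gives a different answer without any error, while named arguments give the same answer in any order.

```r
# Hypothetical function, used only to illustrate argument matching
subtract <- function(minuend, subtrahend){
  return(minuend - subtrahend)
}

# Positional arguments: the result depends entirely on the order
positional_ok <- subtract(10, 3)
positional_swapped <- subtract(3, 10)

# Named arguments: the order no longer matters
named_any_order <- subtract(subtrahend = 3, minuend = 10)
```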
9.8 Log the errors
This is old advice that I gave in the functions chapter, where I asked you to use tryCatch in all your functions so that they don’t break in production. To extend that, I would also argue for logging, especially logging errors, to a JSON file or a database table.
When you run your code on your own computer you might run into only a limited number of bugs, and you are prepared to handle those in production. But suppose you create a shiny app that is used by 1000 more people. In those circumstances you will encounter bugs which you might not be able to reproduce so easily, and nobody will tell you what bugs your app still has. Logging errors is standard practice in programming and it’s necessary for production-grade apps, be it a shiny app or a REST API.
There are many packages available in R for logging, and I don’t have a preference for any of them. I find it better to log to a database than to a JSON file.
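As a minimal sketch of the idea, assuming nothing beyond base R, the snippet below appends one tab-separated line per error to a plain text file. It deliberately uses neither JSON nor a database, only to keep the example free of extra packages. The helper names log_error and safe_log and the log file location are hypothetical.

```r
# Hypothetical helper: append one timestamped line per error to a log file
log_error <- function(err, context, log_file){
  line <- sprintf(
    "%s\t%s\t%s",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    context,
    conditionMessage(err)
  )
  cat(line, "\n", file = log_file, sep = "", append = TRUE)
}

# Hypothetical function that logs the failure and returns NA instead of crashing
safe_log <- function(x, log_file){
  tryCatch(
    expr = log(x),
    error = function(err){
      log_error(err = err, context = "safe_log", log_file = log_file)
      return(NA_real_)
    }
  )
}

log_file <- tempfile(fileext = ".log")
result <- safe_log(x = "ten", log_file = log_file)
logged <- readLines(con = log_file)
```

The same tryCatch handler could just as well hand the error to a logging package or insert a row into a database table. The point is that the error message and its context are stored somewhere you can read them later.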
9.9 Don’t Use already used names
R allows you to override variable and function names that already exist. But this is something you shouldn’t do. Not even once. It’s understandable when you don’t know that a name collides with something, but when you do know, you should avoid those names at all costs.
Take for example T and F. They are just variables which have the values TRUE and FALSE stored in them. They are not a replacement for boolean values.
T <- FALSE
myvar <- TRUE
if( myvar == T ) print(TRUE) else print(FALSE)
## [1] FALSE
Here all your logic is gone because someone thought of naming a variable T. The most common error of this kind occurs when naming objects. Remember that these are all existing functions that ship with R.
- dt
- df
- data
These are just a few examples where you can mess up your code very easily without realizing that you are doing something wrong. Many other programming languages throw an error if you reuse an already defined name and won’t allow it. You should treat R the same way, even though it will not throw an error at the point of reuse and you might be lucky enough never to see one. Get into the habit of not reusing function and object names in R as well.
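The sketch below contrasts T with TRUE and shows one possible way to check whether a name is already taken before you use it. The variable names are hypothetical. The check with exists() assumes a default R session in which the stats package is attached.

```r
T <- FALSE
myvar <- TRUE

# Comparing against T now uses the overwritten value
shortcut_check <- (myvar == T)

# TRUE is a reserved word, so this comparison cannot be broken
literal_check <- (myvar == TRUE)

# Remove the masking variable so that T means TRUE again
rm(T)

# R refuses to assign to the reserved word TRUE
overwrite_attempt <- tryCatch(
  expr = eval(parse(text = "TRUE <- FALSE")),
  error = function(err) "error"
)

# Check whether a candidate object name already exists as a function
name_taken <- exists("df", mode = "function")
```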
9.10 Use Simple code
R gives you a lot of flexibility in coding style. You can write very succinct and precise code in R using highly complex methods. But try to spread your code over a decent number of lines so that you can read it later on. Let me give you a very basic example.
x <- y <- z <- 1:10
## or
x <- 3; y <- 5; z <- 8
## or
foo <- function(x){
  y <<- x
}
## <<- is permissible only in very very very rare scenarios
This is doable in R, but that doesn’t mean you should do it. This code could easily have been split into 6 lines, which would increase its readability. People from maths, finance, science and similar backgrounds especially love to write complicated equations, and they carry the same attitude into their coding style. However, coding is more about maintaining code than about writing it, and you can’t assume that the next person will have the same abilities that you do.
Writing simple and beautiful code is the best advice I can give in this entire chapter. It makes life easier for you and for the people working with you.
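One possible simpler version of the example above is sketched here. Each assignment gets its own line, and the function returns its value for the caller to assign instead of reaching outside itself with <<-. The rewritten body of foo is only an illustration of that idea.

```r
# One assignment per line instead of a chained assignment
x <- 1:10
y <- 1:10
z <- 1:10

# One statement per line instead of separating statements with ;
x <- 3
y <- 5
z <- 8

# Return the value and let the caller decide where to store it
foo <- function(x){
  return(x)
}
y <- foo(x = x)
```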
9.11 Conclusion
In this chapter we discussed multiple strategies for dealing with and avoiding debugging complexities. I hope that if you follow most of these tricks you will feel that you are a better debugger than you were before, and that they will save you a lot of time in the process. To recap what we have learned:
- Write tests as much as possible
- Use print to pinpoint where your code fails
- Use browser to check the code
- Always delete browser calls afterwards
- Read the function
- You can even read hidden methods in R
- Use a version control system
- Make small commits
- Use curly brackets in all your code
- Pass all arguments to a function by name and not by position
- Log the errors
- Avoid already used function or object names
- Write simple code
- Avoid using T, F, <<- and ;