Best Coding Practices for R
I Introduction
CoverPage
1
Introduction
II Structure
2
Folder Structure
2.1
Organizing files
2.2
Create Projects
2.3
Naming files
2.4
Folders Based on File-Type
2.5
Creating Sub-folders
2.6
Conclusion
3
Code Structure
3.1
Create Sections
3.2
Order of Code
3.3
Indentation
3.4
Give your code some breathing room
3.5
Conclusion
4
Functions
4.1
Metadata or Information header
4.2
Pass everything through parameters
4.3
Use Return Statement
4.4
Keep the Return Type consistent
4.5
Use Sensible Names for parameters too…
4.6
use tryCatch
4.7
Write simple and unique functions
4.8
Don’t load libraries or source code inside a function
4.9
Use Package::Function() approach
4.9.1
You should load libraries in the order of their usage
4.10
Conclusion
5
Naming Conventions
5.1
Popular naming conventions
5.1.1
camelCase
5.1.2
PascalCase
5.1.3
snake_case
5.2
Informative Names
5.3
Conclusions
6
Environment Management
6.1
Avoid package dependencies when possible
6.2
renv for package management
6.3
config for external dependencies
6.4
Conclusion
7
Data Management
7.1
Keep a Copy of your Data
7.2
Don’t use numbers for columns
7.3
Keep Meaningful and proper column names
7.4
Use Databases
7.5
Use Efficient Packages
7.5.1
data.table
7.5.2
Matrix
7.5.3
disk.frame
7.5.4
modeldb
7.5.5
dbplot
7.5.6
sparklyr
7.6
Conclusion
8
Debugging
8.1
Write Unit Tests
8.2
browser() and print() are your friends
8.3
Read the functions
8.4
Version Control System
8.5
Make small commits
8.6
Use curly brackets
8.7
Always use named parameters
8.8
Log the errors
8.9
Don’t Use already used names
8.10
Use Simple code
8.11
Conclusion
III Memory
9
Type System
9.1
Things you should know
9.1.1
R doesn’t have scalar data types
9.1.2
Dates are basically integers under the hood.
9.1.3
POSIXlt are basically lists under the hood
9.1.4
Integers are smaller than numeric
9.1.5
define your datatypes before the variable
9.1.6
lists are better than data frames inside a loop
9.1.7
use lists whenever possible
9.2
Choose data types carefully
9.3
don’t change datatypes
9.4
Future of type-system in R
9.5
Conclusion
10
Pass By Value-Reference
10.1
Understanding the system
10.1.1
Pass by Value
10.1.2
Pass by reference
10.2
Copy on modify
10.3
for pass by reference
10.4
Conclusion
11
Release Memory
11.1
use rm()
11.2
use gc()
11.2.1
R version 3.5
11.2.2
R version 4.0
11.2.3
Inside a heavy loop
11.2.4
anything that takes more than 30 seconds
11.3
Cache / Store calculations
11.4
Conclusion
IV Speed
12
Some Tips to make R code faster
12.1
Use Latest version of R
12.2
Benchmark and profile the code
12.3
Algorithm matters more than language
12.4
Read the function
12.5
Use Conditionals to break computations
12.6
Use Faster packages
12.7
Some pointers
12.7.1
use [[ instead of [ when you can
12.7.2
R calculates everything
12.7.3
.Internal functions
12.7.4
Don’t Compile
12.7.5
use direct method.object structure
12.8
Export Other languages
12.9
Conclusion
13
For Loops
13.1
initialize objects before loops
13.2
use simple data-types
13.3
apply family
13.3.1
apply functions are not much faster than loops
13.3.2
Nested lapply has the same speed as a normal lapply
13.4
Vectorize your code
13.4.1
never repeat a calculation
13.4.2
Vectorized code can do 2 or 3 more steps in less time
13.5
Understanding non-vectorized code
13.6
Do as little as possible inside a loop
13.6.1
Combine Vectorized code inside a loop
13.7
Conclusion
14
Multithreading
V Shiny Tips
15
Speed
16
Memory
Chapter 16
Memory