Best Coding Practices for R
Chapter 6 OOPS
6.1 What is OOPS
6.2 When to use it