4 Loops and Functions
For loops and functions form the core of the skills you will need to solve many of your programming challenges in R.
Loop — A loop is what it sounds like: when programming a ‘loop’ you are telling the computer to iterate through a set of steps outlined within the loop.
- Loops can contain functions, calls to data, and other ‘nested’ loops, and they can create data.
- In R a ‘for loop’ begins with the syntax for(){}, such that the user specifies the iteration in the () and the steps of the loop in the {}.
- Loops are useful for solving all types of programming and statistical problems, such as iterative ifelse statements, decomposing data, and serving helper functions over wide datasets.
- Loops in R are resource intensive and slow compared to other programming languages (like C++), and this computational speed should be considered when writing complicated for loops.
Function — A function can be thought of as a self-contained ‘program’ of sorts. A function takes an input and performs the operations defined in the function on the input, returning the output to the user.
- A function can contain other functions, loops, and complex logic.
- To call a function, the syntax in R is always the same — functionName().
- Base R and R’s various libraries can be thought of as generally consisting of ‘functions’ — for example, the caret package contains the train() function used to perform various machine learning modeling tasks.
- When considering if you need to write your own function, answer the question: ‘Will I need to use this code, in the same way, many times?’ If the answer is yes, then you should write a function; if the answer is no, you could still write your own function, but it may not be necessary to do so.
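To make the ‘many times’ test concrete, here is a minimal sketch (the function name and numbers are illustrative, not from the text): a calculation that would otherwise be repeated inline is wrapped once and reused.

```r
# roundMean is an illustrative name; wrap a repeated calculation once
roundMean <- function(x, digits = 2) round(mean(x), digits)

roundMean(c(1.234, 5.678))  # reused anywhere the rounded mean is needed
roundMean(c(10, 20, 31))
```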
4.1 Loops
Basic loops: — The following syntax shows the basic structure of a for loop in R. In this loop we iterate 5 times, with our iteration index i serving as the input for our treatment of the created variable x. We then print the output of the loop at each iteration.
for(i in 1:5){
  x <- i^2
  print(x)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
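Because R loops are comparatively slow (as noted above), the same result is often better expressed with vectorized operations. A minimal sketch of the vectorized equivalent of the loop above:

```r
# ^ is applied element-wise to the whole vector, so no explicit loop is needed
x <- (1:5)^2
print(x)
```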
Loops inside loops:
— These are nested for loops, such that for each iteration of the outer loop, the inner loop completes a full cycle of its specified iteration count.
# set p
p <- c()

for(i in 1:20){
  x <- log(i)
  for(j in 1:10){
    y <- (x * j)
    p <- rbind(p, y)
  }
  ifelse(i > 15, print(i), NA)
}
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
plot(p)
Value loops: — A ‘value’ loop is a for loop that iterates through a series of ‘string’ values instead of a sequence of numbers.
# Create the list of names you want to loop through
namestoloop <- names(df[3:25])
coefs <- c()

for(value in namestoloop){
  form <- as.formula(paste(value, "~ x1", sep = " "))
  fit <- lm(form, data=df)
  beta <- fit$coefficients
  coefs <- rbind(coefs, beta)
}
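The block above depends on a data frame df and a predictor x1 that are not defined in this excerpt. A self-contained version of the same pattern, using the built-in mtcars data with wt as the predictor (these names are my substitutions, for illustration):

```r
# Value loop over column names: fit one regression per outcome variable
namestoloop <- c("mpg", "disp", "hp")
coefs <- c()

for(value in namestoloop){
  form <- as.formula(paste(value, "~ wt", sep = " "))
  fit  <- lm(form, data = mtcars)
  coefs <- rbind(coefs, fit$coefficients)
}
rownames(coefs) <- namestoloop
coefs  # one row of (Intercept, wt) estimates per outcome
```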
4.2 Functions
Single-line function:
— A single-line function is a syntax in R where you can write a function on a single line without the bracket syntax of a multi-line function. Here we create the Sigmoid function, \(S = \frac{1}{1+e^{-x}}\), in a single line.
singleLine.func <- function(x) 1/(1 + exp(-x))
singleLine.func(5)
## [1] 0.9933071
Multi-Line function: — A multi-line function follows the syntax function(inputs){ operations; return(output) } and is how you will generally write most functions. In this silly example, we take three inputs, create a Gaussian \(\epsilon\) term, perform an algebraic function on the variable y, and return y for the set of all inputs.
set.seed(1234)
multiLine.func <- function(x, beta, lambda){
  epsilon <- rnorm(1)
  y <- x*beta + lambda + epsilon
  return(y)
}

x <- runif(50)*10
beta <- rbeta(50, 5, 2)
lambda <- rnorm(50)

out.y <- multiLine.func(x, beta, lambda)
plot(out.y)
We can also use a function to transform another function — such as using our single- and multi-line functions together:

plot(
  singleLine.func(
    multiLine.func(x, beta, lambda)
  ), ylab = "sigmoid_y")
4.2.1 Helper Functions
Another way to use functions in R is to create ‘helper’ type functions. These functions perform a simple, computationally inexpensive task that you will typically use while doing more complex tasks. Examples of helper functions are the normalization and standardization functions below, which perform a mathematical operation on data. Other helper functions, such as Euclidean Distance or Great Circle, perform a specific calculation from data. The difference between these types of helper functions is what we want from the input — e.g. do we want to transform the input, or do we want to know something from the input? A third type of function, such as a Binary Gap function, tells us something about our data. In any case, understanding what you want from your data before writing your function is essential.
Normalization function: \(N = \frac{X - X_{min}}{X_{max} - X_{min}}\)
normalizeMe <- function(x){
  n <- (x - min(x))/(max(x) - min(x))
  return(n)
}

norm.y <- normalizeMe(out.y)
plot(norm.y)
Standardization Function: \(s = \frac{x-\mu}{\sigma}\)
standardizeMe <- function(x){
  mu <- mean(x)
  sigma <- sd(x)
  s <- (x - mu)/sigma
  return(s)
}

stand.y <- standardizeMe(out.y)
plot(stand.y)
4.2.2 Functional Programming
Functional programming is a mapping of values to other values and is declarative, meaning you tell the computer the logic you want it to perform. Functional programming separates the logic from the data, unlike object-oriented programming, where an object is an abstraction of logic and data. R functions are fundamentally a functional programming style.
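A small sketch of the contrast: the imperative loop spells out how to build the result step by step, while the functional version declares the mapping and treats functions as values (the example names are mine):

```r
# Imperative: mutate an accumulator step by step
squares.loop <- c()
for(i in 1:5){ squares.loop <- c(squares.loop, i^2) }

# Declarative/functional: state the mapping from values to values
squares.fun <- sapply(1:5, function(i) i^2)

# Functions are values too, so logic composes separately from the data
compose <- function(f, g) function(x) f(g(x))
log.of.square <- compose(log, function(x) x^2)
log.of.square(10)  # identical to log(10^2)
```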
4.3 Data Manipulation
Unlist a column of type list in a data frame:
- df = a data frame
- idVar = identification variable
- listVar = a column in the data frame with a list of data nested at each row
fill.df <- data.frame()
out <- c()
nscan <- nrow(df)

for(i in 1:nscan){
  id <- df$idVar[i]

  V1 <- df$listVar[[i]][1] # the first element of the list in the i'th row
  V2 <- df$listVar[[i]][2] # the second element of the list in the i'th row
  V3 <- df$listVar[[i]][3]
  ...
  Vk <- df$listVar[[i]][k] # the k'th element of the list in the i'th row

  out <- cbind(id, V1, V2, V3, ..., Vk)
  fill.df <- rbind(fill.df, out)
}
Of course it is easier to just write…
listVar <- df$listVar
listMerge <- as.data.frame(t(sapply(listVar, `length<-`, max(lengths(listVar)))))
listMerge$id <- df$id
dfSanListVar <- merge(df[, setdiff(names(df), "listVar")], listMerge, by="id")

… But you get the idea. Loops can be used to do any sort of iterative task.
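A runnable miniature of that one-liner, with a toy data frame standing in for df (the data and column names are invented for illustration):

```r
# Toy data frame whose listVar column nests vectors of unequal length
df <- data.frame(id = 1:3)
df$listVar <- list(c(1, 2), c(3, 4, 5), 6)

listVar <- df$listVar
# pad every vector with NA out to the longest length, then bind as rows
listMerge <- as.data.frame(t(sapply(listVar, `length<-`, max(lengths(listVar)))))
listMerge$id <- df$id
dfSansListVar <- merge(df[, "id", drop = FALSE], listMerge, by = "id")
dfSansListVar  # columns V1..V3, shorter rows padded with NA
```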
4.4 Example Code
Binary Gap Function: — Find the largest gap between 1’s in the binary string of a number.
N <- 275610687

binarygap <- function(Number){
  x <- as.integer(intToBits(Number))
  s <- c()
  m <- 0

  for(i in 1:31){
    if(x[i] + x[i+1] == 2){
      m <- 0
    } else if(x[i] + x[i+1] == 1){
      m <- 1
    } else if(x[i] + x[i+1] == 0){
      m <- m + 1
    }
    s <- rbind(s, m)
  }
  return(max(s))
}
print(as.integer(intToBits(N)))
## [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 0 0 0 0 0 1 0 0 0
binarygap(N)
## [1] 5
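An alternative sketch (my own, not from the text) uses rle() to run-length encode the bits and take the longest zero run that sits strictly between runs of ones; unlike the loop version it explicitly ignores the trailing zeros of the 32-bit representation:

```r
binarygap.rle <- function(Number){
  r <- rle(as.integer(intToBits(Number)))      # runs of 0s and 1s
  idx <- which(r$values == 0)                  # positions of the zero runs
  idx <- idx[idx > 1 & idx < length(r$values)] # keep only runs bounded by ones
  if(length(idx) == 0) return(0)
  max(r$lengths[idx])
}
binarygap.rle(275610687)  # agrees with binarygap(N) above: 5
```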
Gradient Descent Function: — Perform gradient descent optimization to find the global minimum of a function.
# Make Gradient Descent Function
grad.des <- function(f, fprime, start, delta, alpha, precision, max.iters, verbose=FALSE){

  # Create Empty Sets
  x_list <- c()
  f_list <- c()
  fp <- c()
  n <- 0
  x_0 <- 0
  x_1 <- start

  # While loop containing dual descent trajectories based on different alpha terms
  # Trajectory 1 set by user for larger steps until algorithm approaches epsilon
  # Trajectory 2 set by algorithm as alpha = 1/n until precision level reached
  while(abs(fprime(x_0, delta)) >= precision & n <= max.iters-1){
    if(abs(fprime(x_0, delta)) >= 0.1){
      n <- n + 1
      x_0 <- x_1
      x_1 <- x_0 - alpha*fprime(x_0, delta)
    }else{
      n <- n + 1
      x_0 <- x_1
      x_1 <- x_0 - ((1/n)*fprime(x_0, delta))
    }
    x_list[[n]] <- x_1
    f_list[[n]] <- f(x_1)
    fp[[n]] <- fprime(x_0, delta)
  }

  gd.out <- list(N=n, x_star=x_list[[n]], f_star=f_list[[n]], fp_star=fp[[n]],
                 x=x_list, f=f_list, fp=fp)
  return(gd.out)
}
# Random search of start points within values
# User defines the size and scope of starting set
# All options from the gradient descent algorithm are maintained
rand.s <- function(lower.b, upper.b, size.b,
                   f, fprime, delta, alpha,
                   precision, max.iters, verbose=TRUE){

  # Random numbers generated inside user defined space
  d.search <- runif(size.b, min=lower.b, max=upper.b)

  fs.out <- data.frame()
  fs_list <- c()
  val_list <- c()

  # Loop over gradient descent function for values in random search
  # Output is the minimum value for f(x) for each iteration and
  # the start value for that iteration
  for(value in d.search){
    gd.out <- grad.des(f=f, fprime=fprime, start=value, delta=delta,
                       alpha=alpha, precision=precision, max.iters=max.iters)
    fs_list <- c(fs_list, gd.out$f_star)
    val_list <- c(val_list, value)
  }
  fs.out <- data.frame(startvalue=val_list, f_min=fs_list)
  return(fs.out)
}

# Random search function call
fs.out <- rand.s(lower.b=-2, upper.b=4, size.b=10000, f=f1, fprime=f.prime,
                 delta=0.001, alpha=0.06, precision=0.05, max.iters=100, verbose=FALSE)
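The call above references f1 and f.prime, which are not defined in this excerpt. One plausible pairing — an assumption on my part, not the author's definitions — is a simple test objective plus a central-difference numerical derivative:

```r
# Assumed definitions for illustration only
f1 <- function(x) x^2 - 4*x + 5            # convex objective, minimum at x = 2
f.prime <- function(x, delta){             # central-difference derivative
  (f1(x + delta) - f1(x - delta)) / (2*delta)
}

# With these in place, a single descent run would look like:
# grad.des(f=f1, fprime=f.prime, start=3, delta=0.001,
#          alpha=0.06, precision=0.05, max.iters=100)
```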
Loop Wrapper: — Fit a Bayesian GLM prediction model (bayesglm from the arm package) over 86 outcome variables.
# requires the arm package for bayesglm(), se.coef(), and sigma.hat()
wrapper_out <- data.frame(id = data_train$id)
namestoiterate <- names(data_train[,4:89])

for(i in 1:86){
  z1 <- paste(namestoiterate[i], "Var1", sep="~")
  mod_t <- as.formula(z1)
  fit_t <- bayesglm(mod_t,
                    family = quasibinomial(link="logit"),
                    data=data_train,
                    weights = data_train$weight)
  x_t <- se.coef(fit_t)
  y_t <- sigma.hat(fit_t)
  notpred_t <- predict(fit_t, newdata=data_surv, type="response")
  prior_t <- data.frame( prior = notpred_t )

  z2 <- paste(namestoiterate[i], "Var1 + Var2 + Var3 + Var4 + Var5 + Var6", sep="~")
  p_t_mod <- as.formula(z2)
  p_t_fit <- bayesglm(p_t_mod, family = quasibinomial(link="logit"),
                      data = data_train,
                      prior.mean = mean(prior_t$prior, na.rm = TRUE),
                      prior.scale = y_t,
                      maxit = 1000,
                      weights = data_train$weight)
  p_t <- data.frame(id = data_test$id,
                    pred = predict(p_t_fit, newdata = data_test, type="response"))
  names(p_t)[2] <- namestoiterate[i]
  wrapper_out <- merge(wrapper_out, p_t, by="id")
}
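The wrapper depends on the arm package and on data frames not shown here. A self-contained miniature of the same pattern — base lm() over mtcars, with three outcomes standing in for the 86 (all names here are my substitutions):

```r
data_train <- mtcars
data_train$id <- rownames(mtcars)

wrapper_out <- data.frame(id = data_train$id)
namestoiterate <- c("mpg", "hp", "qsec")  # stand-ins for the 86 outcomes

for(i in seq_along(namestoiterate)){
  mod_t <- as.formula(paste(namestoiterate[i], "wt + disp", sep = "~"))
  fit_t <- lm(mod_t, data = data_train)
  p_t <- data.frame(id = data_train$id,
                    pred = predict(fit_t, newdata = data_train))
  names(p_t)[2] <- namestoiterate[i]
  wrapper_out <- merge(wrapper_out, p_t, by = "id")  # one column per outcome
}
head(wrapper_out)
```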
Ising Model using a Gibbs Sampler:
nscan <- 1000 # number of iterations

# some square matrix of data that contains noise about the true image
y <- rbind(
  c( 1, 0, 0, 0, 1 ),
  c( 0, 1, 1, 1, 0 ),
  c( 0, 0, 1, 1, 0 ),
  c( 0, 1, 0, 0, 1 ),
  c( 1, 0, 0, 0, 1 ) )

kk <- nrow(y) # number of rows/columns in matrix
x <- matrix(1,kk,kk) # an uninformative prior for the sampler to start from

# A Gibbs sampler (An MCMC algorithm)
phi <- 0.5 # tuning parameter
eps <- 0.1 # tuning parameter

pone <- matrix(0,kk,kk) # create placeholder for transition matrix
sumx <- numeric(nscan) # create placeholder to save value of x

# Ising Model in Gibbs Sampler
for( iscan in 1:nscan ){
  for( i in 1:kk ){
    for( j in 1:kk ){
      # find relevant information from neighbors (nei)
      nei <- 0
      if( i>1 ){ nei <- nei + x[i-1,j] }
      if( i<kk ){ nei <- nei + x[i+1,j] }
      if( j>1 ){ nei <- nei + x[i,j-1] }
      if( j<kk ){ nei <- nei + x[i,j+1] }

      # Update by the simple full conditional (fc)
      fc.plus <- exp( phi*nei + y[i,j]*log(1-eps) + (1-y[i,j])*log(eps) )
      fc.minus <- exp( -phi*nei + y[i,j]*log(eps) + (1-y[i,j])*log(1-eps) )

      p <- fc.plus/(fc.plus+fc.minus)
      x[i,j] <- ifelse( runif(1) <= p, +1, -1 )
    }
  }
  sumx[iscan] <- sum(x)

  # update posterior mean approximation
  pone <- pone + (x==1)/nscan
  print(iscan)
}