1 Intro to R
Every programming language has similar elements. We need to input data, identify variables, perform calculations, create functions, control program flow, and output data and graphics. The R language is attractive in that it has a full online development environment, RStudio Cloud, and built-in functions for regression analysis, solving equations, and producing graphics. R is increasingly used by those involved in data analysis and statistics, particularly in the social sciences, biostatistics, and medicine. It is currently available in a free version. References on R tend to emphasize different features of R, more suitable for social sciences, data mining, ecology, or medicine. A good reference for scientists using numerical methods is available [1], as well as an analytical chemistry textbook [2]. This manual will present the elements of R gradually, as needed, mostly through chemical data examples. This document itself is created with R Markdown, which integrates document creation with R code.
1.1 R in RStudio Cloud
We will write our programs on an online platform (an IDE) called RStudio Cloud, which makes our work platform independent: we access the program the same way on a PC, Mac, or Chromebook. The RStudio Cloud environment is divided into four main sections, with the most important elements listed below.
- Top left: Script - where you write R code
- Bottom left: Console - shows output
- Top right: Environment - we will mostly ignore this
- Bottom right: Plots, Help, Packages
To start writing a program in RStudio Cloud, open a Space (like a file folder that can contain related programs), select New Project, and then choose from the menu:
File > New File > R Script
and start typing!
To run a program, highlight the code and select “Run”.
You can create multiple R scripts in a single Space. Spaces are displayed on the left, and you can create multiple Spaces. Spaces can be shared from the tools icon (upper right).
When you run a script, results and error messages appear in the Console, and plots appear in the Plots area.
In this document, R code appears in the shaded areas, and the output of the preceding R command appears on a white background, on lines beginning with ##.
1.2 Vectors & Numerics
In R we commonly use “<-” to set a variable to a value. R uses “vector” calculation, which avoids many of the loops needed in more traditional programming. For instance, below we set x equal to a series of values (a vector), and then calculate a series of values of y.
# A comment line
x <- 5.0 # Set x equal to 5.0
y <- x^2 # y is equal to x squared.
x <- c(1.0, 2.0, 3.0) # x is equal to a numbered list of values - a "vector"
x <- seq(1,2,0.2) # create an incremented sequence
xx <- matrix(c(2,4,6,8),2,2) # a matrix row 1 = 2 6 row 2 = 4 8
# length(x) returns the length of the vector x.
# x returns all the values of x
length(x) # the number of values in the vector x
## [1] 6
x
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
xx
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
# exponents and logs
10^(-2) # power of 10
## [1] 0.01
exp(1) # e (2.718282...) raised to a power
## [1] 2.718282
log(1) # base e
## [1] 0
log10(10) # base 10
## [1] 1
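To make the contrast with loop-based code concrete, the sketch below computes the same squares twice, once with an explicit loop and once with a single vectorized expression, and then shows basic indexing. The names conc, sq_loop, and sq_vec are illustrative only and are not used elsewhere in this manual.

```r
# Illustrative values only; 'conc' is a hypothetical vector name
conc <- c(0.1, 0.2, 0.4, 0.8)

# Loop version: fill a result vector one element at a time
sq_loop <- numeric(length(conc))
for (i in seq_along(conc)) {
  sq_loop[i] <- conc[i]^2
}

# Vectorized version: one expression operates on every element
sq_vec <- conc^2

all.equal(sq_loop, sq_vec)  # TRUE when both approaches agree

# Indexing: R counts from 1, and a logical test selects matching elements
conc[2]           # the second element
conc[conc > 0.3]  # elements larger than 0.3
```

The vectorized form is shorter and is usually the preferred style in R.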
Once we have a series of x and y, of equal length, we can easily create a graph.
x <- seq(0,10,0.5) # a sequence from 0 to 10, increments of 0.5
y <- x^2 # Note that y is calculated for every x. This is called vectorized.
plot(x,y) # Create a plot. We can add a lot of formatting!
Below is an example of basic formatting commands.
plot(x,y,type = "b",main = "A Formatted Graph",col = "darkblue", xlab = "X Label", ylab = "Y Label")
- type="p": for points (the default)
- type="l": for lines
- type="b": for both; points are connected by a line
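As a sketch of how the type argument combines with other base graphics calls, the example below draws one curve as a line and overlays a second series as points. The second data set is made up for illustration; points() and legend() are base R graphics functions.

```r
x <- seq(0, 10, 0.5)
y1 <- x^2        # first curve
y2 <- 10 * x     # second series, made up for illustration

# Draw the first curve as a line
plot(x, y1, type = "l", col = "darkblue",
     main = "Two Curves", xlab = "X Label", ylab = "Y Label")

# Add the second series to the same plot as filled points
points(x, y2, col = "red", pch = 16)

# A legend identifies each series
legend("topleft", legend = c("x^2", "10x"),
       col = c("darkblue", "red"), lty = c(1, NA), pch = c(NA, 16))
```

Calling plot() starts a new graph, while points() and lines() add to the graph that is already open.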
1.3 Functions
We can define a function and evaluate it later in the R script.
Afunc <- function(x) {x^2 + log(x)}
y <- c(1,2,3)
Afunc(y) # note the vectorized evaluation
## [1] 1.000000 4.693147 10.098612
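Functions can also take several arguments, some with default values. The following is a minimal sketch; the function name dilute and its arguments are hypothetical, chosen to illustrate the dilution relation \(C_1 V_1 = C_2 V_2\).

```r
# Hypothetical helper: concentration after dilution, from C1*V1 = C2*V2
# c1 = initial concentration, v1 = initial volume, v2 = final volume
dilute <- function(c1, v1, v2 = 100) {
  c1 * v1 / v2   # the last expression evaluated is the returned value
}

dilute(1.0, 10)            # uses the default v2 = 100
dilute(1.0, 10, v2 = 50)   # override the default by name
dilute(1.0, c(5, 10, 20))  # vectorized over v1
```

Because the body uses only arithmetic operators, the function accepts a vector for any argument without further changes.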
1.4 Reading and Writing Data Files
With larger data sets, as in a titration, we prefer to read a data file, such as output from a spreadsheet. A common file type is csv (comma-separated values). Usually, the first row contains the headers, and the remaining rows contain the values. If you upload a csv file into your working directory, it can be read into a data frame. Data frames are columns of values that do not have to be of the same type (names, dates, income, etc.), and they are very important in many R applications. Here, however, we are focusing on vectors, and we can convert a data frame with two or more numeric columns (such as volume and pH) into the respective vectors. The data file “mydat.csv” looks like this:
vol, pH
1.5, 3.2
3.0, 6.6
7.0, 14.0
Mydata <- read.csv("mydat.csv") # a file with headers vol and pH
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'mydat.csv'
Mydata # it's a data frame
## vol pH
## 1 1.5 3.2
## 2 3.0 6.6
## 3 7.0 14.0
volume <- Mydata$vol # extract column vol as vector
pH <- Mydata$pH # extract column pH as vector
volume
## [1] 1.5 3.0 7.0
pH
## [1] 3.2 6.6 14.0
# and to write a vector to a file
write(volume,file="newfile")
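For completeness, here is a sketch of a full round trip: build a data frame from two vectors, write it as a csv file, and read it back. The data frame name titration is hypothetical, the values repeat the ones above, and the file is written to a temporary location with tempfile() so that nothing in the working directory is overwritten.

```r
volume <- c(1.5, 3.0, 7.0)
pH <- c(3.2, 6.6, 14.0)

# Combine the vectors into a data frame with named columns
# ('titration' is a hypothetical name)
titration <- data.frame(vol = volume, pH = pH)

# Write to a temporary csv file; row.names = FALSE omits the row numbers
outfile <- tempfile(fileext = ".csv")
write.csv(titration, file = outfile, row.names = FALSE)

# Confirm that the file was created, then read it back
file.exists(outfile)
readback <- read.csv(outfile)
readback$vol   # the vol column as a vector
```

In your own project you would replace outfile with a file name such as "results.csv", which is then created in the working directory.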
1.5 Control Structures: For Loop
Control structures allow iterative operations (for loop) and decision-directed operations (if/then/else).
x <- 2
for (i in 1:4) { x <- x*x } # square x four times
x
## [1] 65536
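The example above shows only the for loop. A minimal sketch of the if/else structure follows, using a hypothetical function classify_pH and the usual convention that pH 7 is neutral at 25 °C.

```r
# Hypothetical helper: label a single pH value
classify_pH <- function(p) {
  if (p < 7) {
    "acidic"
  } else if (p > 7) {
    "basic"
  } else {
    "neutral"
  }
}

# if() tests one value at a time, so loop over the vector
ph_values <- c(3.2, 7.0, 14.0)
for (p in ph_values) {
  print(classify_pH(p))
}

# ifelse() is the vectorized form for a two-way choice
ifelse(ph_values < 7, "acidic", "not acidic")
```

For a simple two-way choice over a whole vector, ifelse() avoids the loop; the explicit if/else form is easier to read when there are several branches.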
1.6 Packages
So far we have been using the base R that is available when you create an R script. However, we will also make use of R packages, which extend R with additional graphical or computational functions. They can be installed by selecting “Packages” and “Install” in the lower right quadrant of RStudio and searching for the package by name. Examples of packages we will use are nls2 (non-linear regression), fftw (Fourier transform), and stats (statistics; it ships with R and needs no installation). Each package will be explored in the context of its use.
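Packages can also be installed and loaded from code rather than through the menus. In the sketch below the install line is commented out, since it needs a network connection and only has to run once; nls2 serves as the example package, and has_nls2 is an illustrative variable name.

```r
# Install once (requires a network connection), then comment out again:
# install.packages("nls2")

# Check whether the package is installed before trying to load it
has_nls2 <- requireNamespace("nls2", quietly = TRUE)

if (has_nls2) {
  library(nls2)   # make the package's functions available in this session
} else {
  message("nls2 is not installed; use install.packages(\"nls2\") first")
}
```

Installation is needed once per project, while library() has to be called in every script that uses the package.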
1.7 Questions and Projects
Create a vector “myvec” equal to 1,5,10 … 100 with the sequence command, and create a function yval equal to myvec squared. Create a plot of yval versus myvec, with the data symbols as points.
Write the vector yval to a file. Check that the file was created.
\(pH = -\log_{10}[H^+]\). Create a vector “ph” of pH values from 1 to 10. Create a function “hplus” which converts pH to hydrogen ion concentration. Call the function to calculate the hydrogen ion concentration for the series of pH values.
Find an equation with dependent variable and independent variable used in science or social science (y = f(x)). Create a reasonable range and sequence of x values, calculate the y values. Make a plot accurately labelled and with a title.
[1] Introduction to Scientific Programming and Simulation Using R, CRC Press (2014)