Chapter 3 Basic Concepts

3.1 Hello world program

This section shows a Hello World program written in two ways: at the R command prompt and as a script file.

3.1.1 R Command Prompt

Once the R environment is set up, you can start the R command prompt by typing the following command at your system's command prompt:

$ R

This launches the R interpreter and shows the prompt >, where you can start typing your program as follows:

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

Here, the first statement defines a variable myString and assigns it the string “Hello, World!”. The next statement calls print() to display the value stored in myString.

3.1.2 R Script File

Usually, you write your programs in script files and then run those scripts from your command prompt with Rscript, the command-line front end to the R interpreter. Start by writing the following code in a text file called test.R:

# My first program in R Programming
myString <- "Hello, World!"

print(myString)

Save the above code in a file named test.R and run it at the Linux command prompt as shown below. The syntax is the same on Windows and other systems.

$ Rscript test.R 

When we run the above program, it produces the following result.

[1] "Hello, World!"

3.1.3 Comments

Comments are explanatory text in your R program; the interpreter ignores them when it runs the program. A comment starts with # and runs to the end of the line, as follows:

# My first program in R Programming

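R does not support multi-line comments, but you can get a similar effect with the following trick. The block is never executed, although its contents must still be valid R code, which is why the text is wrapped in quotes: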

if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a 
      single OR double quote"
}

myString <- "Hello, World!"
print(myString)
[1] "Hello, World!"
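
In practice, a comment that spans several lines is usually written by starting each line with #. The following is a minimal sketch of that style; the variable name greeting is only illustrative.

```r
# This comment spans several lines.
# Each line begins with its own '#',
# so the interpreter ignores all of them.
greeting <- "Hello, World!"  # a comment can also follow code on the same line
print(greeting)
## [1] "Hello, World!"
```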

3.2 Variable

In any programming language, you use variables to store information. A variable is a name that refers to a value held in memory, so creating a variable reserves some space in memory for its value.

You may want to store values of various data types such as character, wide character, integer, floating point, double floating point, Boolean, and so on. The data type of a value determines how much memory is allocated for it and what can be stored in that memory.

Unlike languages such as C and Java, R does not declare a variable with a data type. Instead, a variable is assigned an R-object, and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are:

- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames

The simplest of these objects is the vector, and there are six data types of atomic vectors, also called the six classes of vectors. The other R-objects are built upon atomic vectors.
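
To illustrate that a variable takes its type from the object assigned to it, here is a small sketch. The name x and the values are arbitrary choices for this example.

```r
# The same name can be bound to objects of different types over time.
x <- 42L                # an integer value
print(class(x))
## [1] "integer"

x <- "forty-two"        # now a character string
print(class(x))
## [1] "character"

x <- TRUE               # now a logical value
print(class(x))
## [1] "logical"
```

No declaration is needed at any point; each assignment simply rebinds x to a new object.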

3.2.1 Data Type

  1. Logical - (TRUE, FALSE)
> v <- TRUE
> print(class(v))
[1] "logical"
  2. Numeric - (12.3, 5, 999)
> v <- 23.5
> print(class(v))
[1] "numeric"
  3. Integer - (2L, 34L, 0L)
> v <- 2L
> print(class(v))
[1] "integer"
  4. Complex - (3 + 2i)
> v <- 2+5i
> print(class(v))
[1] "complex"
  5. Character - ('a', "good", "TRUE", '23.4')
> v <- "TRUE"
> print(class(v))
[1] "character"
  6. Raw - ("Hello" is stored as 48 65 6c 6c 6f)
> v <- charToRaw("Hello")
> print(class(v))
[1] "raw"

In R, the most basic data types are the R-objects called vectors, which hold elements of the classes shown above. Note that the number of classes in R is not limited to these six. For example, we can combine atomic vectors into an array, whose class is array.
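
As a brief sketch of that last point, the example below builds a three-dimensional array from an atomic integer vector. The names v and arr are illustrative only.

```r
# Start from an atomic integer vector.
v <- 1:8
print(class(v))
## [1] "integer"

# Reshape it into a 2 x 2 x 2 array.
arr <- array(v, dim = c(2, 2, 2))

# The class is now "array", but the underlying storage type is unchanged.
print(class(arr))
## [1] "array"
print(typeof(arr))
## [1] "integer"
```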

3.2.2 Vectors

To create a vector with more than one element, use the c() function, which combines the elements into a vector.

# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
## [1] "red"    "green"  "yellow"
# Get the class of the vector.
print(class(apple))
## [1] "character"
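
A few common operations on such a vector can be sketched as follows. The vector mixed is a hypothetical example added here to show that c() coerces its arguments to a single common type.

```r
apple <- c("red", "green", "yellow")

# Number of elements in the vector.
print(length(apple))
## [1] 3

# Indexing is 1-based, so this selects the second element.
print(apple[2])
## [1] "green"

# All elements of an atomic vector share one type;
# mixing types makes c() coerce them (here, to character).
mixed <- c(1, "a", TRUE)
print(class(mixed))
## [1] "character"
```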

3.2.3 Lists

A list is an R-object that can contain elements of many different types, such as vectors, functions, and even other lists.

# Create a list.
list1 <- list(c(2,5,3), 21.3, sin)

# Print the list.
print(list1)
## [[1]]
## [1] 2 5 3
## 
## [[2]]
## [1] 21.3
## 
## [[3]]
## function (x)  .Primitive("sin")
print(class(list1))
## [1] "list"
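
The following sketch shows how elements of such a list might be accessed. The list named and its element names values and scale are illustrative additions, not part of the example above.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# Double brackets extract the element itself.
print(list1[[1]])
## [1] 2 5 3

# The third element is a function, so it can be called directly.
print(list1[[3]](0))
## [1] 0

# Single brackets return a sub-list rather than the element.
print(class(list1[1]))
## [1] "list"

# Elements can also be named and then accessed with $.
named <- list(values = c(2, 5, 3), scale = 21.3)
print(named$scale)
## [1] 21.3
```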

3.2.4 Matrices

A matrix is a two-dimensional rectangular data set. It can be created by passing a vector to the matrix() function.

# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "b" 
## [2,] "c"  "b"  "a"
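
As a short sketch of how the byrow argument affects the layout, the example below builds the same data both row-wise and column-wise. The name M_col is introduced only for this comparison.

```r
values <- c('a', 'a', 'b', 'c', 'b', 'a')

# Filled row by row, as in the example above.
M <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

# Filled column by column, which is the default.
M_col <- matrix(values, nrow = 2, ncol = 3)

print(dim(M))
## [1] 2 3

# Elements are addressed as [row, column].
print(M[1, 2])
## [1] "a"
print(M_col[1, 2])
## [1] "b"

# Leaving the column index empty selects a whole row.
print(M[1, ])
## [1] "a" "a" "b"
```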

3.2.5 Arrays

While matrices are confined to two dimensions, arrays can have any number of dimensions. The array() function takes a dim argument that specifies the size of each dimension. The example below creates a 3x3x2 array, that is, two 3x3 matrices. Because only two values are supplied, they are recycled to fill all 18 cells.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
## , , 1
## 
##      [,1]     [,2]     [,3]    
## [1,] "green"  "yellow" "green" 
## [2,] "yellow" "green"  "yellow"
## [3,] "green"  "yellow" "green" 
## 
## , , 2
## 
##      [,1]     [,2]     [,3]    
## [1,] "yellow" "green"  "yellow"
## [2,] "green"  "yellow" "green" 
## [3,] "yellow" "green"  "yellow"
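
A brief sketch of inspecting and indexing this array follows; it rebuilds the same array so that it can be run on its own.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

# The dimensions and the total number of cells.
print(dim(a))
## [1] 3 3 2
print(length(a))
## [1] 18

# Index as [row, column, layer]: row 1, column 2 of the second matrix.
print(a[1, 2, 2])
## [1] "green"

# Leaving the first two indices empty extracts a whole 3x3 layer.
print(dim(a[, , 1]))
## [1] 3 3
```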

3.2.6 Factors

Factors are R-objects created from a vector. A factor stores the vector along with the distinct values of its elements as labels (levels). The labels are always character, irrespective of whether the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
## [1] green  green  yellow red    red    red    green 
## Levels: green red yellow
print(nlevels(factor_apple))
## [1] 3
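
To make the relationship between values and levels more concrete, here is a small sketch that inspects the same factor. The name counts is illustrative.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

# The levels are the distinct values, sorted alphabetically by default.
print(levels(factor_apple))
## [1] "green"  "red"    "yellow"

# Internally, each element is stored as an integer code into the levels.
print(as.integer(factor_apple))
## [1] 1 1 3 2 2 2 1

# table() counts how often each level occurs.
counts <- table(factor_apple)
print(counts[["red"]])
## [1] 3
```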

3.2.7 Data Frames

Data frames are tabular data objects. Unlike a matrix, each column of a data frame can contain a different mode of data: the first column can be numeric, the second character, and the third logical. A data frame is a list of vectors of equal length.

Data Frames are created using the data.frame() function.

# Create the data frame.
BMI <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)
print(BMI)
##   gender height weight Age
## 1   Male  152.0     81  42
## 2   Male  171.5     93  38
## 3 Female  165.0     78  26
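
The sketch below shows a few typical operations on this data frame. It rests on assumptions that the text above does not state: that height is in centimetres and weight in kilograms. The derived column name bmi is hypothetical.

```r
BMI <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26),
   stringsAsFactors = FALSE   # keep gender as character in every R version
)

# Number of rows and columns.
print(dim(BMI))
## [1] 3 4

# A column is a vector and can be extracted with $.
print(BMI$height)
## [1] 152.0 171.5 165.0

# Rows can be filtered with a logical condition.
print(BMI[BMI$Age > 30, "gender"])
## [1] "Male" "Male"

# Add a derived column (assumes height in cm and weight in kg).
BMI$bmi <- BMI$weight / (BMI$height / 100)^2
```

Because every column has the same length, the arithmetic on the last line is applied row by row without an explicit loop.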