Chapter 11 DataFrame

11.1 What is a Data Frame?

A data frame is a two-dimensional labelled data structure whose columns can hold different data types. Imagine a spreadsheet that contains stock-related information. It can have columns such as “Stock Name” (character data type), “Date” (date data type), and “Stock Price” (numeric data type). Such a spreadsheet is a kind of data frame. In R, we can create a data frame with the data.frame function. Let’s create a data frame with three rows and three columns:

Data <- data.frame(
  stock_name = c("AACL", "AAK", "MANI"),
  # No format argument is given, so the strings are read as year-month-day
  date = as.Date(c("12-3-2020", "12-3-2020", "12-3-2020")),
  Price = c(12, 3, 224),
  stringsAsFactors = FALSE
)

Let’s print the newly created data frame:

Data
##   stock_name       date Price
## 1       AACL 0012-03-20    12
## 2        AAK 0012-03-20     3
## 3       MANI 0012-03-20   224
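
The dates in the output above print as 0012-03-20 rather than as a date in 2020. When as.Date receives character input without a format argument, it tries the year-month-day layout first, so "12-3-2020" is read as year 12, month 3, day 20. One way to avoid this is to pass the layout explicitly. The sketch below assumes the strings are meant as day-month-year (12 March 2020); if they are month-day-year, the format would be "%m-%d-%Y" instead. The name Data_fixed is only illustrative.

```r
# Assumption: "12-3-2020" means 12 March 2020 (day-month-year).
Data_fixed <- data.frame(
  stock_name = c("AACL", "AAK", "MANI"),
  date = as.Date(c("12-3-2020", "12-3-2020", "12-3-2020"),
                 format = "%d-%m-%Y"),
  Price = c(12, 3, 224),
  stringsAsFactors = FALSE
)

# Each date is now stored as 2020-03-12.
format(Data_fixed$date)
```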

Let’s validate the class of the newly created data frame:

class(Data)
## [1] "data.frame"

Let’s explore a few of the data frames built into R. To list the built-in (preloaded) datasets, use:

data()

One of the most commonly used built-in data frames is mtcars (Motor Trend Car Road Tests).

Let’s load mtcars and perform a few operations on it.

We can use the head function to see the first few rows of the data (six by default):

data(mtcars)

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

We can also specify the number of rows we want to see:

head(mtcars,3)
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

Similarly, we can use the tail function to see the last few rows of the data frame. As with head, an optional second argument sets the number of rows:

tail(mtcars)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
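
The call above uses the default row count. As a brief illustration of the optional second argument, the sketch below asks tail for only the last three rows; the variable name last_three is an illustrative choice.

```r
# Keep only the last three rows of the built-in mtcars data frame.
last_three <- tail(mtcars, 3)

# Row names of the selected cars.
rownames(last_three)
```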

Let’s check the number of columns and rows in mtcars:

ncol(mtcars)
## [1] 11
nrow(mtcars)
## [1] 32
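
Both counts can also be obtained in a single call. A small sketch using base R's dim function, which returns the number of rows followed by the number of columns (mtcars_dim is an illustrative name):

```r
# dim() returns the row count first, then the column count.
mtcars_dim <- dim(mtcars)
mtcars_dim
## [1] 32 11
```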

For preloaded data, you can always use the help function to learn more about the variables in the data frame:

?mtcars
## starting httpd help server ... done

We can also use the summary function to get a summary of the data:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

As the results above show, summary reports descriptive statistics for each column: the minimum, first quartile, median, mean, third quartile, and maximum.
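
The same statistics can be requested for a single column. The sketch below, offered as one possible approach, summarises only mpg and then recomputes two of the figures directly; mpg_summary is an illustrative name.

```r
# Summarise one column; $ extracts a column by name.
mpg_summary <- summary(mtcars$mpg)
mpg_summary

# Individual statistics can also be computed on their own.
median(mtcars$mpg)          # matches the Median entry for mpg
quantile(mtcars$mpg, 0.25)  # matches the 1st Qu. entry for mpg
```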

Another popular function, str, displays the structure of a data frame. Here we apply it to the Data data frame created earlier:

str(Data)
## 'data.frame':    3 obs. of  3 variables:
##  $ stock_name: chr  "AACL" "AAK" "MANI"
##  $ date      : Date, format: "0012-03-20" "0012-03-20" ...
##  $ Price     : num  12 3 224
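
When only the column types are of interest, a more compact alternative to str is to apply class to every column. The following is a minimal sketch on mtcars using base R's sapply; col_classes is an illustrative name.

```r
# Apply class() to each column and collect the results in a named vector.
col_classes <- sapply(mtcars, class)
col_classes

# Every column of mtcars is numeric.
all(col_classes == "numeric")
```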