4.2 RStudio – part B

  • So far we have learned that R can be used as a calculator, and that data can be simulated with a random number generator or entered by hand

  • Data can also be imported from locally saved files or, if the data file is available online, loaded directly into RStudio using its URL

Exercise 17. Import the data from the text file eu_countries.txt directly into RStudio using the URL of the file within the read.table() command. Store the data frame as the object newdata. Check the structure of newdata and display the data in the console.

# Loading the data from URL
newdata=read.table(file="http://www.efzg.hr/userdocsimages/sta/jarneric/eu_countries.txt",header=TRUE)
# Checking the structure of an object
str(newdata)
# Naming rows by extracting country labels from newdata
rownames(newdata)=newdata$country
# Omitting the first column, which has already been used, by indexing inside the square brackets
newdata=newdata[,-1]
str(newdata) # Checking the structure again
print(newdata) # Displaying the data in the console


  • Object newdata refers to a sample of \(18\) European countries (\(n=18\)) for the year \(2012\) with \(6\) variables. Each column of a data frame can be extracted as a single variable by indexing with square brackets [i, j], where index i refers to the rows and index j to the columns. Alternatively, a single variable can be extracted from a data frame using the $ symbol.

  • File eu_countries.txt is a tab-delimited text file with a dot as the decimal point

  • If a CSV file is available online (comma-separated values), you should use read.csv() command instead of read.table()

  • To import Excel files, you should use read_excel() command from the readxl package

  • Furthermore, data can be directly loaded into RStudio from public sources. Several packages support commands for direct loading of secondary data.

TABLE 4.1: Loading data from public sources
package    command          description
eurostat   get_eurostat()   EUROSTAT data
WDI        WDI()            World Bank data
ecb        get_data()       European Central Bank data
quantmod   getSymbols()     Yahoo Finance data
OECD       get_dataset()    OECD data

  • To load data from EUROSTAT, you should first check the data navigation tree at https://ec.europa.eu/eurostat/data/database to locate and identify the dataset code of interest, as well as the country codes, variable/indicator codes, and other relevant identifiers.
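
The indexing rules and the read.csv() command mentioned above can be tried without an internet connection. The following is a minimal sketch, assuming a small made-up data set: the object demo and its values are hypothetical and are not taken from eu_countries.txt, and the text= argument is used only so that the example is self-contained.

```r
# Hypothetical comma-separated data, passed as text instead of a file or URL
csv_text="country,gdp,inflation
AT,100.5,2.1
BE,98.2,1.9
HR,61.4,3.4"
demo=read.csv(text=csv_text,header=TRUE)
rownames(demo)=demo$country # Naming rows by country labels
demo=demo[,-1] # Omitting the first column
demo[2,1] # Element in row 2 and column 1
demo[,"gdp"] # All rows of the column gdp
demo$gdp # The same column extracted with the $ symbol
demo["HR",] # All columns for the row named HR
```

With a real file, the text= argument would be replaced by the file path or URL, as in Exercise 17.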

Exercise 18. Load consumption data for the year \(2023\) from EUROSTAT for the same \(10\) countries as in TABLE 3.1. Likewise, load income data (wages and salaries) for the same countries in the same year. Display a scatter plot with consumption on the y-axis and income on the x-axis (both variables measured in millions of EUR, current prices). Display a second scatter plot with both variables transformed into logs.

install.packages("eurostat") # Installing the eurostat package (required only once)
library(eurostat) # Loading the package from library
# Loading consumption data from EUROSTAT with appropriate filters
consumption=get_eurostat("nama_10_gdp",filters=list(geo=c("BG","CZ","EE","HR","IT","LV","HU","PL","RO","SI"),na_item="P3",time=2023,freq="A",unit="CP_MEUR"),cache=FALSE)
consumption=consumption$values # Extracting only the values of consumption
# Loading income data from EUROSTAT with appropriate filters
income=get_eurostat("nama_10_gdp",filters=list(geo=c("BG","CZ","EE","HR","IT","LV","HU","PL","RO","SI"),na_item="D11",time=2023,freq="A",unit="CP_MEUR"),cache=FALSE)
income=income$values # Extracting only the values of income
# First scatter plot
plot(income,consumption,pch=19,col="blue",main="Scatter plot (1)")
# Second scatter plot
plot(log(income),log(consumption),pch=19,col="blue",main="Scatter plot (2)")


  • The argument cache=FALSE forces the latest version of the data to be downloaded directly from EUROSTAT instead of reusing a locally cached copy, which may take longer to execute, especially if the dataset is large
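
The scatter plots in Exercise 18 pair the two vectors by position, which relies on both downloads returning the countries in the same order. The following is a minimal sketch of pairing by country code instead, assuming made-up figures and two data frames with columns geo and values (the column names are an assumption mirroring the exercise code; the numbers and the object names consumption_df, income_df and both are hypothetical).

```r
# Hypothetical data: the same three countries, listed in a different order
consumption_df=data.frame(geo=c("HR","SI","BG"),values=c(60,45,80))
income_df=data.frame(geo=c("BG","HR","SI"),values=c(40,30,25))
# Matching rows by country code rather than by position
both=merge(consumption_df,income_df,by="geo",suffixes=c("_cons","_inc"))
both # One row per country with both variables side by side
log(both$values_cons) # Log of consumption, aligned with log(both$values_inc)
```

The columns both$values_inc and both$values_cons could then be passed to plot() in the same way as income and consumption.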

Exercise 19. Install and load the quantmod package. Import daily Apple stock data from Yahoo Finance for the period from January 1, 2023 to December 31, 2023 (one year). Display the first few rows and the last few rows of the imported data. Extract the closing prices into an object prices and plot them. Calculate the daily returns based on the closing prices. Plot the returns as a histogram. Install and load the moments package, which supports commands for skewness and kurtosis. Use the datasummary() command from the modelsummary package to display summary statistics for the daily returns: minimum, maximum, mean, standard deviation, skewness, and kurtosis. Finally, plot the daily returns again using a histogram with relative frequencies and add a normal curve to the same plot.

# Installing and loading quantmod package
install.packages("quantmod") # Required only once
library(quantmod)
# Loading Apple stock data from Yahoo Finance source
getSymbols("AAPL",src="yahoo",from="2023-01-01",to="2023-12-31")
head(AAPL) # Printing the first 6 rows
tail(AAPL) # Printing the last 6 rows
prices=AAPL$AAPL.Close # Extracting closing prices only
plot(prices,main="Daily closing prices") # Plotting the closing prices
returns=diff(log(prices)) # Calculating daily returns as first differences of the logs
plot(returns,main="Daily returns") # Line plot of daily returns
hist(returns) # Histogram of daily returns
# Installing and loading moments package
install.packages("moments")
library(moments)
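# Displaying the summary statistics of daily returns
library(modelsummary) # datasummary() comes from the modelsummary package (install it first if needed)
datasummary(min+max+mean+sd+skewness+kurtosis~AAPL.Close,data=data.frame(returns[-1,]),fmt=4) # The first row is dropped because the first return is missing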
# Plotting a histogram with normal curve
hist(returns,prob=TRUE,main="Histogram of daily Apple returns")
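# Normal curve with the sample mean and standard deviation of returns (about 0.00173 and 0.01255 in the summary above)
curve(dnorm(x,mean(returns,na.rm=TRUE),sd(returns,na.rm=TRUE)),col="red",add=TRUE)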
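
The return and moment calculations above can be checked by hand on a short series. The following is a minimal sketch, assuming five made-up closing prices (the vector p is hypothetical, not Apple data) and the moment-based definitions of skewness and kurtosis, which are assumed here to match those used by the moments package.

```r
p=c(100,102,101,105,104) # Hypothetical closing prices
r=diff(log(p)) # Log returns: one observation fewer than prices
length(r) # 4 returns from 5 prices
simple=p[-1]/p[-length(p)]-1 # Simple returns for comparison
all.equal(r,log(1+simple)) # A log return equals log(1 + simple return)
m=mean(r) # Sample mean
s2=mean((r-m)^2) # Second central moment (divides by n, not n-1)
skew=mean((r-m)^3)/s2^(3/2) # Moment-based skewness
kurt=mean((r-m)^4)/s2^2 # Moment-based kurtosis (3 for a normal distribution)
c(skewness=skew,kurtosis=kurt)
```

For a plain numeric vector diff() drops the first observation, whereas for an xts object such as prices the first return is kept as a missing value, which is why the exercise uses returns[-1,].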


  • The quantmod package depends on several other packages and R will automatically install these dependencies if they are not already installed

  • Along with data frames, vectors, and matrices, you can work with other types of objects in R, such as arrays, lists or time-series objects (xts or zoo). A matrix is similar to a data frame in its rectangular layout, and both can have row names and column names.

  • Each column in a data frame is a vector of the same length, but unlike a matrix, the columns of a data frame can hold different data types. In contrast, all elements of a matrix must be of the same type (for example, all numeric).

  • Knowing specific commands in RStudio for working with vectors and matrices is extremely important for data manipulation, including transforming, reshaping, and aggregating the data
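
The difference between a data frame and a matrix described above can be seen directly. The following is a minimal sketch, assuming a small made-up data frame (the object names df and M and the values are hypothetical).

```r
df=data.frame(country=c("HR","SI"),gdp=c(61.4,90.1)) # Hypothetical values
sapply(df,class) # Columns keep their own types: character and numeric
M=as.matrix(df) # Converting to a matrix forces a single type
class(M[1,2]) # "character": the numbers were converted to text
dim(M) # Still 2 rows and 2 columns
```

This is why mixed data are usually kept in a data frame, while matrices are used for purely numeric calculations.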

TABLE 4.2: Working with vectors
commands          description
c(a,b,c,d,...)    vector with elements \(a\), \(b\), \(c\), \(d\), …
seq(n)            sequence from \(1\) to \(n\)
a:n               sequence from \(a\) to \(n\) in steps of \(1\)
seq(a,n,c)        sequence from \(a\) to \(n\) in steps of \(c\)
rep(a,n)          vector with \(n\) equal elements \(a\)
length(v)         number of elements in vector \(v\)
sum(v)            sum of the elements of vector \(v\)
prod(v)           product of the elements of vector \(v\)

TABLE 4.3: Working with matrices
commands              description
matrix(v, nrow=n)     matrix with the elements of \(v\) in \(n\) rows
cbind(c1, c2, ...)    combines columns into a matrix
rbind(r1, r2, ...)    combines rows into a matrix
diag(A)               extracts the diagonal elements of a matrix \(A\)
diag(n)               creates an identity matrix with dimensions \(n \times n\)
t(A)                  transposes matrix \(A\)
solve(A)              inverse of a matrix \(A\)
A %*% B               multiplication of two matrices \(A\) and \(B\)
dim(A)                dimensions of a matrix \(A\)
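
The commands in TABLE 4.2 and TABLE 4.3 can be tried on small objects. The following is a minimal sketch, assuming made-up values (the objects v, A and A_inv are hypothetical).

```r
v=seq(2,10,2) # Sequence from 2 to 10 in steps of 2: 2 4 6 8 10
length(v) # 5 elements
sum(v) # 30
prod(v) # 3840
3:6 # Sequence from 3 to 6 using the colon operator
seq(3:6) # Caution: returns 1 2 3 4, the positions of the elements of 3:6
A=matrix(c(2,1,1,3),nrow=2) # 2 x 2 matrix filled column by column
dim(A) # 2 2
A_inv=solve(A) # Inverse of A
A%*%A_inv # Identity matrix, up to rounding error
diag(2) # Identity matrix with dimensions 2 x 2
t(A) # Transpose (A is symmetric here, so t(A) equals A)
```

Note that A*A_inv, without the percent signs, would multiply element by element and would not return the identity matrix.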

Exercise 20. Create a vector \(b=\begin{bmatrix} 2 \\ 3\\ 4 \end{bmatrix}\), calculate the inner product of that vector \(b^{T}b\) and show that it is equal to the sum of squares \(\displaystyle\sum_{i=1}^3 b^2_i\).

Create another vector \(c\) of ones with length \(3\) and combine it with vector \(b\) into a matrix \(C\). What are the elements of the inner product \(C^{T}C\) of matrix \(C\)? Calculate and store the outer product \(bb^{T}\) of vector \(b\) as a new object \(D\). Check the dimensions of object \(D\).

b=c(2,3,4) # Creates a vector by generic function c()
t(b)%*%b # Multiplies a row with a column of the same vector b
sum(b^2) # Calculates the sum of squares
c=rep(1,3) # Creates a vector of ones c with length of 3
C=cbind(c,b) # Combines two vectors into matrix C
C # Outputs the matrix C in the console ... the same as print(C) command
t(C)%*%C # Calculates an inner product of the matrix C
D=b%*%t(b)  # Multiplies a column with a row of the same vector b
D # Outputs a matrix D in the console
dim(D) # Checks dimensions of the matrix D
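
The same products can also be obtained with the base R functions crossprod(), tcrossprod() and outer(). The following is a minimal sketch offered as an alternative to the %*% approach used above, not as a replacement for it.

```r
b=c(2,3,4)
crossprod(b) # Same as t(b)%*%b: a 1 x 1 matrix containing 29
tcrossprod(b) # Same as b%*%t(b): a 3 x 3 matrix
outer(b,b) # Outer product computed element by element
all.equal(tcrossprod(b),outer(b,b)) # TRUE
C=cbind(1,b) # Column of ones combined with b
crossprod(C) # Same as t(C)%*%C
```

The elements of crossprod(C) are the number of elements (3), the sum of b (9) and the sum of squares of b (29).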


  • Note that R is case-sensitive, e.g. c is a vector and C is a matrix (not the same objects)