4.2 RStudio – part B
So far we have learned that R can be used as a calculator and that data can be simulated with a random number generator or entered by hand.
Data can also be imported from locally saved files or, if the data file is available online, loaded directly into RStudio using its URL address.
Load the data from the file eu_countries.txt directly into RStudio using its URL address within the read.table() command. Save the data frame as object newdata. Check the structure of newdata and display the data in the Console window.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them.
# Loading the data from URL
newdata=read.table(file="http://www.efzg.hr/userdocsimages/sta/jarneric/eu_countries.txt",header=TRUE)
# Checking the structure of an object
str(newdata)
# Naming rows by extracting country labels from newdata
rownames(newdata)=newdata$country
# Omitting the first column, which has already been used, by indexing inside the square brackets
newdata=newdata[,-1]
str(newdata) # Checking the structure again
print(newdata) # Displaying the data in the console
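The same steps can also be tried without an internet connection. The sketch below is only an illustration: the three-row table, its column names (country, gdp, infl) and all values are made up, and the text= argument of read.table() stands in for the URL.

```r
# Hypothetical tab-delimited data (made-up values, not the contents of eu_countries.txt)
txt = "country\tgdp\tinfl\nAT\t1.5\t2.1\nBE\t0.9\t1.8\nHR\t2.3\t3.0"
# Reading from a character string instead of a file or URL
toy = read.table(text=txt, header=TRUE, sep="\t")
str(toy)                      # 3 observations of 3 variables
rownames(toy) = toy$country   # Naming rows by country labels
toy = toy[,-1]                # Omitting the first column
toy[2,]                       # Second row (all columns)
toy[,1]                       # First column by position
toy[,"gdp"]                   # The same column by name
toy$infl                      # A single variable extracted with $
toy["HR","infl"]              # One cell, addressed by row name and column name
```

Here toy[,1], toy[,"gdp"] and toy$gdp all return the same numeric vector; the $ form is usually the easiest to read when the column name is known.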
- Object newdata refers to a sample of \(18\) European countries (\(n=18\)) for the year \(2012\) with \(6\) variables. Each column of a data frame can be extracted as a single variable by indexing within square brackets [i, j]. Index i represents the rows and index j represents the columns. Alternatively, a single variable can be extracted from a data frame using the $ symbol.
- File eu_countries.txt is a tab-delimited text file with a dot as the decimal point.
- If a CSV file (comma-separated values) is available online, you should use the read.csv() command instead of read.table().
- To import Excel files, you should use the read_excel() command from the readxl package.
Furthermore, data can be directly loaded into RStudio from public sources. Several packages support commands for direct loading of secondary data.
Package | Command | Description |
---|---|---|
eurostat | get_eurostat() | EUROSTAT data |
wbstats | wb_data() | World Bank data |
ecb | get_data() | European Central Bank data |
quantmod | getSymbols() | Yahoo Finance data |
OECD | get_dataset() | OECD data |
- To load data from EUROSTAT, you should first check the data navigation tree at https://ec.europa.eu/eurostat/data/database to locate and identify the dataset code of interest, as well as the country codes, variable/indicator codes, and other relevant identifiers.
TABLE 3.1. Likewise, load income data (wages and salaries) for the same countries in the same year. Display a scatter plot with consumption on the y-axis and income on the x-axis (both variables measured in millions of EUR, current prices). Display a second scatter plot with both variables transformed into logs.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Variables consumption and income will appear as two numeric vectors in the workspace Environment, while both scatter plots will be displayed in the Plots pane. Note: transforming both variables into logs (as in the second scatter plot) compresses the largest values, so an approximately linear relationship between the two variables becomes clearer and easier to interpret.
install.packages("eurostat") # Installing the eurostat package (required only once)
library(eurostat) # Loading the package from the library
# Loading consumption data from EUROSTAT with appropriate filters
consumption=get_eurostat("nama_10_gdp",filters=list(geo=c("BG","CZ","EE","HR","IT","LV","HU","PL","RO","SI"),na_item="P3",time=2023,freq="A",unit="CP_MEUR"),cache=FALSE)
consumption=consumption$values # Extracting only the values of consumption
# Loading income data from EUROSTAT with appropriate filters
income=get_eurostat("nama_10_gdp",filters=list(geo=c("BG","CZ","EE","HR","IT","LV","HU","PL","RO","SI"),na_item="D11",time=2023,freq="A",unit="CP_MEUR"),cache=FALSE)
income=income$values # Extracting only the values of income
# First scatter plot
plot(income,consumption,pch=19,col="blue",main="Scatter plot (1)")
# Second scatter plot
plot(log(income),log(consumption),pch=19,col="blue",main="Scatter plot (2)")
- The argument cache=FALSE forces the latest version of the data to be downloaded directly from EUROSTAT, which may take longer to execute, especially if the dataset is large.
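To see why the log transformation helps in the scatter plot, consider a small simulation. This is a sketch under stated assumptions, not EUROSTAT data: the ten income values, the power-law form and its parameters (0.8 and 0.95) are all made up for illustration.

```r
set.seed(1) # Reproducible made-up data
# Hypothetical incomes spread over two orders of magnitude (millions of EUR)
income = exp(runif(10, log(1e4), log(1e6)))
# Assumed power-law relationship with small multiplicative noise
consumption = 0.8*income^0.95*exp(rnorm(10, 0, 0.05))
# In logs the relationship is linear: log(consumption) = log(0.8) + 0.95*log(income) + noise
fit = lm(log(consumption) ~ log(income))
coef(fit) # The slope should be close to the assumed 0.95
# Scatter plot in logs with the fitted line added
plot(log(income), log(consumption), pch=19, col="blue", main="Simulated data in logs")
abline(fit, col="red")
```

In a log-log regression the slope is read as an elasticity: the percentage change in consumption associated with a one percent change in income.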
quantmod package. Import daily Apple stock data from Yahoo Finance for the period from January 1, 2023 to December 31, 2023 (one year). Display the first few rows and the last few rows of the imported data. Extract the closing prices into an object prices and plot them. Calculate the daily returns based on the closing prices. Plot the returns as a histogram. Install and load the moments package, which supports commands for skewness and kurtosis. Use the datasummary() command to display summary statistics for the daily returns: minimum, maximum, mean, standard deviation, skewness, and kurtosis. Finally, plot the daily returns again using a histogram with relative frequencies and add a normal curve to the same plot.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them.
# Installing and loading quantmod package
install.packages("quantmod")
library(quantmod)
# Loading Apple stock data from Yahoo Finance source
getSymbols("AAPL",src="yahoo",from="2023-01-01",to="2023-12-31")
head(AAPL) # Printing the first 6 rows
tail(AAPL) # Printing the last 6 rows
prices=AAPL$AAPL.Close # Extracting closing prices only
plot(prices,main="Daily closing prices") # Plotting the closing prices
returns=diff(log(prices)) # Calculating daily returns as first differences of the logs
plot(returns,main="Daily returns") # Line plot of daily returns
hist(returns) # Histogram of daily returns
# Installing and loading moments package
install.packages("moments")
library(moments)
# Displaying the summary statistics of daily returns
library(modelsummary) # datasummary() comes from the modelsummary package (assumed to be installed already)
datasummary(min+max+mean+sd+skewness+kurtosis~AAPL.Close,data=data.frame(returns[-1,]),fmt=4)
# Plotting a histogram with normal curve (mean 0.0017 and standard deviation 0.0125)
hist(returns,prob=TRUE,main="Histogram of daily Apple returns")
curve(dnorm(x,0.0017,0.0125),col="red",add=TRUE)
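The normal curve in the solution uses a mean and a standard deviation typed in by hand. As a hedged alternative, the sketch below estimates both from the returns themselves and also spells out the moment formulas for skewness and kurtosis in base R. The price series is simulated with made-up parameters (it is not Apple data), so the code runs offline and its numbers will differ from those in the exercise.

```r
set.seed(123) # Reproducible simulated data
# Hypothetical daily closing prices: 250 simulated values, not real market data
prices = 100*exp(cumsum(rnorm(250, mean=0.001, sd=0.012)))
returns = diff(log(prices)) # Log returns: one observation fewer than prices
m = mean(returns) # Sample mean
s = sd(returns) # Sample standard deviation
# Moment-based skewness and kurtosis (central moments divided by n)
m2 = mean((returns-m)^2)
skew = mean((returns-m)^3)/m2^(3/2)
kurt = mean((returns-m)^4)/m2^2 # Plain kurtosis, not excess kurtosis (a normal distribution has 3)
round(c(min=min(returns), max=max(returns), mean=m, sd=s, skewness=skew, kurtosis=kurt), 4)
# Histogram on the density scale with a normal curve based on the estimated mean and sd
hist(returns, prob=TRUE, main="Histogram of simulated returns")
curve(dnorm(x, m, s), col="red", add=TRUE)
```

Passing m and s to dnorm() keeps the curve consistent with the data if the sample period or the stock changes.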
- The quantmod package depends on several other packages, and R will automatically install these dependencies if they are not already installed.
Along with data frames, vectors, and matrices, you can work with other types of objects in R, such as arrays, lists or time-series objects (xts or zoo). A matrix is similar to a data frame, but it has no column names or row names unless they are assigned.
Each column in a data frame is a vector of the same length, but unlike a matrix, the columns of a data frame can hold different data types. In contrast, all elements of a matrix must be of the same type, typically numeric.
Knowing specific commands in RStudio for working with vectors and matrices is extremely useful for data manipulation, including transforming, reshaping, and aggregating the data.
Commands | Description |
---|---|
c(a,b,c,d,...) | vector with elements \(a\), \(b\), \(c\), \(d\), … |
seq(n) | sequence from \(1\) to \(n\) |
seq(a,n) | sequence from \(a\) to \(n\) (the same as a:n) |
seq(a,n,c) | sequence from \(a\) to \(n\) in steps \(c\) |
rep(a,n) | vector with \(n\) equal elements \(a\) |
length(v) | number of elements in vector \(v\) |
sum(v) | sum of the elements of vector \(v\) |
prod(v) | product of the elements of vector \(v\) |
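A short sketch of the vector commands, using made-up numbers. One detail worth checking for yourself: wrapping a:n inside seq() does not return the sequence from \(a\) to \(n\), because seq() applied to a vector simply numbers its elements.

```r
v = c(2,5,8) # Vector with three elements
seq(4) # 1 2 3 4
seq(3,6) # 3 4 5 6
3:6 # 3 4 5 6 (shorthand for the same sequence)
seq(3:6) # 1 2 3 4 (positions of the four elements of 3:6, not 3 to 6)
seq(2,10,2) # 2 4 6 8 10
rep(7,3) # 7 7 7
length(v) # 3
sum(v) # 15
prod(v) # 80
```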
Commands | Description |
---|---|
matrix(v, nrow=n) | matrix with elements of \(v\) in \(n\) rows |
cbind(c1, c2, ...) | combines columns into a matrix |
rbind(r1, r2, ...) | combines rows into a matrix |
diag(A) | extracts diagonal elements of a matrix \(A\) |
diag(n) | creates identity matrix with dimensions \(n \times n\) |
t(A) | transposes matrix \(A\) |
solve(A) | inverse of a matrix \(A\) |
A %*% B | multiplication of two matrices \(A\) and \(B\) |
dim(A) | dimensions of a matrix \(A\) |
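The matrix commands can be tried on a small example. The matrix below is made up for illustration; any invertible matrix would do.

```r
A = matrix(c(2,1,0,3), nrow=2) # Filled column by column: rows are (2,0) and (1,3)
dim(A) # 2 2
diag(A) # 2 3
I2 = diag(2) # 2 x 2 identity matrix
t(A) # Transpose: rows are (2,1) and (0,3)
Ainv = solve(A) # Inverse of A
A %*% Ainv # Identity matrix, up to rounding error
all.equal(A %*% Ainv, I2) # TRUE
A * A # Element-by-element product, not matrix multiplication
B = cbind(A, c(5,6)) # Adds a third column
dim(B) # 2 3
dim(rbind(A, c(5,6))) # 3 2
```

Note that matrix() fills its elements column by column unless byrow=TRUE is specified.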
Exercise 20. Create a vector \(b=\begin{bmatrix} 2 \\ 3\\ 4 \end{bmatrix}\), calculate the inner product of that vector \(b^{T}b\) and show that it is equal to the sum of squares \(\displaystyle\sum_{i=1}^3 b^2_i\).
Create another vector \(c\) of ones with length \(3\) and combine it with vector \(b\) into a matrix \(C\). What are the elements of the inner product \(C^{T}C\) of matrix \(C\)? Calculate and store the outer product \(bb^{T}\) of vector \(b\) as a new object \(D\). Check the dimensions of object \(D\).
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Note: R is case-sensitive, e.g. c is a vector and C is a matrix (not the same objects). In the given example, the inner product of a vector is a single number equal to the sum of squares (each element of the row is multiplied by the corresponding element of the column and the products are summed). The outer product of the vector is a matrix, not a single number: each element of the column is multiplied by each element of the row.
b=c(2,3,4) # Creates a vector by c() function
t(b)%*%b # Multiplies a row with a column of the same vector b
sum(b^2) # Calculates the sum of squares
c=rep(1,3) # Creates a vector of ones c with length of 3
C=cbind(c,b) # Combines two vectors into matrix C
C # Outputs the matrix C in the console ... the same as print(C) command
t(C)%*%C # Calculates an inner product of the matrix C
D=b%*%t(b) # Multiplies a column with a row of the same vector b
D # Outputs a matrix D in the console
dim(D) # Checks dimensions of the matrix D
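As a cross-check on the exercise, base R also provides crossprod() and outer(), which give the same inner and outer products without an explicit transpose. The sketch below is an optional alternative, not part of the original solution; the name ones is used instead of c only to avoid reusing the name of the c() function.

```r
b = c(2,3,4)
ones = rep(1,3) # Vector of ones (called c in the exercise)
C = cbind(ones, b) # 3 x 2 matrix
drop(crossprod(b)) # 29, the same as t(b)%*%b and sum(b^2)
crossprod(C) # The same as t(C)%*%C: elements 3, 9, 9, 29
D = outer(b, b) # The same values as b%*%t(b)
dim(D) # 3 3
all.equal(crossprod(C), t(C)%*%C) # TRUE
all.equal(D, b%*%t(b)) # TRUE
```

The elements of \(C^{T}C\) have a direct reading: the number of observations (\(3\)), the sum of the elements of \(b\) (\(9\), twice off the diagonal) and the sum of squares of \(b\) (\(29\)).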