4.1 RStudio – part A
There is also a cloud version of RStudio called Posit Cloud, which allows users to run RStudio in a web browser without needing to install anything locally
In 2022, RStudio rebranded as Posit, reflecting its expansion and integration with Python
The cloud version is also free and available at https://posit.cloud/
The easiest way to access Posit Cloud is to log in with a Google account (even if you are a first-time user)
After starting RStudio (on your PC or in a web browser), the user interface will look similar to the next screen
The unnamed file Untitled1 in the upper left corner is a newly opened, empty R script in which you write commands by hand (File -> New File -> R Script)
Arguments for each command are written inside round brackets (parentheses) ( )
A command is executed by selecting it with the mouse and clicking the Run button, or by pressing Ctrl+Enter with the cursor on the command line
It is useful to write comments (lines beginning with #), which are ignored by R. Comments are useful for making short notes that explain your code
In the lower left corner, the Console window prints the results as well as warnings and errors (for example, when the syntax of a command is incorrect or a specific command/object is not found)
Right side window panes serve to track the steps of the analysis, including History, Plots, Help, Packages (shows the list of currently installed packages in the library), etc.
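As a minimal sketch of the points above (arguments inside parentheses, comments starting with #, errors reported in the Console), the following lines can be typed into an empty R script and run one at a time with Ctrl+Enter. The round() command and the misspelled name rnd are illustrative choices, not taken from the text above.

```r
# Lines beginning with # are comments and are ignored by R
# round() takes its arguments inside the parentheses:
# the number to round and the number of digits to keep
round(3.14159, digits = 2)

# A misspelled command produces an error in the Console instead of a result
# rnd(3.14159)   # if uncommented, R reports that the function "rnd" could not be found
```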
Exercise 11. Do simple calculations in RStudio such as \(~~8\times3\), \(~~\pi/2\), \(~~e^3\), \(~~\sqrt4\), \(~~5+4\), \(~~\log10\), \(~6^2\).
# Some simple calculations
8*3
pi/2
exp(3)
sqrt(4)
5+4
log(10)
6^2
Exercise 12. Generate \(1000\) random numbers from the standard normal distribution (with zero mean and unit standard deviation), and save them as an object named a. Create a histogram of the same object a. Use a fixed starting point (seed) for the RNG (Random Number Generator).
# Generating random numbers and plotting them by histogram
set.seed(123)
a=rnorm(1000,0,1)
hist(a,main="Histogram of generated random numbers")
- The first command, set.seed(), makes the random numbers reproducible (if it is omitted, the random numbers differ every time you run the command in the second line).
- In the second line, the rnorm() command generates random numbers from a normal distribution, and these numbers are assigned to the object (variable) a. This allows you to reuse the generated random numbers later in your code. Instead of =, the assignment operator <- can also be used.
- The hist() command creates a histogram of the data, while the argument main sets the title of the histogram, given within quotes " ".
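The effect of set.seed() can be checked directly. The sketch below draws five numbers twice with the same seed and once more without resetting it; the object names first_draw, second_draw and third_draw are illustrative.

```r
# Same seed, same numbers: reset the seed before each draw
set.seed(123)
first_draw <- rnorm(5, 0, 1)
set.seed(123)
second_draw <- rnorm(5, 0, 1)
identical(first_draw, second_draw)   # TRUE

# Without resetting the seed, the next draw continues the random sequence
third_draw <- rnorm(5, 0, 1)
identical(first_draw, third_draw)    # FALSE
```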
Exercise 13. Plot object a again as a histogram, but on the density scale (relative frequencies rescaled so that the bar areas sum to one) instead of counts, and add a normal curve to the same plot.
# Adding a normal curve with red color to the histogram
hist(a,prob=TRUE,main="Histogram with a normal curve",xlab="generated random numbers")
curve(dnorm(x,0,1),col="red",add=TRUE)
- Commands can be nested in a single line; here, the dnorm() command is used inside the curve() command.
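Nesting works the same way for any commands: the inner command is evaluated first and its result becomes an argument of the outer one. A small sketch, assuming the object a from Exercise 12 (regenerated here so the block runs on its own) and using round() purely as an illustration:

```r
set.seed(123)
a <- rnorm(1000, 0, 1)

# Step by step: store the intermediate result, then use it
m <- mean(a)
stepwise <- round(m, 3)

# Nested: mean() is evaluated first and its result is passed to round()
nested <- round(mean(a), 3)

identical(stepwise, nested)   # TRUE
```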
Exercise 14. Calculate the mean and standard deviation of the object a. Provide a statistical summary of the same object, including the minimum, maximum, median, mean, and quartiles.
# Mean, standard deviation and statistical summary
mean(a)
sd(a)
summary(a)
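The values reported by summary() can also be computed one by one, which is a quick way to see what each entry means. This is only a sketch; the object s is an illustrative name, and quantile() is used here with its default settings.

```r
set.seed(123)
a <- rnorm(1000, 0, 1)

s <- summary(a)   # named entries: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
names(s)

# The same quantities computed individually
c(min(a), quantile(a, 0.25), median(a), mean(a), quantile(a, 0.75), max(a))

# A single entry can be extracted by its name
as.numeric(s["Median"])
```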
- To summarize results in well-formatted and customizable tables, the additional package modelsummary should be installed and loaded from the library. This package provides the datasummary() command for presenting descriptive statistics and the modelsummary() command for presenting econometric model output.
Exercise 15. After installing and loading the package modelsummary, use the command datasummary() to present the mean and standard deviation of the object a in a well-formatted table. First convert the numeric object a into a data frame named dt.
# Statistical summary by datasummary() command
install.packages("modelsummary")
library(modelsummary)
dt=data.frame(a)
datasummary(a~min+max+mean+sd,data=dt,fmt=4)
- The new object dt is a data frame with one column. The datasummary() command requires the data to be of type data frame. A data frame, in general, may contain multiple columns of different types (numeric, character, factor or integer). Rows and columns of any data frame can be named/renamed, which helps in referring to specific data later.
- Once installed, packages do not need to be reinstalled, but they must be loaded from the library each time you use them in a new session.
- The first argument of the datasummary() command is a two-sided formula, which uses the ~ symbol (rows on the left, columns on the right), while the last argument fmt controls the number of decimal digits.
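The install-once, load-every-session rule can be wrapped in a small helper. The function load_or_install() below is a hypothetical name introduced for illustration, not part of any package; it is a sketch that assumes install.packages() can reach a package repository when a package is missing.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needed only the first time
  }
  library(pkg, character.only = TRUE)  # needed in every new session
  invisible(TRUE)
}

load_or_install("stats")              # demonstration with a package that ships with R
# load_or_install("modelsummary")     # the package used in this section
```

Calling the helper at the top of a script avoids reinstalling the package on every run while still loading it for the current session.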
Exercise 16. Considering the cross-sectional data from TABLE 3.1, assign life expectancy to the variable y and the poverty rate to the variable x by hand. Create a scatter plot of these two variables. Merge both variables into a data frame called mydata. Rename the columns of the mydata object as life and poverty. Assign country names to the rows. Report descriptive statistics of both variables in a single table using the command datasummary().
# Inputting data by hand as variables y and x
y=c(71.4,77.2,77.2,76.7,82.7,73.1,74.3,75.5,72.8,80.7)
x=c(31.7,10.7,22.2,20.9,25.2,26.1,19.4,16.8,34.4,13.2)
# Creating scatter plot with blue solid points and labeling the x-axis and y-axis
plot(x,y,main="Scatter plot",xlab="poverty rate",ylab="life expectancy",pch=19,col="blue")
# Merging both variables into another data frame with named columns and rows
mydata=data.frame(y,x)
colnames(mydata)=c("life","poverty")
rownames(mydata)=c("Bulgaria","Czech","Estonia","Croatia","Italy","Latvia","Hungary","Poland","Romania","Slovenia")
# Descriptive statistics of both variables
datasummary(life+poverty~min+max+mean+sd,data=mydata,fmt=4)
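Naming rows and columns pays off when specific values have to be looked up later. The sketch below rebuilds mydata from the values in Exercise 16 (so it runs on its own) and shows a few ways of referring to the data by name; naming the columns directly inside data.frame() is an alternative to colnames() chosen here for brevity.

```r
# Rebuild the data frame from Exercise 16
y <- c(71.4,77.2,77.2,76.7,82.7,73.1,74.3,75.5,72.8,80.7)
x <- c(31.7,10.7,22.2,20.9,25.2,26.1,19.4,16.8,34.4,13.2)
mydata <- data.frame(life = y, poverty = x)
rownames(mydata) <- c("Bulgaria","Czech","Estonia","Croatia","Italy",
                      "Latvia","Hungary","Poland","Romania","Slovenia")

mydata$life                          # a whole column by name
mydata["Italy", "life"]              # a single value by row and column name: 82.7
mydata[c("Poland", "Romania"), ]     # selected rows, all columns
dim(mydata)                          # 10 rows and 2 columns
```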
Save the R script under a name of your choice (File -> Save As...)
You can open a saved script in RStudio at any time and rerun its commands by selecting them again and clicking the Run button
By default (depending on the RStudio settings), scripts (with extension .R) are saved in your working directory, usually My Documents or Desktop
getwd() # Information about your current working directory
setwd("new path") # Working directory can be changed by setting a new path
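A short sketch of how the working directory affects where files are found. It switches to R's temporary folder, an illustrative choice, and uses the hypothetical file name my_script.R; the original working directory is restored at the end.

```r
old_dir <- getwd()                    # remember the current working directory
setwd(tempdir())                      # switch to R's temporary folder

# Save a one-line script in the new working directory
script_path <- file.path(getwd(), "my_script.R")
writeLines("8*3", script_path)

file.exists("my_script.R")            # TRUE: relative file names are resolved against getwd()

setwd(old_dir)                        # restore the original working directory
```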