5.2 OLS method in RStudio

Instead of performing the matrix calculations by hand, it is easier to use the lm() command, which fits a wide range of linear model specifications and returns OLS estimates by default.
The lm() command has two main arguments: (1) formula and (2) data.
The formula argument specifies the functional form of the model and can be written in different ways (Table 5.5).
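To check that lm() really returns OLS estimates, the sketch below compares its coefficients with the matrix formula $\hat{\beta}=(X'X)^{-1}X'y$. It assumes R's built-in mtcars data set and an arbitrary illustrative model (mpg regressed on wt and hp), neither of which is part of the original exercise.

```r
# Compare lm() with the matrix formula b = (X'X)^(-1) X'y
# mtcars is a built-in R data set, used here only for illustration
data(mtcars)

ols_lm <- lm(mpg ~ wt + hp, data = mtcars)       # OLS via lm()

X <- cbind(1, mtcars$wt, mtcars$hp)              # design matrix with a column of ones
y <- mtcars$mpg                                  # dependent variable
b_matrix <- solve(t(X) %*% X) %*% t(X) %*% y     # OLS via matrix algebra

# Side-by-side comparison of the two sets of estimates
cbind(lm = coef(ols_lm), matrix = as.vector(b_matrix))
```

Both columns contain the same estimates; lm() simply uses a numerically more stable QR decomposition instead of inverting $X'X$ directly.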

TABLE 5.5: Formula specifications in RStudio

| Formula | Description |
|---|---|
| `y~x` | $y$ is regressed on $x$ |
| `y~x+z` | $y$ is regressed on $x$ and $z$ |
| `y~x+z+x:z` | $y$ is regressed on $x$, $z$ and the interaction term $x \cdot z$ (equivalent to `y~x*z`) |
| `y~0+z` | $y$ is regressed on $z$ without a constant term |
| `y~1` | $y$ is regressed on the constant term only |
| `y~log(x)` | $y$ is regressed on $\log(x)$ (natural logarithm) |
| `scale(y)~0+scale(x)` | standardized regression (both variables standardized) without a constant term |
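Several specifications from Table 5.5 can be checked against each other. The sketch below again assumes the built-in mtcars data set, chosen only for illustration. It shows that `y~x+z+x:z` and `y~x*z` give identical estimates, that `y~1` estimates the sample mean, and that the standardized regression without a constant returns the correlation coefficient.

```r
data(mtcars)  # built-in data set, used only for illustration

m_full  <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)       # main effects plus interaction
m_star  <- lm(mpg ~ wt * hp, data = mtcars)               # shorthand for the same model
m_noint <- lm(mpg ~ 0 + wt, data = mtcars)                # no constant term
m_const <- lm(mpg ~ 1, data = mtcars)                     # constant term only
m_std   <- lm(scale(mpg) ~ 0 + scale(wt), data = mtcars)  # standardized, no constant

all.equal(coef(m_full), coef(m_star))  # identical estimates
coef(m_noint)                          # a single slope, no intercept
coef(m_const)                          # equals mean(mtcars$mpg)
coef(m_std)                            # equals cor(mtcars$mpg, mtcars$wt)
```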

The second argument, data, specifies the data frame that contains the variables (an object previously created in the R session).
The data argument can sometimes be omitted if the variables exist as separate objects in the global environment (i.e., they are not part of any data frame).
Various results can be extracted from an estimated model by applying further commands to the model object (Table 5.6).
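The difference between the two ways of supplying data can be sketched as follows, assuming the built-in mtcars data set (not part of the original text). The variable names mpg_vec and wt_vec are hypothetical stand-alone vectors created in the global environment.

```r
data(mtcars)  # built-in data set, used only for illustration

# (1) Variables taken from a data frame via the data argument
m_df <- lm(mpg ~ wt, data = mtcars)

# (2) Stand-alone vectors in the global environment, so data can be omitted
mpg_vec <- mtcars$mpg
wt_vec  <- mtcars$wt
m_global <- lm(mpg_vec ~ wt_vec)

# Same estimates, only the coefficient names differ
cbind(data_frame = coef(m_df), global = coef(m_global))
```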

TABLE 5.6: Extracted results from linear model in RStudio

| Command | Description |
|---|---|
| `coef(model)` | regression coefficients (estimated parameters) |
| `confint(model,level=0.95)` | 95% confidence intervals of the coefficients |
| `fitted(model)` | fitted (expected) values |
| `resid(model)` | residuals |
| `vcov(model)` | estimated covariance matrix of the coefficients |
| `summary(model)` | basic summary of the model |
| `anova(model)` | analysis of variance (ANOVA) table |
| `AIC(model)` | Akaike information criterion |
| `BIC(model)` | Bayesian information criterion |
| `abline(model)` | adds the regression line to an existing scatter plot (meaningful for a simple regression) |
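The commands from Table 5.6 can be tried on a simple regression. This sketch assumes the built-in mtcars data set (not from the original text) and also verifies two identities: fitted values plus residuals reproduce the dependent variable, and the square roots of the diagonal of vcov() are the standard errors reported by summary().

```r
data(mtcars)  # built-in data set, used only for illustration
model <- lm(mpg ~ wt, data = mtcars)

coef(model)                   # intercept and slope
confint(model, level = 0.95)  # 95% confidence intervals
head(fitted(model))           # first few fitted values
head(resid(model))            # first few residuals
vcov(model)                   # covariance matrix of the estimates
anova(model)                  # ANOVA table
AIC(model); BIC(model)        # information criteria

# Fitted values plus residuals reproduce the observed dependent variable
all.equal(unname(fitted(model) + resid(model)), mtcars$mpg)

# Standard errors from vcov() match those reported by summary()
all.equal(sqrt(diag(vcov(model))), coef(summary(model))[, "Std. Error"])

# Scatter plot with the estimated regression line
plot(mpg ~ wt, data = mtcars)
abline(model)
```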

Exercise 23. Using the sample data in the newdata object (already loaded from the text file eu_countries.txt), estimate three models: (a) lin-lin, (b) log-log, and (c) second-order polynomial. Assume that gdp is the dependent variable $y$ and population is the independent variable $x$. Summarize the results of the three estimated models in a single table using the modelsummary() command.

Solution

Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them.

```r
library(modelsummary) # Package providing modelsummary(); install once with install.packages("modelsummary")
model1=lm(gdp~population,data=newdata) # Estimation of lin-lin model as an object "model1"
model2=lm(log(gdp)~log(population),data=newdata) # Estimation of log-log model as an object "model2"
model3=lm(gdp~population+I(population^2),data=newdata) # Estimation of polynomial model as an object "model3"
modelsummary(list("Lin-lin"=model1,"Log-log"=model2,"Polynomial"=model3),
             stars=c("*"=0.1,"**"=0.05,"***"=0.01),fmt=4) # Results of the three models in a single table, stars at 10%, 5% and 1%
```


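Note that the I() function tells R to treat the expression inside it “as is”, i.e., it ensures that population^2 is interpreted as the arithmetic square of population (a quadratic term). Without I(), the ^ operator would be read as formula syntax, in which population^2 reduces to population itself.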
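The effect of I() can be seen directly in the coefficient names. The sketch below assumes the built-in mtcars data set (not part of the exercise); poly() with raw = TRUE is shown as an alternative way to specify the same quadratic model.

```r
data(mtcars)  # built-in data set, used only for illustration

m_wrong <- lm(mpg ~ wt^2, data = mtcars)                     # formula syntax: wt^2 reduces to wt
m_quad  <- lm(mpg ~ wt + I(wt^2), data = mtcars)             # I() creates the squared term
m_poly  <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)  # alternative quadratic specification

names(coef(m_wrong))  # "(Intercept)" "wt"
names(coef(m_quad))   # "(Intercept)" "wt" "I(wt^2)"

# The I() and poly(raw = TRUE) specifications give the same estimates
all.equal(unname(coef(m_quad)), unname(coef(m_poly)))
```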
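It is commonly accepted that an estimated parameter is marked with one star ( $*$ ) if it is statistically significant at the 10% level ( $p<0.1$ ), two stars ( $**$ ) if it is statistically significant at the 5% level ( $p<0.05$ ), or three stars ( $***$ ) if it is statistically significant at the 1% level ( $p<0.01$ ). Note that with `stars=TRUE`, modelsummary() uses a different default scheme (`+` for $p<0.1$, `*` for $p<0.05$, `**` for $p<0.01$ and `***` for $p<0.001$); the convention above is obtained with `stars=c("*"=0.1,"**"=0.05,"***"=0.01)`.
Estimates without stars are not statistically significant even at the 10% level against a two-sided alternative (this is discussed in Section 6.1).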
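The star convention amounts to a simple mapping from two-sided p-values to symbols. The helper p_to_stars below is hypothetical (not part of any package), and the model uses the built-in mtcars data set purely for illustration; the p-values are taken from the "Pr(>|t|)" column of the summary() coefficient table.

```r
# Hypothetical helper: map p-values to stars using the 10% / 5% / 1% convention
p_to_stars <- function(p) {
  ifelse(p < 0.01, "***",
         ifelse(p < 0.05, "**",
                ifelse(p < 0.1, "*", "")))
}

data(mtcars)  # built-in data set, used only for illustration
model <- lm(mpg ~ wt + qsec, data = mtcars)

# Two-sided p-values from the coefficient table of summary()
p_values <- coef(summary(model))[, "Pr(>|t|)"]
data.frame(estimate = coef(model), p_value = p_values, stars = p_to_stars(p_values))
```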