Chapter 15 Panel Data Models
rm(list=ls()) #Removes all items in Environment!
library(plm)
library(tseries) # for `adf.test()`
library(dynlm) #for function `dynlm()`
library(vars) # for function `VAR()`
library(nlWaldTest) # for the `nlWaldtest()` function
library(lmtest) #for `coeftest()` and `bptest()`.
library(broom) #for `glance(`) and `tidy()`
library(PoEdata) #for PoE4 datasets
library(car) #for `hccm()` robust standard errors
library(sandwich)
library(knitr) #for `kable()`
library(forecast)
library(systemfit)
library(AER)
library(xtable)
New package: plm
(Croissant and Millo 2015).
Panel data gathers information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced. Equation \ref{eq:panelgeneq15} gives the form of a pooled panel data model, where the subscript \(i=1,...,N\) denotes an individual (cross sectional unit), and \(t=1,...,T\) denotes the time period, or longitudinal unit. The total number of observations in the panel is \(N\times T\).
\[\begin{equation} y_{it}=\beta_{1}+\beta_{2}x_{2it}+...+\beta_{K}x_{Kit}+e_{it} \label{eq:panelgeneq15} \end{equation}\]15.1 Organizing the Data as a Panel
A wide panel has the cross-sectional dimension (\(N\)) much larger than the longitudinal dimension (\(T\)); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly other units, the result is not a proper panel data, but pooled cross-sections model.
This manual uses the panel data package plm()
, which also gives the possibility of organizing the data under the form of a panel. Panel datsets can be organized in mainly two forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long form data frame is often not a trivial matter.
The next code sequence creates a panel structure for the dataset \(nls\_panel\) using the function pdata.frame
of the plm
package and displays a small part of this dataset. Please note how the selection of the rows and columns to be displayed is done, using the compact operator \(\%in\%\) and arrays such as c(1:6, 14:15)
. Table 15.1 shows this sample.
library(xtable)
data("nls_panel", package="PoEdata")
nlspd <- pdata.frame(nls_panel, index=c("id", "year"))
smpl <- nlspd[nlspd$id %in% c(1,2),c(1:6, 14:15)]
tbl <- xtable(smpl)
kable(tbl, digits=4, align="c",
caption="A data sample")
id | year | lwage | hours | age | educ | union | exper | |
---|---|---|---|---|---|---|---|---|
1-82 | 1 | 82 | 1.8083 | 38 | 30 | 12 | 1 | 7.6667 |
1-83 | 1 | 83 | 1.8634 | 38 | 31 | 12 | 1 | 8.5833 |
1-85 | 1 | 85 | 1.7894 | 38 | 33 | 12 | 1 | 10.1795 |
1-87 | 1 | 87 | 1.8465 | 40 | 35 | 12 | 1 | 12.1795 |
1-88 | 1 | 88 | 1.8564 | 40 | 37 | 12 | 1 | 13.6218 |
2-82 | 2 | 82 | 1.2809 | 48 | 36 | 17 | 0 | 7.5769 |
2-83 | 2 | 83 | 1.5159 | 43 | 37 | 17 | 0 | 8.3846 |
2-85 | 2 | 85 | 1.9302 | 35 | 39 | 17 | 0 | 10.3846 |
2-87 | 2 | 87 | 1.9190 | 42 | 41 | 17 | 1 | 12.0385 |
2-88 | 2 | 88 | 2.2010 | 42 | 43 | 17 | 1 | 13.2115 |
Function pdim()
extracts the dimensions of the panel data:
pdim(nlspd)
## Balanced Panel: n=716, T=5, N=3580
15.2 The Pooled Model
A pooled model has the specification in Equation \ref{eq:panelgeneq15}, which does not allow for intercept or slope differences among individuals. Such a model can be estimated in \(R\) using the specification pooling
in the plm()
function, as the following code sequence illustrates.
wage.pooled <- plm(lwage~educ+exper+I(exper^2)+
tenure+I(tenure^2)+black+south+union,
model="pooling", data=nlspd)
kable(tidy(wage.pooled), digits=3,
caption="Pooled model")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.477 | 0.056 | 8.487 | 0.000 |
educ | 0.071 | 0.003 | 26.567 | 0.000 |
exper | 0.056 | 0.009 | 6.470 | 0.000 |
I(exper^2) | -0.001 | 0.000 | -3.176 | 0.002 |
tenure | 0.015 | 0.004 | 3.394 | 0.001 |
I(tenure^2) | 0.000 | 0.000 | -1.886 | 0.059 |
black | -0.117 | 0.016 | -7.426 | 0.000 |
south | -0.106 | 0.014 | -7.465 | 0.000 |
union | 0.132 | 0.015 | 8.839 | 0.000 |
The plm()
function accepts the following main arguments, where the parameters shown as vectors c(...)
, such as effect
and model
can only take one value at a time out of the provided list.
plm(formula, data, subset, na.action, effect = c("individual", "time", "twoways"), model = c("within", "random", "ht", "between", "pooling", "fd"),...)
tbl <- tidy(coeftest(wage.pooled, vcov=vcovHC(wage.pooled,
type="HC0",cluster="group")))
kable(tbl, digits=5, caption=
"Pooled 'wage' model with cluster robust standard errors")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.47660 | 0.08441 | 5.64629 | 0.00000 |
educ | 0.07145 | 0.00549 | 13.01550 | 0.00000 |
exper | 0.05569 | 0.01129 | 4.93242 | 0.00000 |
I(exper^2) | -0.00115 | 0.00049 | -2.33440 | 0.01963 |
tenure | 0.01496 | 0.00711 | 2.10401 | 0.03545 |
I(tenure^2) | -0.00049 | 0.00041 | -1.18697 | 0.23532 |
black | -0.11671 | 0.02808 | -4.15602 | 0.00003 |
south | -0.10600 | 0.02701 | -3.92422 | 0.00009 |
union | 0.13224 | 0.02703 | 4.89327 | 0.00000 |
15.3 The Fixed Effects Model
The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. The model in this case assigns the subscript \(i\) to the constant term \(\beta_{1}\), as shown in Equation \ref{eq:fixedeffeq15}; the constant terms calculated in this way are called fixed effects.
\[\begin{equation} y_{it}=\beta_{1i}+\beta_{2i}x_{2it}+\beta_{3i}x_{3it}+e_{it} \label{eq:fixedeffeq15} \end{equation}\]Variables that change little or not at all over time, such as some individual characteristics should not be included in a fixed effects model because they produce collinearity with the fixed effects.
nls10 <- pdata.frame(nls_panel[nls_panel$id %in% 1:10,])
## series nev_mar, not_smsa, south are constants and have been removed
wage.fixed <- lm(lwage~exper+I(exper^2)+
tenure+I(tenure^2)+union+factor(id)-1,
data=nls10)
kable(tidy(wage.fixed), digits=3,
caption="Fixed effects in a subsample")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
exper | 0.238 | 0.188 | 1.268 | 0.213 |
I(exper^2) | -0.008 | 0.008 | -1.036 | 0.307 |
tenure | -0.012 | 0.034 | -0.362 | 0.720 |
I(tenure^2) | 0.002 | 0.003 | 0.854 | 0.399 |
union | 0.114 | 0.151 | 0.753 | 0.457 |
factor(id)1 | 0.152 | 1.097 | 0.139 | 0.891 |
factor(id)2 | 0.187 | 1.071 | 0.174 | 0.863 |
factor(id)3 | -0.063 | 1.351 | -0.047 | 0.963 |
factor(id)4 | 0.186 | 1.343 | 0.138 | 0.891 |
factor(id)5 | 0.939 | 1.098 | 0.855 | 0.398 |
factor(id)6 | 0.794 | 1.112 | 0.715 | 0.480 |
factor(id)7 | 0.581 | 1.236 | 0.470 | 0.641 |
factor(id)8 | 0.538 | 1.097 | 0.490 | 0.627 |
factor(id)9 | 0.418 | 1.084 | 0.386 | 0.702 |
factor(id)10 | 0.615 | 1.090 | 0.564 | 0.577 |
Table 15.4 displays the results of an OLS regression on a subsample of the first 10 individuals in the dataset \(nls\_panel\). The table is generated by the previous code sequence, where the novelty is using the factor variable \(id\). The function factor()
generates dummy variables for all categories of the variable, taking the first category as the reference. To include the reference in the output, one needs to exclude the constant from the regression model by including the term \(-1\) in the regression formula. When the constant is not excluded, the coefficients of the dummy variables represent, as usual, the difference between the respective category and the benchmark one.
However, to estimate a fixed effects in \(R\) we do not need to create the dummy variables, but use the option model="within"
in the plm()
function. The following code fragment uses the whole sample.
wage.within <- plm(lwage~exper+I(exper^2)+
tenure+I(tenure^2)+south+union,
data=nlspd,
model="within")
tbl <- tidy(wage.within)
kable(tbl, digits=5, caption=
"Fixed effects using 'within' with full sample")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
exper | 0.04108 | 0.00662 | 6.20590 | 0.00000 |
I(exper^2) | -0.00041 | 0.00027 | -1.49653 | 0.13463 |
tenure | 0.01391 | 0.00328 | 4.24333 | 0.00002 |
I(tenure^2) | -0.00090 | 0.00021 | -4.35357 | 0.00001 |
south | -0.01632 | 0.03615 | -0.45153 | 0.65164 |
union | 0.06370 | 0.01425 | 4.46879 | 0.00001 |
wage10.within <- plm(lwage~exper+I(exper^2)+
tenure+I(tenure^2)+union,
data=nls10,
model="within")
tbl <- tidy(wage10.within)
kable(tbl, digits=5, caption=
"Fixed effects using the 'within' model option for n=10")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
exper | 0.23800 | 0.18776 | 1.26758 | 0.21332 |
I(exper^2) | -0.00819 | 0.00790 | -1.03584 | 0.30738 |
tenure | -0.01235 | 0.03414 | -0.36171 | 0.71974 |
I(tenure^2) | 0.00230 | 0.00269 | 0.85407 | 0.39887 |
union | 0.11354 | 0.15086 | 0.75263 | 0.45670 |
Table 15.6 presents the fixed effects model results for the subsample of \(10\) individuals of the dataset \(nls\_panel\). This is to be compared to Table 15.4 to see that the within
method is equiivalent to including the dummies in the model. An interesting comparison is between the pooled and fixed effect models. Comparing Table 15.2 with Table 15.5 one can notice that including accounting for individual heterogeneity significantly lowers the marginal effects of the variables.
Testing if fixed effets are necessary is to compare the fixed effects model wage.within
with the pooled model wage.pooled
. The function pFtest()
does this comparison, as in the following code lines.
kable(tidy(pFtest(wage.within, wage.pooled)), caption=
"Fixed effects test: Ho:'No fixed effects'")
df1 | df2 | statistic | p.value | method | alternative |
---|---|---|---|---|---|
713 | 2858 | 15.1875 | 0 | F test for individual effects | significant effects |
Table 15.7 shows that the null hypothesis of no fixed effects is rejected.
15.4 The Random Effects Model
The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept \(\beta_{1i}\) should also be random. Thus, the random effects model assumes the form of the intercept as given in Equation \ref{eq:randeffectsintercept}, where \(\overline \beta_{1}\) stands for the population average and \(u_{i}\) represents an individual-specific random term. As in the case of fixed effects, random effects are also time-invariant.
\[\begin{equation} \beta_{1i} =\overline \beta_{1}+u_{i} \label{eq:randeffectsintercept} \end{equation}\]If this form of the intercept is replaced in Equation \ref{eq:fixedeffeq15}, the result looks like Equation \ref{eq:randeffdef15}.
\[\begin{equation} y_{it}=\overline \beta_{1}+\beta_{2}x_{2it}+\nu_{it} \label{eq:randeffdef15} \end{equation}\]The intercept is here, unlike the fixed effects model constant across individuals, but the error termm, \(\nu_{it}\), incorporates both individual specifics and the initial regression error term, as Equation \ref{eq:nuit15} shows.
\[\begin{equation} \nu_{it}=u_{i}+e_{it} \label{eq:nuit15} \end{equation}\]Thus, the random effects model is distinguished by the special structure of its error term: errors have zero mean, a variance equal to \(\sigma_{u}^{2} +\sigma_{e}^{2}\), uncorrelated across individuals, and having timewise covariance equal to \(\sigma_{u}^{2}\).
An important feature of the random effects model is that the timewise correlation in the errors does not decline over time (see Equation \ref{eq:randeffcorr15}).
\[\begin{equation} \rho=corr(\nu_{it},\nu_{is})=\frac{\sigma_{u}^{2}}{\sigma_{u}^{2}+\sigma_{e}^{2}} \label{eq:randeffcorr15} \end{equation}\]Testing for random effects amounts to testing the hypothesis that there are no differences among individuals, which implies that the individual-specific random variable has zero variance. Equation \ref{eq:randefftest15} shows the hypothesis to be tested.
\[\begin{equation} H_{0}:\;\sigma_{u}^{2}=0,\;\;\;H_{A}:\sigma_{u}^{2}>0 \label{eq:randefftest15} \end{equation}\]The same function we used for fixed effects can be used for random effects, but setting the argument model=
to ‘random’ and selecting the random.method
as one out of four possibilities: “swar” (default), “amemiya”, “walhus”, or “nerlove”. The random effects test function is plmtest()
, which takes as its main argument the pooling model (indeed it extracts the residuals from the pooling object).
wageReTest <- plmtest(wage.pooled, effect="individual")
kable(tidy(wageReTest), caption=
"A random effects test for the wage equation")
statistic | p.value | method | alternative |
---|---|---|---|
62.1231 | 0 | Lagrange Multiplier Test - (Honda) | significant effects |
Table 15.8 shows that the null hypothesis of zero variance in individual-specific errors is rejected; therefore, heterogeneity among individuals may be significant.
Random effects estimator are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, they are independent with respect to the regressors in the random effects equation. The same Hausman test for endogeneity we have already used in another chapter can be used here as well, with the null hypothesis that individual random effects are exogenous. The test function phtest()
compares the fixed effects and the random effects models; the next code lines estimate the random effects model and performs the Hausman endogeneity test.
wage.random <- plm(lwage~educ+exper+I(exper^2)+
tenure+I(tenure^2)+black+south+union,
data=nlspd, random.method="swar",
model="random")
kable(tidy(wage.random), digits=4, caption=
"The random effects results for the wage equation")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.5339 | 0.0799 | 6.6854 | 0.0000 |
educ | 0.0733 | 0.0053 | 13.7454 | 0.0000 |
exper | 0.0436 | 0.0064 | 6.8606 | 0.0000 |
I(exper^2) | -0.0006 | 0.0003 | -2.1363 | 0.0327 |
tenure | 0.0142 | 0.0032 | 4.4697 | 0.0000 |
I(tenure^2) | -0.0008 | 0.0002 | -3.8785 | 0.0001 |
black | -0.1167 | 0.0302 | -3.8652 | 0.0001 |
south | -0.0818 | 0.0224 | -3.6518 | 0.0003 |
union | 0.0802 | 0.0132 | 6.0729 | 0.0000 |
kable(tidy(phtest(wage.within, wage.random)), caption=
"Hausman endogeneity test for the random effects wage model")
statistic | p.value | parameter | method | alternative |
---|---|---|---|---|
20.745 | 0.002038 | 6 | Hausman Test | one model is inconsistent |
Table 15.10 shows a low \(p\)-value of the test, which indicates that the null hypothesis saying that the individual random effects are exogenous is rejected, which makes the random effects equation inconsistent. In this case the fixed effects model is the correct solution. (The number of parameters in Table 15.10 is given for the time-varying variables only.)
The fixed effects model, however, does not allow time-invariant variables such as \(educ\) or \(black\). Since the problem of the random effects model is endogeneity, one can use instrumental variables methods when time-invariant regressors must be in the model. The Hausman-Taylor estimator uses instrumental variables in a random effects model; it assumes four categories of regressors: time-varying exogenous, time-varying endogenous, time-invariant exogenous, and time-invariant endogenous. The number of time-varying variables must be at least equal to the number of time-invariant ones. In our \(wage\) model, suppose \(exper\), \(tenure\) and \(union\) are time-varying exogenous, \(south\) is time-varying endogenous, \(black\) is time-invariant exogenous, and \(educ\) is time-invariant endogenous. The same plm()
function allows carrying out Hausman-Taylor estimation by setting model=
“ht”.
wage.HT <- plm(lwage~educ+exper+I(exper^2)+
tenure+I(tenure^2)+black+south+union |
exper+I(exper^2)+tenure+I(tenure^2)+union+black,
data=nlspd, model="ht")
kable(tidy(wage.HT), digits=5, caption=
"Hausman-Taylor estimates for the wage equation")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.75077 | 0.58624 | -1.28066 | 0.20031 |
educ | 0.17051 | 0.04446 | 3.83485 | 0.00013 |
exper | 0.03991 | 0.00647 | 6.16382 | 0.00000 |
I(exper^2) | -0.00039 | 0.00027 | -1.46222 | 0.14368 |
tenure | 0.01433 | 0.00316 | 4.53388 | 0.00001 |
I(tenure^2) | -0.00085 | 0.00020 | -4.31885 | 0.00002 |
black | -0.03591 | 0.06007 | -0.59788 | 0.54992 |
south | -0.03171 | 0.03485 | -0.91003 | 0.36281 |
union | 0.07197 | 0.01345 | 5.34910 | 0.00000 |
Table 15.11 shows the results of the Hausman-Taylor estimation, with the largest changes taking place for \(educ\) and \(black\).
15.5 Grunfeld’s Investment Example
The dataset \(grunfeld2\) is a subset of the initial dataset; it includes two firms, GE and WE observed over the period 1935 to 1954. The purpose of this example is to identify various issues that should be taken into account when building a panel data econometric model. The problem is to find the determinants of investment by a firm , \(inv_{it}\) among regressors such as the value of the firm, \(v_{it}\), and capital stock \(k_{it}\). Table 15.12 gives a glimpse of the grunfeld panel data.
data("grunfeld2", package="PoEdata")
grun <- pdata.frame(grunfeld2, index=c("firm","year"))
kable(head(grun), align="c", caption=
"The head of the grunfeld2 dataset organized as a panel")
inv | v | k | firm | year | |
---|---|---|---|---|---|
1-1935 | 33.1 | 1170.6 | 97.8 | 1 | 1935 |
1-1936 | 45.0 | 2015.8 | 104.4 | 1 | 1936 |
1-1937 | 77.2 | 2803.3 | 118.0 | 1 | 1937 |
1-1938 | 44.6 | 2039.7 | 156.2 | 1 | 1938 |
1-1939 | 48.1 | 2256.2 | 172.6 | 1 | 1939 |
1-1940 | 74.4 | 2132.2 | 186.6 | 1 | 1940 |
Let us consider a pooling model first, assuming that the coefficients of the regression equation, as well as the error variances are the same for both firms (no individual heterogeneity).
grun.pool <- plm(inv~v+k,
model="pooling",data=grun)
kable(tidy(grun.pool), digits=5, caption=
"Grunfeld dataset, pooling panel data results")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 17.87200 | 7.02408 | 2.54439 | 0.01525 |
v | 0.01519 | 0.00620 | 2.45191 | 0.01905 |
k | 0.14358 | 0.01860 | 7.71890 | 0.00000 |
SSE.pool <- sum(resid(grun.pool)^2)
sigma2.pool <- SSE.pool/(grun.pool$df.residual)
For the pooling model, \(SSE=16563.003385\), and \(\sigma ^2 =447.64874\).
Allowing for different coefficients across firms but same error structure is the fixed effects model summarized in Table 15.14. Note that the fixed effects are modeled using the function factor()
.
grun.fe <- plm(inv~v*grun$firm+k*grun$firm,
model="pooling",data=grun)
kable(tidy(grun.fe), digits=4, caption=
"Grunfeld dataset, 'pooling' panel data results")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -9.9563 | 23.6264 | -0.4214 | 0.6761 |
v | 0.0266 | 0.0117 | 2.2651 | 0.0300 |
grun$firm2 | 9.4469 | 28.8054 | 0.3280 | 0.7450 |
k | 0.1517 | 0.0194 | 7.8369 | 0.0000 |
v:grun$firm2 | 0.0263 | 0.0344 | 0.7668 | 0.4485 |
grun$firm2:k | -0.0593 | 0.1169 | -0.5070 | 0.6155 |
SSE.fe <- sum(resid(grun.fe)^2)
sigma2.fe <- SSE.fe/(grun.fe$df.residual)
For the fixed effects model with firm dummies, \(SSE=14989.821701\), and \(\sigma ^2 =440.877109\).
A test to see if the coefficients are significantly different between the pooling and fixed effects equations can be done in \(R\) using the function pooltest
from package plm
; to perform this test, the fixed effects model should be estimated with the function pvcm
with the argument model=
“within”, as the next code lines show.
grun.pvcm <- pvcm(inv~v+k,
model="within", data=grun)
coef(grun.pvcm)
(Intercept) | v | k |
---|---|---|
-9.95631 | 0.026551 | 0.151694 |
-0.50939 | 0.052894 | 0.092406 |
pooltest(grun.pool, grun.pvcm)
##
## F statistic
##
## data: inv ~ v + k
## F = 1.189, df1 = 3, df2 = 34, p-value = 0.328
## alternative hypothesis: unstability
The result shows that the null hypothesis of zero coefficients for the individual dummy terms are zero cannot be rejected. (However, the pvcm
function is not equivalent to the fixed effects model that uses individual dummies; it is, though, useful for testing the ‘poolability’ of a dataset.)
Now, if we allow for different coefficients and different error variances, the equations for each individual is independent from those for other individuals and it can be estimated separately.
grun1.pool <- plm(inv~v+k, model="pooling",
subset=grun$firm==1, data=grun)
SSE.pool1<- sum(resid(grun1.pool)^2)
sig2.pool1 <- SSE.pool1/grun1.pool$df.residual
kable(tidy(grun1.pool), digits=4, align='c', caption=
"Pooling astimates for the GE firm (firm=1)")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -9.9563 | 31.3742 | -0.3173 | 0.7548 |
v | 0.0266 | 0.0156 | 1.7057 | 0.1063 |
k | 0.1517 | 0.0257 | 5.9015 | 0.0000 |
grun2.pool <- plm(inv~v+k, model="pooling",
subset=grun$firm==2, data=grun)
SSE.pool2 <- sum(resid(grun2.pool)^2)
sig2.pool2 <- SSE.pool2/grun2.pool$df.residual
kable(tidy(grun2.pool), digits=4, align='c', caption=
"Pooling estimates for the WE firm (firm=2)")
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.5094 | 8.0153 | -0.0636 | 0.9501 |
v | 0.0529 | 0.0157 | 3.3677 | 0.0037 |
k | 0.0924 | 0.0561 | 1.6472 | 0.1179 |
Tables 15.15 and 15.16 show the results for the equations on subsets of data, separated by firms.
A Godfeld-Quandt test can be carried out to determine whether the variances are different among firms, as the next code shows.
gqtest(grun.pool, point=0.5, alternative="two.sided",
order.by=grun$firm)
##
## Goldfeld-Quandt test
##
## data: grun.pool
## GQ = 0.1342, df1 = 17, df2 = 17, p-value = 0.000143
The result is rejection of the null hypothesis that the variances are equal, indicating that estimating separate equations for each firm is the correct model.
What happens when we assume that the only link between the two firms is correlation between their contemporaneous error terms? This is the model of seemingly unrelated regressions, a generalized least squares method.
library("systemfit")
grunf<- grunfeld2
grunf$Firm<-"WE"
for (i in 1:40){
if(grunf$firm[i]==1){grunf$Firm[i] <- "GE"}
}
grunf$firm <- NULL
names(grunf)<- c("inv", "val", "cap", "year", "firm")
grunfpd <- plm.data(grunf, c("firm","year"))
grunf.SUR <- systemfit(inv~val+cap, method="SUR", data=grunfpd)
summary(grunf.SUR, resdCov=FALSE, equations=FALSE)
##
## systemfit results
## method: SUR
##
## N DF SSR detRCov OLS-R2 McElroy-R2
## system 40 34 15590 35641 0.69897 0.6151
##
## N DF SSR MSE RMSE R2 Adj R2
## GE 20 17 13788.4 811.08 28.479 0.69256 0.65639
## WE 20 17 1801.3 105.96 10.294 0.74040 0.70986
##
## The covariance matrix of the residuals used for estimation
## GE WE
## GE 777.45 207.59
## WE 207.59 104.31
##
## The covariance matrix of the residuals
## GE WE
## GE 811.08 224.28
## WE 224.28 105.96
##
## The correlations of the residuals
## GE WE
## GE 1.00000 0.76504
## WE 0.76504 1.00000
##
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## GE_(Intercept) -27.719317 29.321219 -0.9454 0.357716
## GE_val 0.038310 0.014415 2.6576 0.016575 *
## GE_cap 0.139036 0.024986 5.5647 0.00003423 ***
## WE_(Intercept) -1.251988 7.545217 -0.1659 0.870168
## WE_val 0.057630 0.014546 3.9618 0.001007 **
## WE_cap 0.063978 0.053041 1.2062 0.244256
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, please note that the systemfit()
function requires a panel data file created with plm.data
, instead of the pdata.frame
that we have used above; second, for some reason I had to change the names of the variables to names having more than one letter to make the function work. I did this using the function names()
.
References
Croissant, Yves, and Giovanni Millo. 2015. Plm: Linear Models for Panel Data. https://CRAN.R-project.org/package=plm.