Chapter 15 Panel Data Models

rm(list=ls()) #Removes all items in Environment!
library(plm) 
library(tseries) # for `adf.test()`
library(dynlm) #for function `dynlm()`
library(vars) # for function `VAR()`
library(nlWaldTest) # for the `nlWaldtest()` function
library(lmtest) #for `coeftest()` and `bptest()`.
library(broom) #for `glance(`) and `tidy()`
library(PoEdata) #for PoE4 datasets
library(car) #for `hccm()` robust standard errors
library(sandwich)
library(knitr) #for `kable()`
library(forecast) 
library(systemfit)
library(AER)
library(xtable)

New package: plm (Croissant and Millo 2015).

Panel data gathers information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced. Equation $\ref{eq:panelgeneq15}$ gives the form of a pooled panel data model, where the subscript $i=1,...,N$ denotes an individual (cross sectional unit), and $t=1,...,T$ denotes the time period, or longitudinal unit. The total number of observations in the panel is $N\times T$ .

$\begin{equation} y_{it}=\beta_{1}+\beta_{2}x_{2it}+...+\beta_{K}x_{Kit}+e_{it} \label{eq:panelgeneq15} \end{equation}$

15.1 Organizing the Data as a Panel

A wide panel has the cross-sectional dimension ( $N$ ) much larger than the longitudinal dimension ( $T$ ); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly other units, the result is not a proper panel data, but pooled cross-sections model.

This manual uses the panel data package plm(), which also gives the possibility of organizing the data under the form of a panel. Panel datsets can be organized in mainly two forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long form data frame is often not a trivial matter.

The next code sequence creates a panel structure for the dataset $nls\_panel$ using the function pdata.frame of the plm package and displays a small part of this dataset. Please note how the selection of the rows and columns to be displayed is done, using the compact operator $\%in\%$ and arrays such as c(1:6, 14:15). Table 15.1 shows this sample.

library(xtable)
data("nls_panel", package="PoEdata")
nlspd <- pdata.frame(nls_panel, index=c("id", "year"))
smpl <- nlspd[nlspd$id %in% c(1,2),c(1:6, 14:15)]
tbl <- xtable(smpl) 
kable(tbl, digits=4, align="c",
      caption="A data sample")

Table 15.1: A data sample
	id	year	lwage	hours	age	educ	union	exper
1-82	1	82	1.8083	38	30	12	1	7.6667
1-83	1	83	1.8634	38	31	12	1	8.5833
1-85	1	85	1.7894	38	33	12	1	10.1795
1-87	1	87	1.8465	40	35	12	1	12.1795
1-88	1	88	1.8564	40	37	12	1	13.6218
2-82	2	82	1.2809	48	36	17	0	7.5769
2-83	2	83	1.5159	43	37	17	0	8.3846
2-85	2	85	1.9302	35	39	17	0	10.3846
2-87	2	87	1.9190	42	41	17	1	12.0385
2-88	2	88	2.2010	42	43	17	1	13.2115

Function pdim() extracts the dimensions of the panel data:

pdim(nlspd)

## Balanced Panel: n=716, T=5, N=3580

15.2 The Pooled Model

A pooled model has the specification in Equation $\ref{eq:panelgeneq15}$ , which does not allow for intercept or slope differences among individuals. Such a model can be estimated in $R$ using the specification pooling in the plm() function, as the following code sequence illustrates.

wage.pooled <- plm(lwage~educ+exper+I(exper^2)+
  tenure+I(tenure^2)+black+south+union, 
  model="pooling", data=nlspd)
kable(tidy(wage.pooled), digits=3, 
           caption="Pooled model")

Table 15.2: Pooled model
term	estimate	std.error	statistic	p.value
(Intercept)	0.477	0.056	8.487	0.000
educ	0.071	0.003	26.567	0.000
exper	0.056	0.009	6.470	0.000
I(exper^2)	-0.001	0.000	-3.176	0.002
tenure	0.015	0.004	3.394	0.001
I(tenure^2)	0.000	0.000	-1.886	0.059
black	-0.117	0.016	-7.426	0.000
south	-0.106	0.014	-7.465	0.000
union	0.132	0.015	8.839	0.000

The plm() function accepts the following main arguments, where the parameters shown as vectors c(...), such as effect and model can only take one value at a time out of the provided list.

plm(formula, data, subset, na.action, effect = c("individual", "time", "twoways"), model = c("within", "random", "ht", "between", "pooling", "fd"),...)

tbl <- tidy(coeftest(wage.pooled, vcov=vcovHC(wage.pooled,
                    type="HC0",cluster="group")))
kable(tbl, digits=5, caption=
"Pooled 'wage' model with cluster robust standard errors")

Table 15.3: Pooled ‘wage’ model with cluster robust standard errors
term	estimate	std.error	statistic	p.value
(Intercept)	0.47660	0.08441	5.64629	0.00000
educ	0.07145	0.00549	13.01550	0.00000
exper	0.05569	0.01129	4.93242	0.00000
I(exper^2)	-0.00115	0.00049	-2.33440	0.01963
tenure	0.01496	0.00711	2.10401	0.03545
I(tenure^2)	-0.00049	0.00041	-1.18697	0.23532
black	-0.11671	0.02808	-4.15602	0.00003
south	-0.10600	0.02701	-3.92422	0.00009
union	0.13224	0.02703	4.89327	0.00000

15.3 The Fixed Effects Model

The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. The model in this case assigns the subscript to the constant term , as shown in Equation $\ref{eq:fixedeffeq15}$ ; the constant terms calculated in this way are called fixed effects.

$\begin{equation} y_{it}=\beta_{1i}+\beta_{2i}x_{2it}+\beta_{3i}x_{3it}+e_{it} \label{eq:fixedeffeq15} \end{equation}$

Variables that change little or not at all over time, such as some individual characteristics should not be included in a fixed effects model because they produce collinearity with the fixed effects.

nls10 <- pdata.frame(nls_panel[nls_panel$id %in% 1:10,])

## series nev_mar, not_smsa, south are constants and have been removed

wage.fixed <- lm(lwage~exper+I(exper^2)+
                  tenure+I(tenure^2)+union+factor(id)-1,
                  data=nls10)
kable(tidy(wage.fixed), digits=3, 
      caption="Fixed effects in a subsample")

Table 15.4: Fixed effects in a subsample
term	estimate	std.error	statistic	p.value
exper	0.238	0.188	1.268	0.213
I(exper^2)	-0.008	0.008	-1.036	0.307
tenure	-0.012	0.034	-0.362	0.720
I(tenure^2)	0.002	0.003	0.854	0.399
union	0.114	0.151	0.753	0.457
factor(id)1	0.152	1.097	0.139	0.891
factor(id)2	0.187	1.071	0.174	0.863
factor(id)3	-0.063	1.351	-0.047	0.963
factor(id)4	0.186	1.343	0.138	0.891
factor(id)5	0.939	1.098	0.855	0.398
factor(id)6	0.794	1.112	0.715	0.480
factor(id)7	0.581	1.236	0.470	0.641
factor(id)8	0.538	1.097	0.490	0.627
factor(id)9	0.418	1.084	0.386	0.702
factor(id)10	0.615	1.090	0.564	0.577

Table 15.4 displays the results of an OLS regression on a subsample of the first 10 individuals in the dataset $nls\_panel$ . The table is generated by the previous code sequence, where the novelty is using the factor variable $id$ . The function factor() generates dummy variables for all categories of the variable, taking the first category as the reference. To include the reference in the output, one needs to exclude the constant from the regression model by including the term $-1$ in the regression formula. When the constant is not excluded, the coefficients of the dummy variables represent, as usual, the difference between the respective category and the benchmark one.

However, to estimate a fixed effects in $R$ we do not need to create the dummy variables, but use the option model="within" in the plm() function. The following code fragment uses the whole sample.

wage.within <- plm(lwage~exper+I(exper^2)+
                  tenure+I(tenure^2)+south+union,
                  data=nlspd, 
                  model="within")
tbl <- tidy(wage.within)
kable(tbl, digits=5, caption=
"Fixed effects using 'within' with full sample")

Table 15.5: Fixed effects using ‘within’ with full sample
term	estimate	std.error	statistic	p.value
exper	0.04108	0.00662	6.20590	0.00000
I(exper^2)	-0.00041	0.00027	-1.49653	0.13463
tenure	0.01391	0.00328	4.24333	0.00002
I(tenure^2)	-0.00090	0.00021	-4.35357	0.00001
south	-0.01632	0.03615	-0.45153	0.65164
union	0.06370	0.01425	4.46879	0.00001

wage10.within <- plm(lwage~exper+I(exper^2)+
                  tenure+I(tenure^2)+union,
                  data=nls10, 
                  model="within")
tbl <- tidy(wage10.within)
kable(tbl, digits=5, caption=
  "Fixed effects using the 'within' model option for n=10")

Table 15.6: Fixed effects using the ‘within’ model option for n=10
term	estimate	std.error	statistic	p.value
exper	0.23800	0.18776	1.26758	0.21332
I(exper^2)	-0.00819	0.00790	-1.03584	0.30738
tenure	-0.01235	0.03414	-0.36171	0.71974
I(tenure^2)	0.00230	0.00269	0.85407	0.39887
union	0.11354	0.15086	0.75263	0.45670

Table 15.6 presents the fixed effects model results for the subsample of $10$ individuals of the dataset $nls\_panel$ . This is to be compared to Table 15.4 to see that the within method is equiivalent to including the dummies in the model. An interesting comparison is between the pooled and fixed effect models. Comparing Table 15.2 with Table 15.5 one can notice that including accounting for individual heterogeneity significantly lowers the marginal effects of the variables.

Testing if fixed effets are necessary is to compare the fixed effects model wage.within with the pooled model wage.pooled. The function pFtest() does this comparison, as in the following code lines.

kable(tidy(pFtest(wage.within, wage.pooled)), caption=
        "Fixed effects test: Ho:'No fixed effects'")

Table 15.7: Fixed effects test: Ho:‘No fixed effects’
df1	df2	statistic	p.value	method	alternative
713	2858	15.1875	0	F test for individual effects	significant effects

Table 15.7 shows that the null hypothesis of no fixed effects is rejected.

15.4 The Random Effects Model

The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept should also be random. Thus, the random effects model assumes the form of the intercept as given in Equation $\ref{eq:randeffectsintercept}$ , where $\overline \beta_{1}$ stands for the population average and $u_{i}$ represents an individual-specific random term. As in the case of fixed effects, random effects are also time-invariant.

$\begin{equation} \beta_{1i} =\overline \beta_{1}+u_{i} \label{eq:randeffectsintercept} \end{equation}$

If this form of the intercept is replaced in Equation $\ref{eq:fixedeffeq15}$ , the result looks like Equation $\ref{eq:randeffdef15}$ .

$\begin{equation} y_{it}=\overline \beta_{1}+\beta_{2}x_{2it}+\nu_{it} \label{eq:randeffdef15} \end{equation}$

The intercept is here, unlike the fixed effects model constant across individuals, but the error termm, , incorporates both individual specifics and the initial regression error term, as Equation $\ref{eq:nuit15}$ shows.

$\begin{equation} \nu_{it}=u_{i}+e_{it} \label{eq:nuit15} \end{equation}$

Thus, the random effects model is distinguished by the special structure of its error term: errors have zero mean, a variance equal to $\sigma_{u}^{2} +\sigma_{e}^{2}$ , uncorrelated across individuals, and having timewise covariance equal to $\sigma_{u}^{2}$ .

An important feature of the random effects model is that the timewise correlation in the errors does not decline over time (see Equation $\ref{eq:randeffcorr15}$ ).

$\begin{equation} \rho=corr(\nu_{it},\nu_{is})=\frac{\sigma_{u}^{2}}{\sigma_{u}^{2}+\sigma_{e}^{2}} \label{eq:randeffcorr15} \end{equation}$

Testing for random effects amounts to testing the hypothesis that there are no differences among individuals, which implies that the individual-specific random variable has zero variance. Equation $\ref{eq:randefftest15}$ shows the hypothesis to be tested.

$\begin{equation} H_{0}:\;\sigma_{u}^{2}=0,\;\;\;H_{A}:\sigma_{u}^{2}>0 \label{eq:randefftest15} \end{equation}$

The same function we used for fixed effects can be used for random effects, but setting the argument model= to ‘random’ and selecting the random.method as one out of four possibilities: “swar” (default), “amemiya”, “walhus”, or “nerlove”. The random effects test function is plmtest(), which takes as its main argument the pooling model (indeed it extracts the residuals from the pooling object).

wageReTest <- plmtest(wage.pooled, effect="individual")
kable(tidy(wageReTest), caption=
        "A random effects test for the wage equation")

Table 15.8: A random effects test for the wage equation
statistic	p.value	method	alternative
62.1231	0	Lagrange Multiplier Test - (Honda)	significant effects

Table 15.8 shows that the null hypothesis of zero variance in individual-specific errors is rejected; therefore, heterogeneity among individuals may be significant.

Random effects estimator are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, they are independent with respect to the regressors in the random effects equation. The same Hausman test for endogeneity we have already used in another chapter can be used here as well, with the null hypothesis that individual random effects are exogenous. The test function phtest() compares the fixed effects and the random effects models; the next code lines estimate the random effects model and performs the Hausman endogeneity test.

wage.random <- plm(lwage~educ+exper+I(exper^2)+
                  tenure+I(tenure^2)+black+south+union,
                  data=nlspd, random.method="swar",
                  model="random")
kable(tidy(wage.random), digits=4, caption=
      "The random effects results for the wage equation")

Table 15.9: The random effects results for the wage equation
term	estimate	std.error	statistic	p.value
(Intercept)	0.5339	0.0799	6.6854	0.0000
educ	0.0733	0.0053	13.7454	0.0000
exper	0.0436	0.0064	6.8606	0.0000
I(exper^2)	-0.0006	0.0003	-2.1363	0.0327
tenure	0.0142	0.0032	4.4697	0.0000
I(tenure^2)	-0.0008	0.0002	-3.8785	0.0001
black	-0.1167	0.0302	-3.8652	0.0001
south	-0.0818	0.0224	-3.6518	0.0003
union	0.0802	0.0132	6.0729	0.0000

kable(tidy(phtest(wage.within, wage.random)), caption=
 "Hausman endogeneity test for the random effects wage model")

Table 15.10: Hausman endogeneity test for the random effects wage model
statistic	p.value	parameter	method	alternative
20.745	0.002038	6	Hausman Test	one model is inconsistent

Table 15.10 shows a low $p$ -value of the test, which indicates that the null hypothesis saying that the individual random effects are exogenous is rejected, which makes the random effects equation inconsistent. In this case the fixed effects model is the correct solution. (The number of parameters in Table 15.10 is given for the time-varying variables only.)

The fixed effects model, however, does not allow time-invariant variables such as $educ$ or $black$ . Since the problem of the random effects model is endogeneity, one can use instrumental variables methods when time-invariant regressors must be in the model. The Hausman-Taylor estimator uses instrumental variables in a random effects model; it assumes four categories of regressors: time-varying exogenous, time-varying endogenous, time-invariant exogenous, and time-invariant endogenous. The number of time-varying variables must be at least equal to the number of time-invariant ones. In our $wage$ model, suppose $exper$ , $tenure$ and $union$ are time-varying exogenous, $south$ is time-varying endogenous, $black$ is time-invariant exogenous, and $educ$ is time-invariant endogenous. The same plm() function allows carrying out Hausman-Taylor estimation by setting model= “ht”.

wage.HT <- plm(lwage~educ+exper+I(exper^2)+
      tenure+I(tenure^2)+black+south+union |
      exper+I(exper^2)+tenure+I(tenure^2)+union+black,
      data=nlspd, model="ht")
kable(tidy(wage.HT), digits=5, caption=
     "Hausman-Taylor estimates for the wage equation")

Table 15.11: Hausman-Taylor estimates for the wage equation
term	estimate	std.error	statistic	p.value
(Intercept)	-0.75077	0.58624	-1.28066	0.20031
educ	0.17051	0.04446	3.83485	0.00013
exper	0.03991	0.00647	6.16382	0.00000
I(exper^2)	-0.00039	0.00027	-1.46222	0.14368
tenure	0.01433	0.00316	4.53388	0.00001
I(tenure^2)	-0.00085	0.00020	-4.31885	0.00002
black	-0.03591	0.06007	-0.59788	0.54992
south	-0.03171	0.03485	-0.91003	0.36281
union	0.07197	0.01345	5.34910	0.00000

Table 15.11 shows the results of the Hausman-Taylor estimation, with the largest changes taking place for $educ$ and $black$ .

15.5 Grunfeld’s Investment Example

The dataset $grunfeld2$ is a subset of the initial dataset; it includes two firms, GE and WE observed over the period 1935 to 1954. The purpose of this example is to identify various issues that should be taken into account when building a panel data econometric model. The problem is to find the determinants of investment by a firm , $inv_{it}$ among regressors such as the value of the firm, $v_{it}$ , and capital stock $k_{it}$ . Table 15.12 gives a glimpse of the grunfeld panel data.

data("grunfeld2", package="PoEdata")
grun <- pdata.frame(grunfeld2, index=c("firm","year"))
kable(head(grun), align="c", caption=
"The head of the grunfeld2 dataset organized as a panel")

Table 15.12: The head of the grunfeld2 dataset organized as a panel
	inv	v	k	firm	year
1-1935	33.1	1170.6	97.8	1	1935
1-1936	45.0	2015.8	104.4	1	1936
1-1937	77.2	2803.3	118.0	1	1937
1-1938	44.6	2039.7	156.2	1	1938
1-1939	48.1	2256.2	172.6	1	1939
1-1940	74.4	2132.2	186.6	1	1940

Let us consider a pooling model first, assuming that the coefficients of the regression equation, as well as the error variances are the same for both firms (no individual heterogeneity).

grun.pool <- plm(inv~v+k, 
                 model="pooling",data=grun)
kable(tidy(grun.pool), digits=5, caption=
  "Grunfeld dataset, pooling panel data results")

Table 15.13: Grunfeld dataset, pooling panel data results
term	estimate	std.error	statistic	p.value
(Intercept)	17.87200	7.02408	2.54439	0.01525
v	0.01519	0.00620	2.45191	0.01905
k	0.14358	0.01860	7.71890	0.00000

SSE.pool <- sum(resid(grun.pool)^2)
sigma2.pool <- SSE.pool/(grun.pool$df.residual)

For the pooling model, $SSE=16563.003385$ , and $\sigma ^2 =447.64874$ .

Allowing for different coefficients across firms but same error structure is the fixed effects model summarized in Table 15.14. Note that the fixed effects are modeled using the function factor().

grun.fe <- plm(inv~v*grun$firm+k*grun$firm, 
               model="pooling",data=grun)
kable(tidy(grun.fe), digits=4, caption=
  "Grunfeld dataset, 'pooling' panel data results")

Table 15.14: Grunfeld dataset, ‘pooling’ panel data results
term	estimate	std.error	statistic	p.value
(Intercept)	-9.9563	23.6264	-0.4214	0.6761
v	0.0266	0.0117	2.2651	0.0300
grun$firm2	9.4469	28.8054	0.3280	0.7450
k	0.1517	0.0194	7.8369	0.0000
v:grun$firm2	0.0263	0.0344	0.7668	0.4485
grun$firm2:k	-0.0593	0.1169	-0.5070	0.6155

SSE.fe <- sum(resid(grun.fe)^2)
sigma2.fe <- SSE.fe/(grun.fe$df.residual)

For the fixed effects model with firm dummies, $SSE=14989.821701$ , and $\sigma ^2 =440.877109$ .

A test to see if the coefficients are significantly different between the pooling and fixed effects equations can be done in $R$ using the function pooltest from package plm; to perform this test, the fixed effects model should be estimated with the function pvcm with the argument model= “within”, as the next code lines show.

grun.pvcm <- pvcm(inv~v+k, 
                   model="within", data=grun)
coef(grun.pvcm)

(Intercept)	v	k
-9.95631	0.026551	0.151694
-0.50939	0.052894	0.092406

pooltest(grun.pool, grun.pvcm)

## 
##  F statistic
## 
## data:  inv ~ v + k
## F = 1.189, df1 = 3, df2 = 34, p-value = 0.328
## alternative hypothesis: unstability

The result shows that the null hypothesis of zero coefficients for the individual dummy terms are zero cannot be rejected. (However, the pvcm function is not equivalent to the fixed effects model that uses individual dummies; it is, though, useful for testing the ‘poolability’ of a dataset.)

Now, if we allow for different coefficients and different error variances, the equations for each individual is independent from those for other individuals and it can be estimated separately.

grun1.pool <- plm(inv~v+k, model="pooling",
                 subset=grun$firm==1, data=grun)
SSE.pool1<- sum(resid(grun1.pool)^2)
sig2.pool1 <- SSE.pool1/grun1.pool$df.residual
kable(tidy(grun1.pool), digits=4, align='c', caption=
        "Pooling astimates for the GE firm (firm=1)")

Table 15.15: Pooling astimates for the GE firm (firm=1)
term	estimate	std.error	statistic	p.value
(Intercept)	-9.9563	31.3742	-0.3173	0.7548
v	0.0266	0.0156	1.7057	0.1063
k	0.1517	0.0257	5.9015	0.0000

grun2.pool <- plm(inv~v+k, model="pooling",
                 subset=grun$firm==2, data=grun)
SSE.pool2 <- sum(resid(grun2.pool)^2)
sig2.pool2 <- SSE.pool2/grun2.pool$df.residual
kable(tidy(grun2.pool), digits=4, align='c', caption=
        "Pooling estimates for the WE firm (firm=2)")

Table 15.16: Pooling estimates for the WE firm (firm=2)
term	estimate	std.error	statistic	p.value
(Intercept)	-0.5094	8.0153	-0.0636	0.9501
v	0.0529	0.0157	3.3677	0.0037
k	0.0924	0.0561	1.6472	0.1179

Tables 15.15 and 15.16 show the results for the equations on subsets of data, separated by firms.

A Godfeld-Quandt test can be carried out to determine whether the variances are different among firms, as the next code shows.

gqtest(grun.pool, point=0.5, alternative="two.sided",
       order.by=grun$firm)

## 
##  Goldfeld-Quandt test
## 
## data:  grun.pool
## GQ = 0.1342, df1 = 17, df2 = 17, p-value = 0.000143

The result is rejection of the null hypothesis that the variances are equal, indicating that estimating separate equations for each firm is the correct model.

What happens when we assume that the only link between the two firms is correlation between their contemporaneous error terms? This is the model of seemingly unrelated regressions, a generalized least squares method.

library("systemfit")
grunf<- grunfeld2
grunf$Firm<-"WE"
for (i in 1:40){
  if(grunf$firm[i]==1){grunf$Firm[i] <- "GE"}
}
grunf$firm <- NULL
names(grunf)<- c("inv", "val", "cap", "year", "firm")
grunfpd <- plm.data(grunf, c("firm","year"))
grunf.SUR <- systemfit(inv~val+cap, method="SUR", data=grunfpd)
summary(grunf.SUR, resdCov=FALSE, equations=FALSE)

## 
## systemfit results 
## method: SUR 
## 
##         N DF   SSR detRCov  OLS-R2 McElroy-R2
## system 40 34 15590   35641 0.69897     0.6151
## 
##     N DF     SSR    MSE   RMSE      R2  Adj R2
## GE 20 17 13788.4 811.08 28.479 0.69256 0.65639
## WE 20 17  1801.3 105.96 10.294 0.74040 0.70986
## 
## The covariance matrix of the residuals used for estimation
##        GE     WE
## GE 777.45 207.59
## WE 207.59 104.31
## 
## The covariance matrix of the residuals
##        GE     WE
## GE 811.08 224.28
## WE 224.28 105.96
## 
## The correlations of the residuals
##         GE      WE
## GE 1.00000 0.76504
## WE 0.76504 1.00000
## 
## 
## Coefficients:
##                  Estimate Std. Error t value   Pr(>|t|)    
## GE_(Intercept) -27.719317  29.321219 -0.9454   0.357716    
## GE_val           0.038310   0.014415  2.6576   0.016575 *  
## GE_cap           0.139036   0.024986  5.5647 0.00003423 ***
## WE_(Intercept)  -1.251988   7.545217 -0.1659   0.870168    
## WE_val           0.057630   0.014546  3.9618   0.001007 ** 
## WE_cap           0.063978   0.053041  1.2062   0.244256    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

First, please note that the systemfit() function requires a panel data file created with plm.data, instead of the pdata.frame that we have used above; second, for some reason I had to change the names of the variables to names having more than one letter to make the function work. I did this using the function names().

References

Croissant, Yves, and Giovanni Millo. 2015. Plm: Linear Models for Panel Data. https://CRAN.R-project.org/package=plm.