Appendix
In the appendix, I first describe the methods and the data used to obtain the parameter \(\theta\). Second, I include a data appendix with relevant summary statistics.
Estimation
In the following section I outline the methods, the additional data sources and the data manipulations I used to estimate \(\theta\). In particular, I motivate the IV estimation and describe the data imputation. Finally, I discuss the results.
I follow the approach of CDK to obtain an estimate of \(\theta\). Thus, I estimate the following equation. \[ \ln x_{i,j}^k=\delta_{i,j}+\delta_{j}^k + \theta \ln z_{i,j}^k+\epsilon_{i,j}^k\] In the equation, \(x_{i,j}^k\) denotes the trade flow from exporting country \(i\) to importing country \(j\) in industry \(k\), \(\delta_{i,j}\) denotes an importer-exporter fixed effect, \(\delta_{j}^k\) denotes an importer-sector fixed effect, \(z_{i,j}^k\) denotes the productive efficiency and \(\epsilon_{i,j}^k\) denotes the error term. As in CDK, I specify \(z_{i,j}^k\) as the inverse of producer prices.
The structural parameter \(\theta\) may be estimated with OLS under the condition that the econometric error term is exogenous. In the model the error term may be interpreted as a variable trade cost. Thus, for each of the three specifications, the exogeneity condition requires that the inverse of total production cost, total domestic cost or total domestic factor cost, respectively, is not correlated with variable trade costs.
Costinot, Donaldson, and Komunjer (2012) state two reasons why the condition may be violated. First, the condition may be violated because of a simultaneity bias. An example of a source of simultaneity bias is agglomeration effects.26 The sign of the simultaneity bias is a priori ambiguous (Costinot, Donaldson, and Komunjer 2012).
Second, the exogeneity condition may be violated due to measurement error. Measurement error in the international price data would bias the estimate of \(\theta\) toward zero (attenuation bias) under the condition that the measurement error is uncorrelated with the true underlying variable (Greene 2003, 85).
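For intuition, the textbook errors-in-variables case can be written out. This is only a sketch under classical assumptions that are not stated in the text above: a single regressor, an observed log regressor \(\tilde{z}=z^{*}+u\), and a measurement error \(u\) with variance \(\sigma_u^2\) that is uncorrelated with both the true value \(z^{*}\) (with variance \(\sigma_{*}^2\)) and the error term. The symbols \(\tilde{z}\), \(z^{*}\), \(u\), \(\sigma_u^2\) and \(\sigma_{*}^2\) are introduced here for illustration only. \[ \operatorname{plim}\, \hat{\theta}_{OLS} = \theta \, \frac{\sigma_{*}^2}{\sigma_{*}^2+\sigma_u^2} \] Since the ratio lies strictly between zero and one whenever \(\sigma_u^2>0\), the OLS estimate is pulled toward zero in this stylized case. With fixed effects and several regressors the expression is more involved, so it should be read as an illustration of the direction of the bias only.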
I use an IV estimation strategy to address the two outlined problems. By instrumenting the inverse producer price with R&D expenditures, I attempt to isolate the variation of the regressor that is exogenous to the econometric error term. Moreover, if the variation of the producer prices explained by R&D mainly affects our independent variable through productive efficiency, then the IV estimation identifies the effect of Ricardian sources of comparative advantage.
I motivate the choice of R&D expenditures as instrumental variable as follows. First, modelling productive efficiency as a process of R&D is in line with the approach of Costinot, Donaldson, and Komunjer (2012) and Eaton and Kortum (2002). A possible mechanism for R&D expenditures to affect the inverse of producer prices is that an increase in R&D expenditures may lead to innovations, which lead to more cost efficient production technologies. In the model a decrease in the cost of producing a good is directly passed through to the producer price, due to the assumption of perfect competition.
I can test the outlined mechanism on the basis of the first stage regression of the inverse of producer prices on R&D expenditures. Under the outlined mechanism, I expect the coefficient to be positive and statistically significant. An empirical test of the exclusion assumption, which is that the instrument is exogenous to the econometric error term, is however not possible (Cameron and Trivedi 2005, 109).
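The two-stage logic described above can be sketched on simulated data. This is a minimal illustration and not the estimation code behind the results: the data-generating process, the value `theta_true = 6.0`, all variable names, the omission of the fixed effects and the homoskedastic first-stage F-statistic are assumptions made for the example.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """Just-identified 2SLS with one endogenous regressor x and one instrument z.

    Returns (theta_iv, first_stage_slope, first_stage_F).
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])            # constant + instrument
    # First stage: regress the endogenous regressor on the instrument.
    pi, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ pi
    resid = x - x_hat
    # Homoskedastic F-statistic of the single excluded instrument (= t^2).
    sigma2 = resid @ resid / (n - 2)
    var_pi = sigma2 * np.linalg.inv(Z.T @ Z)
    first_stage_F = pi[1] ** 2 / var_pi[1, 1]
    # Second stage: regress the outcome on the first-stage fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1], pi[1], first_stage_F

rng = np.random.default_rng(0)
n = 5000
theta_true = 6.0                                    # hypothetical value
log_rd = rng.normal(size=n)                         # simulated instrument (log R&D)
trade_cost = rng.normal(size=n)                     # unobserved error term
# The regressor depends on the instrument and on the error term (endogeneity).
log_z = 0.5 * log_rd - 0.4 * trade_cost + rng.normal(size=n)
log_x = theta_true * log_z + trade_cost             # simulated log trade flows

X = np.column_stack([np.ones(n), log_z])
theta_ols = np.linalg.lstsq(X, log_x, rcond=None)[0][1]
theta_iv, first_stage_slope, first_stage_F = two_stage_least_squares(
    log_x, log_z, log_rd
)
print(theta_ols, theta_iv, first_stage_slope, first_stage_F)
```

In this simulated setting the regressor is negatively correlated with the error term by construction, so OLS understates the hypothetical true parameter while the IV estimate recovers it. The second-stage standard errors are deliberately left out, because the naive ones from the second regression would be incorrect.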
Data
I use the following additional data sources to estimate \(\theta\). I use international producer price data for the year 2005 from the GGDC (Inklaar and Timmer 2014), R&D expenditures for the year 2005 from the OECD ANBERD database (OECD 2013), and value-added trade and gross export data from the TiVA database. Further, to harmonize the level of aggregation of the international price data with the other data sources, I use value-added output data from the OECD STAN database (OECD 2013).
I combined the additional data sources with the value-added and gross export data from the TiVA database using the ISIC Rev. 3.1 two-digit classification. In the cases in which the international price data is more disaggregated, I used a weighted average. Specifically, to merge two sectors I assigned to each sector a weight equal to the share of the sector’s value-added output in the sum of the value-added output of the two sectors.
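The aggregation step can be illustrated with a short sketch. The function name and the numbers are hypothetical and only serve to show the value-added weighting described above.

```python
def merge_sector_prices(prices, value_added):
    """Value-added-weighted average price of the sectors being merged."""
    if len(prices) != len(value_added):
        raise ValueError("prices and value_added must have the same length")
    total = sum(value_added)
    if total <= 0:
        raise ValueError("value-added output must sum to a positive number")
    # Each sector's weight is its share in the combined value-added output.
    weights = [va / total for va in value_added]
    return sum(w * p for w, p in zip(weights, prices))

# Hypothetical example: two sub-sectors with relative prices 1.2 and 0.8
# and value-added output of 30 and 10, so the weights are 0.75 and 0.25.
merged_price = merge_sector_prices([1.2, 0.8], [30.0, 10.0])
print(round(merged_price, 6))
```

With these made-up inputs the merged price is 0.75 × 1.2 + 0.25 × 0.8 = 1.1.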
Missing data imputation
Schafer and Olsen (1998) note that the following three concerns arise due to missing data: (1) efficiency losses, (2) complications in data handling and data analysis, (3) bias due to differences between the observed and unobserved data. For the estimation of \(\theta\), potential problems may arise due to missing data because data on R&D expenditures is not available for some industries. In particular, the missing data may cause a loss of efficiency in the first stage of the IV estimation, which would reduce the strength of the first stage association between R&D and the inverse of producer prices and hence upward bias the estimates of \(\theta\).
Multiple imputation is a Bayesian technique to impute missing data by simulated draws from the posterior predictive distribution.27
In the following, I outline multiple imputation based on Little and Rubin (2002, 209–11).
Missing data techniques assume that the missing observations of a variable are random variables with a statistical distribution. Multiple imputation, like other missing data imputation techniques, assumes that the probability of an observation being missing depends only on the observed data and not on the missing data, that is, that the data are missing at random.
The idea of multiple imputation is to relate the observed-data posterior distribution to the complete-data posterior distribution, which would be observed in the absence of missing data. The main result of Rubin (1987) is that the posterior distribution of a statistical quantity may be simulated by first imputing the missing observations with repeated draws from the posterior predictive distribution of the missing data given the observed data and then drawing the statistical quantity from its complete-data posterior distribution.28
Multiple imputation produces valid inference from a frequentist perspective (Little and Rubin 2002, 90).
The choice of multiple imputation is based on the following reasons. First, techniques that ignore the missing observations, such as complete-case analysis or case-wise deletion, require a stronger assumption about the missing data. Specifically, they require that the observations with missing data are a random subset of all observations (Bhaskaran and Smeeth 2014). Second, multiple imputation offers a simple and general approach and it correctly accounts for the uncertainty induced by missing observations (Schafer and Olsen 1998).
After the imputation, complete-data methods can be used on the imputed data sets and the results may be combined using Rubin’s rules (Rubin 1987). Specifically, I obtain an estimate of a statistical quantity, \(\bar{Q}_m\), by taking the mean of the estimates obtained within each imputed data set, \(\bar{Q}_m=\frac{1}{m} \sum_{l=1}^m Q_{l}\), where \(Q_1, \dots, Q_m\) denote the estimates obtained within the \(m\) imputed data sets. The associated variance combines the average of the variance estimates within each imputed data set with the variance between the imputations. Formally, \[ T_m=\bar{V}_m+ \frac{m+1}{m}B_m , \qquad B_m=\frac{1}{m-1}\sum_{l=1}^m (Q_l-\bar{Q}_m)^2 , \qquad \bar{V}_m=\frac{1}{m} \sum_{l=1}^m V_l , \] where \(V_l\) denotes the variance estimate obtained within imputed data set \(l\).
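The pooling step can be sketched in a few lines. This is an illustration of Rubin’s rules as stated above, not the code used for the estimation; the function name and the three input estimates are hypothetical.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (q_bar, v_bar, b, t): pooled estimate, average within-imputation
    variance, between-imputation variance and total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                      # pooled point estimate
    v_bar = sum(variances) / m                      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between variance
    t = v_bar + (m + 1) / m * b                     # total variance
    return q_bar, v_bar, b, t

# Hypothetical estimates and variances from m = 3 imputed data sets.
q_bar, v_bar, b, t = pool_rubin([12.0, 13.0, 14.0], [1.0, 1.5, 2.0])
pooled_se = math.sqrt(t)
print(q_bar, v_bar, b, t, pooled_se)
```

For these made-up inputs the pooled estimate is 13, the within-imputation variance is 1.5, the between-imputation variance is 1 and the total variance is 1.5 + (4/3) × 1 ≈ 2.83.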
To overcome possible shortcomings of multiple imputation, I combine it with predictive mean matching (PMM). PMM is a nearest neighbour matching technique. Its use in the context of multiple imputation is attributed to Rubin (1986) and Little (1986). Unlike standard multiple imputation, which is based on a normality assumption, PMM imputes missing data with random draws from the closest observations in the observed data. As a consequence, it is well suited to impute skewed variables (White, Royston, and Wood 2011). This approach is relevant to our imputation as R&D expenditure is highly skewed.
Under PMM, the missing observation \(y_i\) of unit \(i\) is imputed with a random draw from the observed values \(y_j\) of those units \(j\) whose predicted values \(\hat{y}_j\) are closest to the predicted value \(\hat{y}_i\) of unit \(i\), where the predictions are based on a regression of \(\boldsymbol{Y}\) on some covariates \(\boldsymbol{X}\). In particular, I impute each missing value with a random draw from the ten closest observations. This choice rests on the recommendations of the simulation study by Morris, White, and Royston (2014).
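The matching step can be sketched as follows. This is a simplified single-imputation illustration: it uses the OLS point estimate for the predictions and omits the draw of the regression coefficients from their posterior that a full multiple-imputation implementation would add. The function name `pmm_impute`, the simulated data and the covariate are assumptions made for the example.

```python
import numpy as np

def pmm_impute(y, X, k=10, rng=None):
    """Impute missing entries of y (coded as np.nan) by predictive mean matching.

    Each missing value is replaced by the observed value of a donor drawn at
    random from the k observed units with the closest predicted values.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])       # add a constant
    observed = ~np.isnan(y)
    # Regression of the observed y on the covariates, predictions for all units.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    y_hat = X @ beta
    donors_hat = y_hat[observed]
    donors_y = y[observed]
    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        # Indices of the k donors with the smallest distance in predicted value.
        nearest = np.argsort(np.abs(donors_hat - y_hat[i]))[:k]
        imputed[i] = donors_y[rng.choice(nearest)]
    return imputed

rng = np.random.default_rng(1)
x = rng.normal(size=200)
# Simulated skewed variable; the first 40 values are set to missing.
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.3, size=200))
y[:40] = np.nan
y_imputed = pmm_impute(y, x, k=10, rng=rng)
print(np.isnan(y_imputed).sum())
```

Because every imputed value is an actually observed value, the imputations stay within the support of the observed data, which is the property that makes PMM attractive for skewed variables.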
The imputation is conducted as follows. First, I impute the outcome variable of the first stage regression, the log of the inverse of the international producer prices, using country and sector fixed effects. Second, I impute the log of R&D expenditure using country and sector fixed effects and the log of the inverse of the international producer prices. I impute both variables using the country and sector fixed effects to account for time-invariant determinants of both at the country and sector level. Similarly, Costinot, Donaldson, and Komunjer (2012) imputed the log of R&D expenditures with the predicted values from a regression on country and sector fixed effects.
I impute the outcome variable based on the following arguments. First, Little (1992) argued that if both the regressors and the outcome variable have missing values, the latter may provide additional information to impute the regressors. Second, the simulation study of Moons et al. (2006) found that the results of multiple imputation of covariates with missing observations were biased if the outcome variable was not used in the imputation.
Table 1 presents the cross-sectional estimates of \(\theta\) for the year 2005. Table 1 is divided into three subtables, one for each of the three dependent variables: gross exports, backward value-added trade and forward value-added trade. Across subtables, I present the OLS estimates in column 1 and the IV estimates with the instrument “R&D expenditure” in columns 2-4. In columns 2-3, I present the IV estimates using the complete sector coverage and a restricted sample without primary sectors. In column 4, I present the IV estimates including only high income countries based on the World Bank classification for 2005.
The OLS estimates for gross exports and backward value-added trade show a statistically significant positive coefficient. The IV estimates in columns 2-4 are significantly larger than the OLS estimates in column 1. The estimated \(\theta\) in columns 2-4 lies between 11.42 and 15.08. The significant increase of the IV estimate compared to the OLS estimate indicates that the regressor is endogenous (Hausman 1978).
The IV estimates of \(\theta\) across the samples for the dependent variables “gross exports” and “backward value-added trade” show the following results. First, the estimates of \(\theta\) for the two dependent variables are very close and the difference is not statistically significant. Second, the sample including only high-income countries and excluding the primary sectors shows a statistically significant increase of \(\theta\).29 A higher estimate of \(\theta\) implies a lower dispersion of production costs within sectors. The result is in line with our expectations, since the sample includes only high income countries.
The OLS estimate of \(\theta\) on the basis of FVAT is not statistically significant. As for the other two dependent variables, the IV estimates of \(\theta\) are significantly larger than the OLS estimate. In contrast to the results for EXGR and BVAT, however, the estimate of \(\theta\) for the sample including only high-income countries shows no significant difference compared to the other samples.
Directly comparing the estimate of \(\theta\) on the basis of gross exports to the result of Costinot, Donaldson, and Komunjer (2012), I find that the IV results in columns 2-3 are not statistically different from CDK’s result (\(\theta_{CDK}=11.1\), SE 0.981). However, the authors’ preferred estimate of \(\theta\) is \(6.58\) on the basis of openness-corrected exports. The authors use openness-corrected gross exports to account for trade selection, which biases the observed productivity differences downward.30
For two reasons I decided to use gross exports and value added trade without correcting for openness. First, data on the import penetration ratio is only available for the manufacturing sectors, which would reduce the sample size considerably. Second, I was unable to obtain a similar correction for VAT.
 | OLS | IV: Full Sample | IV: Without primary sectors** | IV: Without primary sectors high*** |
---|---|---|---|---|
Dep. var. | log EXGR 2005 | | | |
θ | 0.434 | 12.653 | 11.424 | 14.689 |
SE* | (0.067) | (1.331) | (1.422) | (2.130) |
Exporter Importer FE | Yes | Yes | Yes | Yes |
Importer Sector FE | Yes | Yes | Yes | Yes |
Observations | 18143 | 18143 | 16582 | 14449 |
R-squared | 0.771 | 0.197 | 0.321 | 0.141 |
F-stat 1st stage | | 151.41 | 125.6 | 85.24 |
*HC robust standard errors in parentheses.
**Without primary sectors excludes the sectors mining & agriculture.
***high denotes highly developed countries.
 | OLS | IV: Full Sample | IV: Without primary sectors** | IV: Without primary sectors high*** |
---|---|---|---|---|
Dep. var. | log BVAX 2005 | | | |
θ | 0.476 | 12.911 | 11.762 | 15.080 |
SE* | (0.066) | (1.34) | (1.447) | (2.18) |
Exporter Importer FE | Yes | Yes | Yes | Yes |
Importer Sector FE | Yes | Yes | Yes | Yes |
Observations | 18143 | 18143 | 16582 | 14449 |
R-squared | 0.775 | 0.18 | 0.304 | 0.128 |
F-stat 1st stage | | 151.41 | 125.6 | 85.24 |
*HC robust standard errors in parentheses.
**Without primary sectors excludes the sectors mining & agriculture.
***high denotes highly developed countries.
 | OLS | IV: Full Sample | IV: Without primary sectors** | IV: Without primary sectors high*** |
---|---|---|---|---|
Dep. var. | log FVAX 2005 | | | |
θ | 0.019 | 9.286 | 10.325 | 10.218 |
SE* | (0.045) | (0.868) | (1.291) | (1.199) |
Exporter Importer FE | Yes | Yes | Yes | Yes |
Importer Sector FE | Yes | Yes | Yes | Yes |
Observations | 18143 | 18143 | 16582 | 14449 |
R-squared | 0.882 | 0.475 | 0.431 | 0.488 |
F-stat 1st stage | | 151.41 | 125.6 | 85.24 |
*HC robust standard errors in parentheses.
**Without primary sectors excludes the sectors mining & agriculture.
***high denotes highly developed countries.
Results: First stage
The results of the first stage regression address two concerns about the validity of the IV regression: the relevance of the instrument and whether the instrument affects the endogenous regressor in the hypothesized way.
The table shows that the F-statistic of the excluded instrument in the first stage is very high. This implies that the instrument is highly relevant. Further, the first stage shows a statistically significant positive effect of R&D on the inverse of producer prices, which is consistent with the expected positive effect of R&D.
 | Full Sample | Without primary sectors | Without primary sectors high |
---|---|---|---|
Log of R&D | 0.022 | 0.023 | 0.02 |
SE | 0.002 | 0.002 | 0.002 |
Exporter Importer FE | Yes | Yes | Yes |
Export Sector FE | Yes | Yes | Yes |
Observations | 19343 | 17661 | 15283 |
F (excl. dummies) | 125.6 | 88.17 | 85.24 |
Imputations | 29 | 29 | 29 |
ISIC and ISO 3 Alpha Classification
ISIC Code | Short | Description |
---|---|---|
01-05 | Agriculture products | Agriculture, hunting, forestry and fishing |
10-14 | Mining products | Mining and quarrying |
15-16 | Food sector | Food products, beverages and tobacco |
17-18 | Textile products | Textile and textile products |
19 | Leather products | Leather and footwear |
17-19 | Textiles & Leather products | Textiles, textile products, leather and footwear |
20 | Wood products | Wood and products of wood and cork |
21-22 | Paper products | Pulp, paper, paper products, printing and publishing |
23 | Fuel products | Coke, refined petroleum products and nuclear fuel |
24 | Chemical products | Chemicals and chemical products |
25 | Plastic products | Rubber and plastics products |
26 | Mineral products | Other non-metallic mineral products |
27-28 | Metals | Basic metals and fabricated metal products |
29 | Machinery | Machinery and equipment, nec |
30-33 | Electrical | Electrical and optical equipment |
34-35 | Transport | Transport equipment |
36-37 | Misc. Manufacturing | Manufacturing nec; recycling |
40-41 | Electricity | Electricity, gas and water supply |
45 | Construction | Construction |
50-52 | Trade | Wholesale and retail trade; repairs |
55 | Gastronomy | Hotels and restaurants |
60-64 | Communication | Transport and storage, post and telecommunication |
65-67 | Finance | Financial intermediation |
70-74 | Real estate | Real estate, renting and business activities |
75-95 | Social | Community, social and personal services |
ISO 3 | Country | ISO 3 | Country |
---|---|---|---|
ARG | Argentina | ITA | Italy |
AUS | Australia | JPN | Japan |
AUT | Austria | KOR | Korea |
BEL | Belgium | LTU | Lithuania |
BGR | Bulgaria | LUX | Luxembourg |
BRA | Brazil | LVA | Latvia |
CAN | Canada | MEX | Mexico |
CHE | Switzerland | MYS | Malaysia |
CHL | Chile | NLD | Netherlands |
CHN | China | NOR | Norway |
COL | Colombia | NZL | New Zealand |
CYP | Cyprus | PHL | Philippines |
CZE | Czech Republic | POL | Poland |
DEU | Germany | PRT | Portugal |
DNK | Denmark | ROU | Romania |
ESP | Spain | ROW | Rest of the World |
EST | Estonia | RUS | Russian Federation |
FIN | Finland | SGP | Singapore |
FRA | France | SVK | Slovakia |
GBR | United Kingdom | SVN | Slovenia |
GRC | Greece | SWE | Sweden |
HKG | Hong Kong | THA | Thailand |
HRV | Croatia | TUN | Tunisia |
HUN | Hungary | TUR | Turkey |
IDN | Indonesia | TWN | Taiwan |
IND | India | USA | United States of America |
IRL | Ireland | VNM | Vietnam |
ISR | Israel | ZAF | South Africa |
Data Appendix
 | Mean | Std. Dev. | Min | Max | N |
---|---|---|---|---|---|
Log BVAT | 2.443 | 2.867 | -4.605 | 10.754 | 17453 |
Log EXGR | 2.742 | 2.871 | -4.605 | 11.108 | 17505 |
Log FVAT | 3.000 | 2.351 | -4.605 | 10.739 | 15999 |
Log inv. prod. prices | 0.267 | 0.274 | -0.672 | 1.167 | 18444 |
Log R&D | 17.801 | 2.441 | 10.745 | 24.759 | 17313 |
 | Log EXGR | Log BVAT | Log FVAT | Log inv. prod. prices | Log R&D |
---|---|---|---|---|---|
log gross exports | 1 | - | - | - | - |
log backward value-added trade | 0.996 | 1 | - | - | - |
log forward value-added trade | 0.872 | 0.89 | 1 | - | - |
Log inv. prod. prices | -0.092 | -0.1 | -0.211 | 1 | - |
Log R&D | 0.434 | 0.446 | 0.488 | -0.2 | 1 |
References
Costinot, Arnaud, Dave Donaldson, and Ivana Komunjer. 2012. “What Goods Do Countries Trade? A Quantitative Exploration of Ricardo’s Ideas.” The Review of Economic Studies 79 (2): 581–608.
Greene, William H. 2003. Econometric Analysis. Pearson Education India.
Eaton, Jonathan, and Samuel Kortum. 2002. “Technology, Geography, and Trade.” Econometrica 70 (5): 1741–79.
Cameron, A.C., and P.K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Inklaar, Robert, and Marcel P. Timmer. 2014. “The Relative Price of Services.” Review of Income and Wealth 60 (4): 727–46. doi:10.1111/roiw.12012.
OECD. 2013. “STAN R&D Expenditures in Industry.”
Schafer, Joseph L, and Maren K Olsen. 1998. “Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst’s Perspective.” Multivariate Behavioral Research 33 (4): 545–71.
Little, Roderick J. A., and Donald B. Rubin. 2002. Statistical Analysis with Missing Data. Second Edition. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.
Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. 99th ed. Wiley.
Bhaskaran, Krishnan, and Liam Smeeth. 2014. “What Is the Difference Between Missing Completely at Random and Missing at Random?” International Journal of Epidemiology 43 (4): 1336–1339.
Rubin, Donald B. 1986. “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations.” Journal of Business and Economic Statistics 4 (1): 87–94.
Little, Roderick J. A. 1986. “Survey Nonresponse Adjustments for Estimates of Means.” International Statistical Review / Revue Internationale de Statistique 54 (2): 139–57.
White, Ian R., Patrick Royston, and Angela M. Wood. 2011. “Multiple Imputation Using Chained Equations: Issues and Guidance for Practice.” Statistics in Medicine 30 (4): 377–99.
Morris, Tim P., Ian R. White, and Patrick Royston. 2014. “Tuning Multiple Imputation by Predictive Mean Matching and Local Residual Draws.” BMC Medical Research Methodology 14 (1): 1–13.
Little, Roderick JA. 1992. “Regression with Missing X’s: A Review.” Journal of the American Statistical Association 87 (420): 1227–37.
Moons, Karel G M, Rogier A R T Donders, Theo Stijnen, and Frank E Jr Harrell. 2006. “Using the Outcome for Imputation of Missing Predictor Values Was Preferred.” Journal of Clinical Epidemiology 59 (10): 1092–1101.
Hausman, Jerry A. 1978. “Specification Tests in Econometrics.” Econometrica: Journal of the Econometric Society, 1251–71.
26. A relevant agglomeration effect in our context would be a positive spillover from the decision of one firm to export to a certain market to the decision of a second firm to export to this destination (Bernard and Jensen 2004).
27. It was initially proposed in Rubin (1978) for non-response in surveys, and its statistical properties were developed in Rubin (1987).
28. In Bayesian inference, the posterior distribution is obtained by dividing the product of the assumed prior distribution and the likelihood by a normalizing constant. The posterior predictive distribution describes the predicted value averaged over the posterior distribution.
29. I tested significance with a t-test. The test statistic follows a t-distribution with \(v\) degrees of freedom, where \(v=(m-1)\left(1+ \left((1+m^{-1})B/ \bar{V}\right)^{-1}\right)^{2}\), \(\bar{V}\) denotes the average within-imputation variance and \(B\) denotes the between-imputation variance of the estimated parameter (Rubin 1987, 77).
30. Trade selection denotes that a country does not produce certain goods for which it receives a low productivity draw and instead imports them (Costinot, Donaldson, and Komunjer 2012).