1  Data Resources

Objectives

Objective of this chapter

This short chapter will list possible data resources. At the moment of this writing (August 2024) I do not know what data exactly I am going to use for my research. Therefore I will update this chapter continuously.

1.1 World Inequality Lab (WIL)

The WIL consists of a team of about 40 people, based primarily at the Paris School of Economics and the University of Berkeley, California. It is supported by more than 200 researchers of diverse nationalities and institutions, forming the so-called WID Fellows. (See the complete team working on the WID.)

1.1.1 World Inequality Database (WID)

Generally I will prefer data already collected and cleaned by some trustful organizations or researchers. And from these I will preferable (or at least start) with data free accessible at the WID. These data are compiled by the WIL and presented as open-access datasets accompanied by publications, graphics, tables slides and an interactive website at https://wid.world.

Resource 1.1 : Structure of the WID Resources

Interactive website

  • World View: Compare inequality between country on an interactive world map.
  • Country Graphs: Follow the evolution of inequality within countries with user-friendly graphs.
  • Data Tables: Download the open-access datasets from the [data page](https://wid.world/data/. You can prepare selected data to download or download the complete dataset.

Methodology

  • Distributional National Account (DINA) Guidelines: The Distributional National Account Guidelines (DINA) is a resource to understand the methodology behind the decomposition of National Accounts and the estimation of inequality series for income and wealth. It explains methods and concepts used in the World Inequality Database. The version I am using is from February 2024, has 186 pages, and can be downloaded as PDF.
  • Code Dictionary Page: The Code Dictionary Page describes the general structure of the World Inequality Database (WID). It explains how information is organized within it, and how to retrieve and interpret it content. It has three sections:
    • Chapter 1 explains the different ways of accessing the database.
    • Chapter 2 explains the general structure of the variables.
    • Chapter 3 to 9 describe the different WID codes that exist in the database.
  • Summary Table: The Summary Table summarizes all available WID.world data. It is an efficient starting point: One can use the search tool to look for specific countries, variables or WID.world codes, which you can then download for more detailed analysis.
  • Downloading Data:
    • Download of specific series: The WID data page is designed for ease of use. Simply scroll through the left-hand menu to select the data you require, and download it in the format that best suits your needs.
    • Bulk Download: If you who want to request a large number of series at the same time, there is also a bulk download options. Wit the button at the bottom-left of the data section you can download the entire database at once. You can also download all of the data relative to a specific country on the country pages.
Some tools are not maintained anymore

Thomas Blanchet has developed with gpinter (an R package for generalized Pareto interpolation), wid-R-tool (an R package to download data from the WID.world), and wid-stata-tool (Stata commands to download data from the WID.world database) several tools for working with the WID. But he is not affiliated anymore with the WID and has recently (2024-08-24) withdrawn all support for these tools.

For the R and STATA code he claims grave concerns regarding the validity and integrity of the data these packages accesses. In the case of gpinter he points out:

Experience has shown that the approach the package implements causes more problems than it solves, especially when used by people who do not understand the underlying methodology and treat it as a black box. Older approaches, such as the mean-split histogram (see e.g., Measuring Inequality, Appendix A7, p. 172 (Cowell 2011)), provide nearly identical results at a fraction of the cost and complexity (and to the extent that the results diverge, it almost certainly indicates a problem with the raw data that should rather be addressed directly.)

1.1.2 Other WIL projects

The WID is the flagship project of the WIL and it is also the main resource I am going to use. The WIL is working on several important research projects, most of them have not only publication but also free and open accessible data sets. Many of these projects can supply data for my research and function as advanced organizers for me. I have to inspect their data sets and figure out what data to use for an effective learning approach.

Resource 1.2 : WIL research projects with free available datasets

  • The World Political Cleavages and Inequality Database (WPID) is the result of a collaborative research program involving about twenty researchers all around the world. The central aim is to provide open and convenient access to the most extensive available dataset on the structure of political cleavages and social inequalities in electoral democracies, located on the five continents, from the mid-20th century to the present. (Gethin, Martinez-Toledono, and Piketty 2024).
  • The World Inequality Report 2022 (WIR2022) is the benchmark account of recent and historical trends in inequality. The report builds on the pioneering edition of 2018 to provide policy makers and scholars everywhere up-to-date information about the history of inequality, gender inequality, environmental inequalities, and trends in international tax reform and redistribution.
  • Une histoire du conflit politique is about elections and social inequalities in France, 1789-2022 (Cagé and Piketty 2023) and is written in French. Who votes for whom and why? How did the social structure of the electorates of the different political movements and currents evolve in France from 1789 to 2022? You have access to all the electoral and socio-economic data that were collected at the level of the 36,000 municipalities of France, from the digitized electoral minutes to the National Archives to the finalized and homogenized files on which the research was based.
  • Inequality in Latin America. The article is written in Spanish and it is based on More Unequal or Not as Rich? Revisiting the Latin American Exception. The article examines income distribution in Latin America and challenges the commonly accepted belief that the region is an exception to global trends of growing inequality. To support his research the team built new datasets which covers 80% of the region’s population and combines surveys harmonized, social security data, tax returns on national income and accounts during the period 2000-2020.
  • Global Taxation of Capital and Labor studies how the globalization has affected the taxation of labor and capital incomes. To address this questions the authors built and analyzed a new, global database on effective macroeconomic tax rates in more than 150 countries since 1965, combining national accounts data with government revenue statistics, including from a wide variety of archival records.
  • Realtime Inequality provides the first timely statistics on how economic growth is distributed across groups. When new growth numbers come out each quarter, we show how each income and wealth group benefits to conclude who benefits from income and wealth Growth in the United States.
  • Carbon Inequality Data explores the inequality of carbon footprint across the global population. Lucas Chancel estimates the global inequality of individual greenhouse gas (GHG) emissions between 1990 and 2019 using a newly assembled dataset of income and wealth inequality, environmental input-output tables and a framework differentiating emissions from consumption and investments (Chancel 2022b, 2022a).
  • Inequality Transparency Index reports about the extent of inequality data opacity per country. The index inequality transparency index ranges from 0 to 20 for each country. The index looks at four different data sources: income surveys, income tax data, wealth surveys and wealth tax data and different components for each of these sources: Quality, Frequency of Publication and Access to Data.It explains the calculation in a Technical Note and provides data for each country.
  • Missing Profits investigates how much profits each country loses or attracts because of tax avoidance. There is an interactive map to explore which countries attract and lose profits. It provides main data, raw data, STATA programs, replication guides for tables and graphs and an update with 2016 data.
  • Tax Justice Now is an essential companion to The Triumph of Injustice (Saez and Zucman 2020). You can visualize how much each income group pays in taxes when we include all taxes (income taxes, corporate taxes, payroll taxes, consumption taxes, etc.) at all levels of government (federal, state, and local). Plus you can explore how changing existing taxes—such as increasing individual income tax rates—or creating new taxes—such as a progressive wealth tax or a value added tax—would affect tax revenue, tax progressivity, and inequality. There is a Technical Appendix with all the programs and data for replication.

1.2 Other data resources

The WID will be my my starting point for exploring the inequality in Austria. But whenever sensible I will try to discover and use other resources as well. What follows is a list of possible data resources I have already found. With the exception of OWID and SWIID those data are recommended in WIR2022 as additional data resources for inequality research (Chancel et al. 2022).

Resource 1.3 : A short list of inequality data resources besides the WID (arranged alphabetically)

  • CEQ: The Commitment to Equity Database (CEQ) was founded by Nora Lustig of the Tulane University and is managed by an institute with the same name. The Institute works to reduce inequality and poverty through comprehensive and rigorous tax and benefit incidence analysis, and active engagement with the policy community. Its objective is to measure the impact of fiscal policy on inequality and poverty across the world using a comparable framework (Lustig, CEQ Institute, and Brookings Institution 2022a, 2022b).
  • EU-SILC: The European Survey of Income and Living Conditions (EU-SILC) database is a comprehensive statistical instrument that collects and provides multidimensional microdata on income, poverty, social exclusion, and living conditions in the European Union (EU). The database is designed to support policymakers, researchers, and analysts in monitoring and evaluating social policies, as well as conducting evidence-based research on various aspects of income, poverty, and well-being in Europe.
  • IDD & WDD: OECD Income and wealth distribution databases are two dedicated statistical databases to benchmark and monitor economic inequality across countries: The OECD Income Distribution Database (IDD) offers data on levels and trends in income inequality and poverty and is updated on a rolling basis, two to three times a year. The OECD Wealth Distribution Database (WDD) collects information on the distribution of household net wealth and is updated every two or three years.
  • LIS: The Luxembourg Income Study Database (LIS) is a data archive and research center dedicated to cross-national analysis. It is home to the Luxembourg Income Study Database (LIS) and the Luxembourg Wealth Study Database (LWS) (LIS Author Collective 2024).
  • OWID: Our World In Data (OWID) is an online publication that focuses on large global problems such as poverty, disease, hunger, climate change, war, existential risks, and inequality (Global Change Data Lab and Roser 2024).
  • PIP: The Poverty and Inequality Platform (PIP) — formerly the PovcalNet — is an interactive computational tool that offers users quick access to the World Bank’s estimates of poverty, inequality, and shared prosperity. PIP provides a comprehensive view of global, regional, and country-level trends for more than 160 economies around the world (World Bank 2024).
  • SEDLAC: The Socio-Economic Database for Latin America and the Caribbean (SEDLAC) includes statistics on poverty and other distributional and social variables from 25 Latin American and Caribbean (LAC) countries. All statistics are computed from microdata of the main household surveys carried out in these countries using a homogeneous methodology (data permitting). SEDLAC allows users to monitor the trends in poverty and other distributional and social indicators in the region. The database is available in the form of brief reports, charts and electronic Excel tables with information for each country/year. In addition, the website visitor can carry out dynamic searches online.
  • SWIID: The goal of the Standardized World Income Inequality Database (SWIID) is to meet the needs of those engaged in broadly cross-national research by maximizing the comparability of income inequality data while maintaining the widest possible coverage across countries and over time. The SWIID standardizes the United Nations University database (UNU-WIDER 2008)1. The SWIID’s income inequality estimates are based on thousands of reported Gini indices from hundreds of published sources. The SWIID currently incorporates comparable Gini indices of disposable and market income inequality for 199 countries for as many years as possible from 1960 to the present (version 9.6, December 2023); it also includes information on absolute and relative redistribution. (See also the SWIID repo at GitHub.)
  • UTIP: The University of Texas Income Project Database (UTIP) contains data sets on pay inequality at the global level, at the national level including for Argentina, Brazil, Cuba, China, India, and Russia, and at the regional level for Europe. It is managed by a small research group concerned with measuring and explaining movements of inequality in wages and earnings and patterns of industrial change around the world. Their work has emphasized the use of Theil’s T statistic to compute inequality indexes from industrial, regional and sectoral data.
  • WIID The World Income Inequality Database (WIID) presents information on income inequality for developed, developing, and transition countries. The latest version of the WIID, released 28 November 2023, covers 201 countries (including historical entities) through 2022, with over 24,000 data points in total. There are now nearly 4,000 unique country-year observations in the database. The WIID is managed by UNU WIDER - the United Nations University World Institute for Development Economics Research.

To understand use all these data resources is a complex enterprise. As a first step and starting point I will concentrate on the World Inequality Database (WID) because there exist several monographs where these data are used and interpreted. But most of the time these books refer to global, regional or ‘important’ countries like China, France, German, India, United Kingdom, United States etc. My challenge is trying to replicate the main results with the example of Austria. Unfortunately the dataset for Austria is not very comprehensive. It could be the case that I should try to develop my own dataset for Austria using specific data sources like Statistic Austria or the European Survey on Income and Living Conditions (EU-SILC).

References

Cagé, Julia, and Thomas Piketty. 2023. Une Histoire Du Conflit Politique: Elections Et Inégalités Sociales En France, 1789-2022. Paris XIXe: SEUIL.
Chancel, Lucas. 2022a. “Global Carbon Inequality over 1990-2019.” https://lucaschancel.com/global-carbon-inequality-1990-2019/.
———. 2022b. “Global Carbon Inequality over 19902019.” Nature Sustainability 5 (11): 931–38. https://doi.org/10.1038/s41893-022-00955-z.
Chancel, Lucas, Thomas Piketty, Emmanuel Saez, Gabriel Zucman, and Esther Duflo. 2022. World Inequality Report 2022, World Inequality Lab. Cambridge, Massachusetts London, England: Harvard University Press. https://wir2022.wid.world/.
Cowell, Frank. 2011. Measuring Inequality. 3rd ed. Oxford: Oxford University Press, U.S.A.
Gethin, Amory, Clara Martinez-Toledono, and Thomas Piketty. 2024. “WPID - World Political Cleavages and Inequality Database.” https://wpid.world/.
Global Change Data Lab, and Max Roser. 2024. “OWID Homepage.” Our World in Data, March. https://ourworldindata.org.
Jenkins, Stephen P. 2015. “World Income Inequality Databases: An Assessment of WIID and SWIID.” The Journal of Economic Inequality 13 (4): 629–71. https://doi.org/10.1007/s10888-015-9305-3.
LIS Author Collective. 2024. “LIS Cross-National Data Center in Luxembourg.” https://www.lisdatacenter.org/.
Lustig, Nora, CEQ Institute, and Brookings Institution, eds. 2022a. Commitment to Equity Handbook. Volume 1: Fiscal Incidence Analysis: Methodology, Implementation, and Applications. Second edition. Washington, DC: Brookings Institution.
———, eds. 2022b. Commitment to Equity Handbook. Volume 2: Methodological Frontiers in Fiscal Incidence Analysis. Second edition. Washington, DC: Brookings Institution.
Saez, Emmanuel, and Gabriel Zucman. 2020. The Triumph of Injustice: How the Rich Dodge Taxes and How to Make Them Pay. New York, NY: W W NORTON & CO.
World Bank. 2024. “Poverty and Inequality Platform (Version: 20240326_2017).” https://pip.worldbank.org/home.

  1. But the WIID in is latest version is also already standardized. See WIID. Additionally there are concerns about the data imputation model at least for SWIID version 4.0 (Jenkins 2015).↩︎