First, click on this link. Scroll down past the description until you see a list of files. The dataset is a .zip file, the first one labeled “2. Data” (see the screenshot below). Clicking on the big rectangular button labeled “Download” on the right side of this line will download the file.
SWIID Frequently Asked Questions
How to . . .
The SWIID download contains files pre-formatted for use with the tools for analyzing multiply-imputed data and other data measured with uncertainty in Stata (swiid8_2.dta) or R (swiid8_2.rda). This is meant to “set the default” in a way that encourages researchers to take into account the uncertainty in the SWIID estimates. This uncertainty is considerable in many developing countries, and the tools available in both software packages can now handle pretty much any analysis one may desire (for details, read the documentation, such as stata_swiid.pdf, in the SWIID download). But this format does not lend itself to graphing. For this purpose, use the swiid8_2_summary.csv file, which presents the SWIID estimates in mean-plus-standard-error summary format.
import delimited "swiid8_2_summary.csv", clear

// Calculate the bounds of the 95% uncertainty intervals
gen gini_disp_95ub = gini_disp + 1.96*gini_disp_se
gen gini_disp_95lb = gini_disp - 1.96*gini_disp_se

// A silly example
gen name_length = length(country)
gen first_letter = substr(country, 1, 1)
keep if year==2010 & first_letter=="S" /*2010 for Senegal, Serbia, . . .*/

// A scatterplot with 95% confidence intervals
// (uses the gini_disp bounds generated above)
twoway rspike gini_disp_95ub gini_disp_95lb name_length, lstyle(ci) || ///
    scatter gini_disp name_length, msize(small) legend(order(2 "SWIID Disposable-Income Inequality"))
library(tidyverse)

# Load the SWIID
load("swiid8_2.rda")

# Plot SWIID gini_disp estimates for the United States
swiid_summary %>%
  filter(country == "United States") %>%
  ggplot(aes(x=year, y=gini_disp)) +
  geom_line() +
  geom_ribbon(aes(ymin = gini_disp-1.96*gini_disp_se,
                  ymax = gini_disp+1.96*gini_disp_se,
                  linetype=NA), alpha = .25) +
  scale_x_continuous(breaks=seq(1960, 2015, 5)) +
  theme_bw() +
  labs(x = "Year",
       y = "SWIID Gini Index, Disposable Income",
       title = "Income Inequality in the United States")

# Plot SWIID gini_net estimates for the United States
# (gini_net appears to be a variable name carried over from an earlier release;
# if swiid_summary has no gini_net column, use the gini_disp plot above)
swiid_summary %>%
  filter(country == "United States") %>%
  ggplot(aes(x=year, y=gini_net)) +
  geom_line() +
  geom_ribbon(aes(ymin = gini_net-1.96*gini_net_se,
                  ymax = gini_net+1.96*gini_net_se,
                  linetype=NA), alpha = .25) +
  scale_x_continuous(breaks=seq(1960, 2015, 5)) +
  theme_bw() +
  labs(x = "Year",
       y = "SWIID Gini Index, Net Income",
       title = "Income Inequality in the United States")
Stata 13 introduced a new file format which older versions of Stata can’t open. Fortunately, there’s an easy fix: the use13 command. In the command window, first type ssc install use13 to install it. Then you can type use13 SWIIDv5_0.dta, clear to load the SWIID data.
Update: Version 5.1 is saved in the older file format, so this shouldn’t be a problem anymore.
First, add whatever country codes are needed to the SWIID data: there are routines for both Stata (kountry; type findit kountry in Stata’s command window to install) and R (countrycode, on CRAN) to generate many commonly used country codes. Then follow the instructions for merging data into the SWIID found in the “Using the SWIID.pdf” file included in the data download.
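For the R route, the first step might look like the sketch below. It assumes the countrycode package is installed; the data frame swiid_toy and its values are hypothetical stand-ins, not SWIID estimates. It covers only adding the codes, not the merge itself.

```r
library(countrycode)  # install.packages("countrycode") if needed

# A toy stand-in for the SWIID summary data (hypothetical, illustrative values only)
swiid_toy <- data.frame(country = c("Senegal", "Serbia"),
                        year = c(2010, 2010),
                        gini_disp = c(40.0, 30.0))

# Add ISO 3166-1 alpha-3 codes derived from the English country names
swiid_toy$iso3c <- countrycode(swiid_toy$country,
                               origin = "country.name",
                               destination = "iso3c")
```

Other destination schemes can be requested the same way by changing the destination argument; check the result for missing codes before merging.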
To use Stata’s time-series operators (d., etc.), you must first declare the time and panel variables using tsset, but the SWIID is already mi set, so tsset won’t work. Run the following before merging the rest of your data (called my_data.dta in this example) into the SWIID:
use SWIIDv5_1.dta, clear
egen cc = group(country)
mi tsset cc year
merge 1:1 country year using "my_data.dta", keep(match master)
Now you’ll be able to use l. and the rest of Stata’s time-series operators.
I imported the swiid_summary.csv file, and the data for some countries seem to be given in different units; for example, in one country, gini_disp is 313 and gini_mkt is 33. How can I interpret these data?
The issue here is that the swiid_summary.csv file is saved using RFC 4180 csv conventions, that is, with a dot marking the decimal and a comma separating values. If you are in a locale that observes different conventions—many countries use commas to indicate the decimal and semicolons to separate values—your software’s default settings may not read the data correctly. Be sure to specify the decimal marker and value delimiter appropriately when loading the file: all SWIID values are Gini indices and so fall between zero and one hundred.
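A minimal sketch of the problem in base R: the snippet writes a tiny made-up file in RFC 4180 style and reads it two ways. The country name and values are illustrative, not actual SWIID estimates.

```r
# Write a tiny illustrative file in RFC 4180 style (comma delimiter, dot decimal).
# The values are made up for the example, not taken from the SWIID.
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,year,gini_disp,gini_mkt",
             "Ruritania,2010,31.3,45.2"), tmp)

# State the delimiter and decimal mark explicitly rather than relying on defaults
ok <- read.csv(tmp, sep = ",", dec = ".")

# Reading with semicolon/comma conventions (as read.csv2 does) misparses the file:
# each row collapses into a single column
bad <- read.csv2(tmp)

# Sanity check after loading: all Gini values should fall between 0 and 100
stopifnot(all(ok$gini_disp >= 0 & ok$gini_disp <= 100))
```

A range check like the last line is a cheap way to catch a misread file before doing any analysis.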
The SWIID Data
As in the LIS, Germany is West Germany only before 1991 and (united) Germany from that point forward. The SWIID includes no estimates for East Germany; I just haven’t found much to include in the source data. If you know of a source for East German Ginis, email me with the details, and I’ll be happy to get it into the next version.
What about Russia and the rest of the former Soviet Union? Yugoslavia? Czechoslovakia? Pakistan? Sudan? Ethiopia? Why are there estimates for successor states before they actually existed?
For countries that have undergone partition, the SWIID estimates for a given year include all of the origin country’s then-current territories. This means that for dates before partition, all of the once united territory is included: estimates for Sudan before 2011, for example, include present-day South Sudan.
Where Ginis are available from before the breakup for the territory of successor states, these Ginis have been standardized like anything else in the source data (the SWIID routine does not make any effort to incorporate information from the origin state into estimates for the successor state). For example, the estimates for Russia for dates before the breakup of the Soviet Union are for the territories of the USSR that would become Russia after the breakup; there are similar pre-breakup data available for many FSU states, the Czech Republic and Slovakia, and the successor states of the former Yugoslavia. The resulting estimates are available for use when appropriate, that is, to match the conventions of the other data one may be employing.
There are three reasons why estimates change from one version of the SWIID to the next. The first reason is revisions to the Luxembourg Income Study. As the LIS serves as the standard for SWIID estimates, when Ginis from the LIS change, the SWIID estimates change as well.
The second reason for changes is the addition of more source data for the country. Each new version of the SWIID includes new source data both for (a) years that were not previously available, and (b) earlier years from sources I’ve discovered since the last update. Adding more source data for a country, particularly when previously available data was scant, holds the potential that the estimates will change.
The third, least obvious reason for changes is the availability of more raw data for other countries. The SWIID estimates for many countries that are relatively data-poor depend at least in part on information from other countries in their region or, for some developing countries, even on other developing countries in other regions (see Figure 4 and discussion in Solt 2016). This means that, given the scarcity of data on Ruritania, more data, especially more LIS data, for other countries in Ruritania’s region can lead to substantial changes in the estimates for Ruritania.
Although the SWIID includes estimates of disposable income inequality and market income inequality for all of its country-year observations, it does not provide as many estimates of absolute and relative redistribution, even though these could easily be calculated from gini_disp and gini_mkt. The reason has to do with the source data employed to generate the SWIID estimates: redistribution estimates are provided only in countries for which there is source data available on both the distribution of market income and the distribution of disposable income or consumption. For other countries, the figures provided for market and disposable income inequality each represent the best estimate possible for each concept given the available source data, but both estimates are based on the same observations in the source data, and the difference between them reflects only information derived from other countries (see Solt 2016, 1274-1275 (page 12 in the pre-print)). It would therefore be a mistake to treat this difference as indicative of the redistributive effect of the country’s taxes and transfers.
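For reference, the arithmetic itself is simple. The R sketch below uses made-up numbers and assumes the common definitions: absolute redistribution is the market-income Gini minus the disposable-income Gini, and relative redistribution is that difference as a percentage of the market-income Gini. The values are illustrative, not SWIID estimates.

```r
# Illustrative values only (not actual SWIID estimates)
gini_mkt  <- 45.0  # market-income Gini
gini_disp <- 30.0  # disposable-income Gini

# Absolute redistribution: the gap between market and disposable inequality
abs_red <- gini_mkt - gini_disp

# Relative redistribution: that gap as a percentage of market inequality
rel_red <- 100 * abs_red / gini_mkt
```

As explained above, this calculation is meaningful only for countries with source data on both income concepts; elsewhere the difference reflects information borrowed from other countries.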