SWIID Frequently Asked Questions

How to . . .

Wait, how can I download the data again?

First, click on this link. Scroll down past the description until you see a list of files. The dataset is the SWIIDv5_1.zip file, which is the first one labeled “2. Data”. Clicking the big rectangular “Download” button at the right end of that line will download the file.


How can I use the SWIID data to make my own graphs?

The SWIID download contains files pre-formatted for use with the tools for analyzing multiply-imputed data and other data measured with uncertainty in Stata (swiid6_0.dta) or R (swiid6_0.rda). This is meant to “set the default” in a way that encourages researchers to take into account the uncertainty in the SWIID estimates. This uncertainty is considerable in many developing countries, and the tools available in both software packages can now handle pretty much any analysis one may desire (for details, read the documents R_swiid.pdf and stata_swiid.pdf in the SWIID download). But this format does not lend itself to graphing. For graphs, use the swiid6_0_summary.csv file, which presents the SWIID estimates in mean-plus-standard-error summary format.

In Stata:

import delimited "swiid6_0_summary.csv", clear

// Calculate the bounds of the 95% uncertainty intervals
gen gini_disp_95ub = gini_disp + 1.96*gini_disp_se
gen gini_disp_95lb = gini_disp - 1.96*gini_disp_se

// A silly example
gen name_length = length(country)
gen first_letter = substr(country, 1, 1)
keep if year==2010 & first_letter=="S" /*2010 for Senegal, Serbia, . . .*/

// A scatterplot with the 95% uncertainty intervals calculated above
twoway rspike gini_disp_95ub gini_disp_95lb name_length, lstyle(ci) || ///
    scatter gini_disp name_length, msize(small) legend(order(2 "SWIID Disposable-Income Inequality"))

In R:

library(tidyverse)

# Load the SWIID
load("swiid6_0.rda")

# Plot SWIID gini_disp estimates for the United States
swiid_summary %>% 
    filter(country == "United States") %>% 
    ggplot(aes(x=year, y=gini_disp)) + 
    geom_line() +
    geom_ribbon(aes(ymin = gini_disp-1.96*gini_disp_se,
                    ymax = gini_disp+1.96*gini_disp_se, 
                    linetype=NA), alpha = .25) +
    scale_x_continuous(breaks=seq(1960, 2015, 5)) +
    theme_bw() + 
    labs(x = "Year", 
         y = "SWIID Gini Index, Disposable Income",
         title = "Income Inequality in the United States")

# Plot SWIID gini_net estimates for the United States
swiid_summary %>% 
    filter(country == "United States") %>% 
    ggplot(aes(x=year, y=gini_net)) + 
    geom_line() +
    geom_ribbon(aes(ymin = gini_net-1.96*gini_net_se,
                    ymax = gini_net+1.96*gini_net_se, 
                    linetype=NA), alpha = .25) +
    scale_x_continuous(breaks=seq(1960, 2015, 5)) +
    theme_bw() + 
    labs(x = "Year", 
         y = "SWIID Gini Index, Net Income",
         title = "Income Inequality in the United States")
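For analyses other than graphs, the multiply-imputed files described at the start of this answer are used by running the same analysis on each imputation and then combining the results; multiple-imputation tools such as Stata's mi estimate do this combining with Rubin's rules. The base-R sketch below shows only that combining step for a single estimate. It is an illustration, not a substitute for the routines described in R_swiid.pdf: pool_rubin() is a hypothetical helper, and the inputs are made-up numbers rather than SWIID results.

```r
# Combine estimates from m multiply-imputed analyses using Rubin's rules.
# pool_rubin() is a hypothetical helper written for this illustration.
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  q_bar <- mean(estimates)          # pooled point estimate
  w <- mean(std_errors^2)           # average within-imputation variance
  b <- var(estimates)               # between-imputation variance
  total_var <- w + (1 + 1 / m) * b  # Rubin's total variance
  c(estimate = q_bar, std_error = sqrt(total_var))
}

# Made-up results from analyzing five imputations (illustration only)
est <- c(0.42, 0.45, 0.40, 0.44, 0.43)
se  <- c(0.05, 0.05, 0.06, 0.05, 0.05)

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error adds the variation of the estimates across imputations to the average within-imputation variance, so it grows when the imputations disagree. That disagreement is how the uncertainty in the SWIID estimates carries through to your results.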

I'm using an older version of Stata, and it won’t open the file. How can I use the SWIID?

Stata 13 introduced a new file format which older versions of Stata can’t open. Fortunately, there’s an easy fix: the use13 command. In the command window, first type ssc install use13 to install it. Then you can type use13 SWIIDv5_0.dta, clear to load the SWIID data.

Update: Version 5.1 is saved in the older file format, so this shouldn’t be a problem anymore.

How can I merge my other data into the SWIID using country codes?

First, add whatever country codes are needed to the SWIID data: there are routines for both Stata (kountry; type findit kountry in Stata’s command window to install) and R (countrycode, on CRAN) to generate many commonly used country codes. Then follow the instructions for merging data into the SWIID found in the “Using the SWIID.pdf” file included in the data download.
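The two steps can be sketched in base R. This is a minimal illustration under stated assumptions: the data frames, the countries (Ruritania and Freedonia), their three-letter codes, and all values are hypothetical. The hand-made lookup table stands in for the countrycode package, where a call such as countrycode(swiid_summary$country, "country.name", "iso3c") would generate ISO 3166 alpha-3 codes for real country names.

```r
# Hypothetical stand-ins for the SWIID summary data and for your own data
swiid_demo <- data.frame(
  country   = c("Ruritania", "Freedonia", "Ruritania"),
  year      = c(2010, 2010, 2011),
  gini_disp = c(35, 30, 36),  # made-up values for illustration
  stringsAsFactors = FALSE
)
my_data <- data.frame(
  iso3c  = c("RUR", "FRE"),   # hypothetical country codes
  year   = c(2010, 2010),
  my_var = c(1.5, 2.0),
  stringsAsFactors = FALSE
)

# Step 1: add a country code to the SWIID data.
# This lookup table is a hypothetical stand-in for countrycode::countrycode().
code_lookup <- c(Ruritania = "RUR", Freedonia = "FRE")
swiid_demo$iso3c <- unname(code_lookup[swiid_demo$country])

# Step 2: merge on code and year, keeping every SWIID row (a left join)
merged <- merge(swiid_demo, my_data, by = c("iso3c", "year"), all.x = TRUE)
print(merged)
```

Setting all.x = TRUE keeps every SWIID row, including those with no match in your data (here Ruritania in 2011, which gets NA for my_var). This mirrors the keep(match master) option used in the Stata merge elsewhere in this FAQ.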

How can I use the SWIID data with Stata’s time-series operators?

To use Stata’s time-series operators (l., f., d., etc.), you must first declare the time and panel variables. The SWIID is already mi set, however, so tsset won’t work; use mi tsset instead. Run the following before merging the rest of your data (called my_data.dta in this example) into the SWIID:

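use SWIIDv5_1.dta, clear
// mi tsset needs a numeric panel identifier, so create one from the country names
egen cc = group(country)
// Declare cc as the panel variable and year as the time variable
mi tsset cc year
// Bring in your own country-year data, keeping matched and SWIID-only rows
merge 1:1 country year using "my_data.dta", keep(match master)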

Now you’ll be able to use l. and the rest of Stata’s time-series operators.
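In R, a comparable lag can be sketched in base R by looking up each country's value for the previous year. The helper lag_by_year() and the small panel below are hypothetical. Unlike simply shifting rows, the lookup returns NA when the previous year is missing, which is how Stata's l. operator treats gaps in a declared panel. With the multiply-imputed data, the same function would presumably be applied to each imputation separately.

```r
# Hypothetical panel with a gap: Ruritania has no 2012 observation
panel <- data.frame(
  country   = c("Ruritania", "Ruritania", "Ruritania", "Freedonia", "Freedonia"),
  year      = c(2010, 2011, 2013, 2010, 2011),
  gini_disp = c(35, 36, 37, 30, 31),  # made-up values for illustration
  stringsAsFactors = FALSE
)

# Return the same country's value of `var` for year - 1, or NA if that
# country-year is not in the data (hypothetical helper)
lag_by_year <- function(df, var) {
  key_now  <- paste(df$country, df$year)
  key_prev <- paste(df$country, df$year - 1)
  df[[var]][match(key_prev, key_now)]
}

panel$gini_disp_lag  <- lag_by_year(panel, "gini_disp")
panel$gini_disp_diff <- panel$gini_disp - panel$gini_disp_lag  # like Stata's d.
print(panel)
```

Because the lookup matches on country and year rather than row order, the data need not be sorted, and Ruritania's 2013 observation correctly gets no lag.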


The SWIID Data

How is Germany treated in the SWIID? Where’s East Germany?

As in the LIS, Germany is West Germany only before 1991 and (united) Germany from that point forward. The SWIID includes no estimates for East Germany; I just haven’t found much to include in the source data. If you know of a source for East German Ginis, email me with the details, and I’ll be happy to get it into the next version.

What about Russia and the rest of the former Soviet Union? Yugoslavia? Czechoslovakia? Pakistan? Sudan? Ethiopia? Why are there estimates for successor states before they actually existed?

For countries that have undergone partition, the SWIID estimates for a given year include all of the origin country’s then-current territories. This means that for dates before partition, all of the once-united territory is included: estimates for Sudan before 2011, for example, include present-day South Sudan.

Where Ginis are available from before the breakup for the territory of successor states, these Ginis have been standardized like anything else in the source data (the SWIID routine does not make any effort to incorporate information from the origin state into estimates for the successor state). For example, the estimates for Russia for dates before the breakup of the Soviet Union are for the territories of the USSR that would become Russia after the breakup; similar pre-breakup data are available for many other former Soviet states, the Czech Republic and Slovakia, and the successor states of the former Yugoslavia. The resulting estimates are available for use when appropriate, that is, to match the conventions of the other data one may be employing.

The estimates for Ruritania, among others, have changed since the previous version. Why is that?

There are three reasons why estimates change from one version of the SWIID to the next. The first reason is revisions to the Luxembourg Income Study. As the LIS serves as the standard for SWIID estimates, when Ginis from the LIS change, the SWIID estimates change as well.

The second reason for changes is the addition of more source data for the country. Each new version of the SWIID includes new source data both for (a) years that were not previously available, and (b) earlier years from sources I’ve discovered since the last update. Adding more source data for a country, particularly when previously available data was scant, holds the potential that the estimates will change.

The third, less obvious reason for changes is the availability of more raw data for other countries. The SWIID estimates for many relatively data-poor countries depend at least in part on information from other countries in their region or, for some developing countries, even on other developing countries in other regions (see Figure 4 and discussion in Solt 2016). This means that, given the scarcity of data on Ruritania, more data for other countries in Ruritania’s region, especially more LIS data, can lead to substantial changes in the estimates for Ruritania.