SWIID Frequently Asked Questions

How to . . .

Wait, how can I download the data again?

First, click on this link. Scroll down past the description until you see a list of files. The dataset is the SWIIDv5_1.zip file, which is the first one labeled “2. Data” (see the screenshot below). Clicking on the big rectangular button labeled “Download” on the right side of this line will download the file.


How can I use the SWIID data to make my own graphs? Or to get the data in the old mean-plus-standard-error format?

Since Version 5.0, the SWIID is distributed only pre-formatted for use with the tools for analyzing multiply-imputed data in Stata or R. This is meant to “set the default” in a way that encourages researchers to take into account the uncertainty in the SWIID estimates; this uncertainty is considerable in many developing countries. This decision does mean, however, that one will have to take the extra step of summarizing the multiple imputations when they aren’t needed, e.g., for graphing, or when doing analyses that the multiple-imputation tools can’t handle (though in such circumstances, one should limit one’s sample to those observations with relatively small standard errors; see Solt 2009, 238).

In Stata:

use SWIIDv5_1.dta, clear
// Summarize the dataset
keep country year _*

foreach v in gini_net gini_market rel_red abs_red {
    egen `v' = rowmean(_*`v')
    egen `v'_se = rowsd(_*`v')
    gen `v'_95ub = `v' + 1.96*`v'_se
    gen `v'_95lb = `v' - 1.96*`v'_se
}
drop _*
sort country year

// A silly example
gen name_length = length(country)
gen first_letter = substr(country, 1, 1)
keep if year==2010 & first_letter=="S" /*2010 for Senegal, Serbia, . . .*/

// A scatterplot with 95% confidence intervals
twoway rspike gini_net_95ub gini_net_95lb name_length, lstyle(ci) || ///
    scatter gini_net name_length, msize(small) legend(order(2 "SWIID Net-Income Inequality")) 

In R:


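# Load the tidyverse (dplyr, stringr, ggplot2)
library(tidyverse)

# Load the SWIID (file name assumed to parallel the Stata file; adjust to the .RData file in your download)
load("SWIIDv5_1.RData")

# Summarize the SWIID
swiid_summary <- swiid %>% 
    bind_rows() %>% 
    group_by(country, year) %>% 
    summarize_all(list(mean = mean, sd = sd)) %>%
    ungroup() %>% 
    rename_with(~ str_replace(.x, "_mean$", "")) %>% 
    rename_with(~ str_replace(.x, "_sd$", "_se"))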

# Plot SWIID gini_net estimates for the United States
swiid_summary %>% 
    filter(country == "United States") %>% 
    ggplot(aes(x=year, y=gini_net)) + 
    geom_line() +
    geom_ribbon(aes(ymin = gini_net-1.96*gini_net_se,
                    ymax = gini_net+1.96*gini_net_se, 
                    linetype=NA), alpha = .25) +
    scale_x_continuous(breaks=seq(1960, 2015, 5)) +
    theme_bw() + 
    labs(x = "Year", 
         y = "SWIID Gini Index, Net Income",
         title = "Income Inequality in the United States")
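
When the multiple-imputation tools can't be used, the advice above is to keep only observations with relatively small standard errors. Here is a minimal sketch of that filtering step in R. It uses a small made-up data frame (toy_summary, a hypothetical stand-in for swiid_summary), and the cutoff of 1 Gini point (max_se) is purely illustrative, not a recommendation taken from the SWIID documentation:

```r
library(dplyr)

# Toy stand-in for swiid_summary (all values invented for illustration)
toy_summary <- tibble(
    country     = c("A", "A", "B", "B"),
    year        = c(2009, 2010, 2009, 2010),
    gini_net    = c(30.1, 30.4, 45.2, 44.8),
    gini_net_se = c(0.4, 0.5, 2.3, 0.9)
)

# Hypothetical cutoff: pick one suited to your own analysis
max_se <- 1

# Keep only the country-years estimated with relatively little uncertainty
toy_precise <- toy_summary %>% 
    filter(gini_net_se <= max_se)
```

With real data, replace toy_summary with swiid_summary and report the cutoff you chose.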

I'm using an older version of Stata, and it won’t open the file. How can I use the SWIID?

Stata 13 introduced a new file format which older versions of Stata can’t open. Fortunately, there’s an easy fix: the use13 command. In the command window, first type ssc install use13 to install it. Then you can type use13 SWIIDv5_0.dta, clear to load the SWIID data.

Update: Version 5.1 is saved in the older file format, so this shouldn’t be a problem anymore.

How can I merge my other data into the SWIID using country codes?

First, add whatever country codes are needed to the SWIID data: there are routines for both Stata (kountry; type findit kountry in Stata’s command window to install) and R (countrycode, on CRAN) to generate many commonly used country codes. Then follow the instructions for merging data into the SWIID found in the “Using the SWIID.pdf” file included in the data download.
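
As a rough illustration of the first step in R, the sketch below adds ISO three-letter codes with countrycode and then joins a second dataset on code and year. Both data frames (swiid_toy and my_data) and all their values are invented stand-ins. The real swiid object appears to be a list of data frames, so the same step would presumably be applied to each element; the PDF instructions remain the reference for the actual merge:

```r
library(dplyr)
library(countrycode)

# Toy stand-in for a few SWIID-style rows (values invented)
swiid_toy <- tibble(
    country  = c("Senegal", "Serbia", "Sweden"),
    year     = 2010,
    gini_net = c(39.0, 28.5, 24.0)
)

# Hypothetical dataset of your own, keyed by ISO3 code and year
my_data <- tibble(
    iso3c = c("SEN", "SRB", "SWE"),
    year  = 2010,
    x     = c(1.2, 3.4, 5.6)
)

# Generate ISO 3166-1 alpha-3 codes from the English country names
swiid_toy <- swiid_toy %>% 
    mutate(iso3c = countrycode(country, 
                               origin = "country.name", 
                               destination = "iso3c"))

# Keep every SWIID row and attach matching rows from my_data
merged <- left_join(swiid_toy, my_data, by = c("iso3c", "year"))
```

It is worth checking for rows where the generated code is NA before merging, since unmatched country names would otherwise drop out of the join silently.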

How can I use the SWIID data with Stata’s time-series operators?

To use Stata’s time-series operators (l., f., d., etc.), you must first declare the time and panel variables. Ordinarily this is done with tsset, but the SWIID is already mi set, so plain tsset won’t work; use mi tsset instead. Run the following before merging the rest of your data (called my_data.dta in this example) into the SWIID:

use SWIIDv5_1.dta, clear
egen cc = group(country)  
mi tsset cc year
merge 1:1 country year using "my_data.dta", keep(match master) 

Now you’ll be able to use l. and the rest of Stata’s time-series operators.

The SWIID Data

How is Germany treated in the SWIID? Where’s East Germany?

As in the LIS, Germany is West Germany only before 1991 and (united) Germany from that point forward. The SWIID includes no estimates for East Germany; I just haven’t found much to include in the source data. If you know of a source for East German Ginis, email me with the details, and I’ll be happy to get it into the next version.

What about Russia and the rest of the former Soviet Union? Yugoslavia? Czechoslovakia? Pakistan? Sudan? Ethiopia? Why are there estimates for successor states before they actually existed?

For countries that have undergone partition, the SWIID estimates for a given year include all of the origin country’s then-current territories. This means that for dates before partition, all of the once united territory is included: estimates for Sudan before 2011, for example, include present-day South Sudan.

Where Ginis are available from before the breakup for the territory of successor states, these Ginis have been standardized like anything else in the source data (the SWIID routine does not make any effort to incorporate information from the origin state into estimates for the successor state). For example, the estimates for Russia for dates before the breakup of the Soviet Union are for the territories of the USSR that would become Russia after the breakup; there are similar pre-breakup data available for many FSU states, the Czech Republic and Slovakia, and the successor states of the former Yugoslavia. The resulting estimates are available for use when appropriate, that is, to match the conventions of the other data one may be employing.

The estimates for Ruritania, among others, have changed since the previous version. Why is that?

There are three reasons why estimates change from one version of the SWIID to the next. The first reason is revisions to the Luxembourg Income Study. As the LIS serves as the standard for SWIID estimates, when Ginis from the LIS change, the SWIID estimates change as well.

The second reason for changes is the addition of more source data for the country. Each new version of the SWIID includes new source data both for (a) years that were not previously available, and (b) earlier years from sources I’ve discovered since the last update. Adding more source data for a country, particularly when previously available data was scant, holds the potential that the estimates will change.

The third, less obvious reason for changes is the availability of more raw data for other countries. The SWIID estimates for many countries that are relatively data-poor depend at least in part on information from other countries in their region or, for some developing countries, even on other developing countries in other regions (see Figure 4 and discussion in Solt 2016). This means that, given the scarcity of data on Ruritania, more data for other countries in Ruritania’s region, especially more LIS data, can lead to substantial changes in the estimates for Ruritania.