Updates and Working Notes

SWIID Version 5.1 is available!

Thursday, 21 July 2016

Version 5.1 of the SWIID is now available! It revises and updates the SWIID’s source data and estimates. It also includes expanded training modules explaining how to take into account the uncertainty in the estimates in both R and Stata.

As always, I encourage users of the SWIID to email me with their comments, questions, and suggestions.

Use dotwhisker for your APSA slides!

Thursday, 30 July 2015

With the APSA coming up, and in the interest of minimizing the number of times we hear “sorry, I know you won’t really be able to see these regression coefficients,” I thought I’d point R users to dotwhisker, a package UI Ph.D. student Yue HU and I just published to CRAN. dotwhisker makes regression plots in the style of Kastellec and Leoni’s (2007) Perspectives article quick and easy: after data entry, just two lines of R code produced the easy-to-read-even-from-the-back-of-the-room plot attached to this post. I hope you’ll find it useful, and if you have any suggestions for us, that you’ll file an issue at https://github.com/fsolt/dotwhisker, tweet to me @fredericksolt, or just send me an email frederick-solt@uiowa.edu.


Now on CRAN: interplot

Friday, 26 June 2015

Hu Yue and I just published interplot on CRAN, our first R package. interplot makes graphing the coefficients of variables in interaction terms easy. It outputs ggplot objects, so further customization is simple. Check out the vignette and give it a try!


Inequality in China

Friday, 27 March 2015

A new working paper by IMF researchers Serhan Cevik and Carolina Correa-Caro observes that sharply rising inequality has made China one of the most unequal countries in the world. Here’s a graph of SWIIDv5.0 data that illustrates their point.


SWIID Version 5.0 is available!

Thursday, 2 October 2014

Version 5.0 of the SWIID is now available, and it is a major update. A new article of record (currently available as a working paper while under peer review) reviews the problem of comparability in cross-national income inequality data, explains how the SWIID addresses the issue, assesses the SWIID’s performance in comparison to the available alternatives, and explains how to use the SWIID data in cross-national analyses.

The new version also marks the debut of the SWIID web application. The web application allows users to graph the SWIID estimates of any of net-income income, market-income inequality, relative redistribution, or absolute redistribution in as many as four countries or to compare these measures within a single country. Its output can be downloaded with a click for use in reports or articles. I hope that it will be of particular value to policymakers, journalists, students, and others who need to make straightforward comparisons of levels and trends in income inequality.

As always, I encourage users of the SWIID to email me with their comments, questions, and suggestions.

SWIID Version 4.0 is available!

Monday, 30 September 2013

Version 4.0 of the SWIID is now available here. Drawing on nearly 14,000 Gini observations in more than 3100 country-years, this version provides even better estimates of income inequality in countries around the world than in previous versions.

This version introduces two other improvements. First, many users have had trouble making appropriate use of the standard errors associated with the SWIID estimates. The uncertainty, however, can sometimes be substantial, making it crucial to incorporate in one’s analyses. Fortunately, there are now tools in Stata and R that make it quite straightforward to analyze data that is measured with error, and this version of the SWIID includes files that are pre-formatted for use with these tools. The file “Using the SWIID.pdf”, which is also included in the data download, explains how. Some additional examples of using the SWIID with Stata’s mi estimate command prefix can be found towards the end of the slides posted here.

Second, I’ve received several requests for measures of top income share, so in this version I am including estimates of the top 1 percent’s share (the variable share1), standardized to the data provided in the World Top Incomes Database: Country-years included in that dataset are reproduced without modification in the SWIID, and comparable figures for other country-years are estimated using the SWIID’s custom multiple-imputation algorithm. Like all inequality datasets, Top Incomes has tradeoffs—among other things, the share of pre-tax, pre-transfer income reported on tax returns by the richest filers may not be of much theoretical interest to many investigators—but the additional estimates the SWIID provides may prove to be useful to some.

I encourage users of the SWIID to email me with their comments, questions, and suggestions.

My talk at the UN

Sunday, 29 September 2013

Earlier this month, I gave a talk previewing Version 4.0 of the SWIID to the Development Policy and Analysis Division of the United Nations’ Department of Economic and Social Affairs. I had some great conversations and got lots of useful feedback. Slides for the talk can be found here.

SWIID Version 3.1 now available!

Monday, 2 January 2012

Version 3.1 of the SWIID is now available here. The primary difference introduced in Version 3.1 is that the data on which the SWIID is based have again been expanded. Now nearly 4500 Gini observations are added to those collected in the UNU-WIDER data, and for many countries the available data extend to 2010. Also, I made one semantic change: to try to avoid confusion among those who neglect to read about the data they use, the series on pre-tax, pre-transfer inequality is now labeled gini_market rather than gini_gross. Otherwise, very small revisions were made to the SWIID routine from Version 3.0. As always, I encourage users of the SWIID to email me with their comments, questions, and suggestions.

SWIID Version 3.0 is now available!

Sunday, 11 July 2010

Version 3.0 of the SWIID is now available, with expanded coverage and improved estimates.

The data on which the SWIID is based have been expanded. I have collected another 2100 Gini observations (in addition to the 1500 added in v2.0), again with special attention to addressing the thinner spots in the WIID. As before, these data are available in the replication materials for those who are interested. Major sources for these data include the World Bank’s Povcalnet, the Socio-Economic Database for Latin America, Branko Milanovic’s World Income Distribution data (“All the Ginis”), and the ILO’s Household Income and Expenditure Statistics, but a multitude of national statistical offices and other sources were also consulted.

The SWIID also now incorporates the University of Texas Inequality Project’s UTIP-UNIDO dataset on differences in pay across industrial sectors. Across countries and years, these data explain only about half of the variation in net income inequality (and much less of gross income inequality) and so yield predictions with prohibitively large standard errors when employed in this way, but where there was sufficient data available, I used the UTIP data to make within-country loess predictions of both net and gross income inequality that informed the SWIID estimates.

The imputation routine used for generating the SWIID was cleaned up: the code now runs more efficiently, and a few errors were corrected.

Many researchers have asked me about using the SWIID to examine questions of redistribution, so I now include in the dataset the percentage reduction in gross income inequality (that is, the difference between the gross and net income inequality, divided by gross income inequality, multiplied by 100) as an estimate of redistribution (“redist”) as well as its associated standard error (“redist_se”). The standard errors for redistribution are particularly important to take into account, as they can often be quite large relative to the size of the estimates. Observations for redistribution are omitted for countries for which the source data do not include multiple observations of either net or gross income inequality: in such cases, although the two inequality series each still constitute the most comparable available estimates, the difference between them reflects only information from other countries, and treating it as meaningful independent information about redistribution would be unwise. Similarly, because the underlying data is often thin in the early years included in the SWIID, redistribution is only reported after 1975 for most of the advanced countries and only after 1985 for most countries in the developing world.

As always, I encourage users of the SWIID to email me with their comments, questions, and suggestions.

Using the SWIID Standard Errors

Sunday, 20 June 2010

Incorporating the standard errors in the SWIID estimates into one’s analyses is the right thing to do, but it is not a trivial exercise. I myself have left it out of some work where I felt the model was already maxed out on complexity (though in such cases, I advise at least excluding observations with particularly large errors). The short story is that one generates a bunch of Monte Carlo simulations of the SWIID data from the estimates and standard errors, then analyses each simulation, then combines the results of the multiple analyses as one would in a multiple-imputation setup (this should be easier to do with Stata 11’s new multiple-imputation tools, but I won’t get my copy of Stata 11 until the fall–oh well). The code below does the trick.

**Using the SWIID Standard Errors: An Example**
//Load SWIID and generate fake data for example
use "SWIIDv2_0.dta", clear
set seed 4533160
gen x1 = 20*rnormal()
gen x2 = rnormal()
gen x3 = 3*rnormal()
gen y = .03*x1 + 3*x2 + .5*x3 + .05*gini_net + 5 + 20*rnormal()
reg y x1 x2 x3 gini_net

//Generate ten Monte Carlo simulations of the gini_net series
egen ccode=group(country)				
tsset ccode year						
set seed 3166							
forvalues a = 1/10 {
	gen e0 = rnormal()
	quietly tssmooth ma e00 = e0, weight (1 1 <2> 1 1)
	quietly sum e00
	quietly gen g`a'=gini_net+e00*(1/r(sd))*gini_net_se
	drop e0 e00

//Perform analysis using each of the ten simulations, saving the results
local other_ivs = "x1 x2 x3"		/*to be replaced with your other IVs, that is, not including gini_net or the constant*/
local n_ivs = 5				/*to be replaced with the number of IVs, now *including* gini_net and the constant*/
matrix coef = J(`n_ivs', 10, -99)
matrix se = J(`n_ivs', 10, -99)
matrix r_sq = J(1, 10, -99)
forvalues a = 1/10 {
	quietly reg y `other_ivs' g`a'	/*to be replaced with your analysis*/	
	matrix coef[1,`a'] = e(b)'
	matrix A = e(V)
	forvalues b = 1/`n_ivs' {
			matrix se[`b', `a'] = (A[`b',`b'])
	matrix r_sq[1, `a'] = e(r2)

local cases = e(N)

svmat coef, names(coef)
svmat se, names(se)
svmat r_sq, names(r_sq)

//Display results across all simulations
egen coef_all = rowmean(coef1-coef10)

gen ss_all = 0
forvalues a = 1/10 {
	quietly replace ss_all = ss_all + (coef`a'-coef_all)^2
egen se_all = rowmean(se1-se10)
replace se_all = se_all + (((1+(1/10)) * ((1/9) * ss_all))) /*Total variance, per Rubin (1987)*/
replace se_all = (se_all)^.5 /*Total standard error*/

gen t_all = coef_all/se_all
gen p_all = 2*normal(-abs(t_all))

egen r_sq_all = rowmean(r_sq1-r_sq10)

gen vars = " " in 1/`n_ivs'
local i = 0
foreach iv in `other_ivs' "Inequality" "Constant" {
	local i = `i'+1
	replace vars = "`iv'" in `i'
mkmat coef_all se_all p_all if coef_all~=., matrix(res_all) rownames(vars)
matrix list res_all, format(%9.3f)
quietly sum r_sq_all
local r2 = round(`r(mean)', .001)
di "R-sq = `r2'"
di "N = `cases'"

Please feel free to drop me an email if you have any questions or comments.

SWIID Version 2.0

Friday, 31 July 2009

Version 2.0 of the SWIID is now available, and it is a major upgrade. It introduces two important changes from Version 1.1 (the version described in the SSQ article). First, I collected a large number (1500+) of Gini observations that are excluded from the WIID with an eye towards addressing some of the thinner spots in the SWIID’s underlying data. Second, I rewrote several parts of the missing-data algorithm. The key change is a switch from multilevel to (flat) linear regression modeling for the imputation of conversion ratios between the 21 categories of available Gini data. Given the patterns of missingness in the data, complete pooling (as occurs in a flat linear regression) proved superior to partial pooling (as occurs in multilevel modeling). The result, along with some minor improvements in coverage, is considerably smaller standard errors in the Gini index estimates, particularly in Latin America and Africa, than in Version 1.1. All SWIID users are encouraged to use these new data in their work.

SWIID Version 1.1

Sunday, 12 October 2008

So much for version control. With apologies to v1.0 users, Version 1.1 is the SWIID as reported in “Standardizing the World Income Inequality Database.”

SWIID Version 1.0

Saturday, 13 September 2008

“Standardizing the World Income Inequality Database” has been accepted for publication in the Social Science Quarterly. Version 0.9 of the SWIID is now released as Version 1.0 without modification.

SWIID Version 0.9

Tuesday, 5 August 2008

The SWIID is currently undergoing peer review for publication.