Updates and Working Notes
My Workflow
note
Today there was a workshop on research workflows at the Cluster of Excellence Politics of Inequality, with presentations by Maarten Buis and me. I was excited by the invitation to speak, because I think how we do our work is important: a good workflow helps us be more efficient and minimize errors, and so makes our work more credible, all things worth striving for. Here are the elements of my workflow that I talked about.
How to Switch Your Workflow from Stata to R, One Bit at a Time
note
r
A recent exchange on Twitter reminded me of my switch to R from Stata. I’d started grad school in 1999, before R hit 1.0.0, so I’d been trained exclusively in Stata. By 2008, I had way more than the proverbial 10,000 in-seat hours in Stata, and I knew all the tricks to make it do just what I wanted. I was even Stata Corp.’s on-campus rep at my university. Still, I’d started dabbling in R. Then as now, there were specific things R could do that Stata couldn’t. But how to get those advantages without throwing out my hard-earned skills and starting over as a complete n00b? The answer was: a little bit at a time.
Use dotwhisker for your APSA slides!
note
dotwhisker
r
With the APSA coming up, and in the interest of minimizing the number of times we hear “sorry, I know you won’t really be able to see these regression coefficients,” I thought I’d point R users to dotwhisker, a package UI Ph.D. student Yue HU and I just published to CRAN. dotwhisker makes regression plots in the style of Kastellec and Leoni’s (2007) Perspectives article quick and easy: after data entry, just two lines of R code produced the easy-to-read-even-from-the-back-of-the-room plot attached to this post. I hope you’ll find it useful, and if you have any suggestions for us, that you’ll file an issue at https://github.com/fsolt/dotwhisker, tweet to me @fredericksolt, or just send me an email.
SWIID Version 5.0 is available!
update
swiid
Version 5.0 of the SWIID is now available, and it is a major update. A new article of record (currently available as a working paper while under peer review) reviews the problem of comparability in cross-national income inequality data, explains how the SWIID addresses the issue, assesses the SWIID’s performance in comparison to the available alternatives, and explains how to use the SWIID data in cross-national analyses.
SWIID Version 3.1 now available!
update
swiid
Version 3.1 of the SWIID is now available here. The primary difference introduced in Version 3.1 is that the data on which the SWIID is based have again been expanded. Now nearly 4500 Gini observations are added to those collected in the UNU-WIDER data, and for many countries the available data extend to 2010. Also, I made one semantic change: to try to avoid confusion among those who neglect to read about the data they use, the series on pre-tax, pre-transfer inequality is now labeled gini_market rather than gini_gross. Otherwise, very small revisions were made to the SWIID routine from Version 3.0. As always, I encourage users of the SWIID to email me with their comments, questions, and suggestions.
Using the SWIID Standard Errors
note
swiid
Incorporating the standard errors in the SWIID estimates into one’s analyses is the right thing to do, but it is not a trivial exercise. I myself have left it out of some work where I felt the model was already maxed out on complexity (though in such cases, I advise at least excluding observations with particularly large errors). The short story is that one generates a bunch of Monte Carlo simulations of the SWIID data from the estimates and standard errors, then analyses each simulation, then combines the results of the multiple analyses as one would in a multiple-imputation setup (this should be easier to do with Stata 11’s new multiple-imputation tools, but I won’t get my copy of Stata 11 until the fall; oh well). The code below does the trick.
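As a rough illustration of that three-step procedure (this is not the Stata code from the original post), here is a minimal Python sketch using only the standard library. It assumes each Gini estimate can be simulated as an independent normal draw around its point estimate, that the analysis is a simple bivariate regression, and that results are pooled with Rubin's rules, the standard multiple-imputation combining formula. The function names, the number of simulations, and the example data are all hypothetical.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Fit y = a + b*x by least squares; return b and its sampling variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, rss / (n - 2) / sxx


def combine_rubin(estimates, variances):
    """Pool per-simulation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across simulations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


def analyze_with_uncertainty(gini, gini_se, outcome, n_sims=100, seed=1):
    """Simulate the Gini series n_sims times, analyze each, then pool."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(n_sims):
        # Step 1: one Monte Carlo draw of the inequality data
        # (assumes independent normal errors; hypothetical simplification).
        draw = [rng.gauss(g, se) for g, se in zip(gini, gini_se)]
        # Step 2: run the analysis on this simulated dataset.
        slope, var = ols_slope(draw, outcome)
        estimates.append(slope)
        variances.append(var)
    # Step 3: combine the analyses as in a multiple-imputation setup.
    return combine_rubin(estimates, variances)


# Hypothetical data: five country-years with Gini estimates and an outcome.
gini = [28.0, 33.5, 41.2, 47.9, 52.3]
gini_se = [0.6, 1.1, 2.4, 3.0, 1.8]
outcome = [7.1, 6.4, 5.9, 4.8, 4.5]
coef, se = analyze_with_uncertainty(gini, gini_se, outcome)
print(f"pooled slope = {coef:.3f}, pooled SE = {se:.3f}")
```

The pooled standard error grows with the spread of the slope across simulations, so observations with large SWIID standard errors widen the final uncertainty rather than being treated as exact.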
SWIID Version 2.0
update
swiid
Version 2.0 of the SWIID is now available, and it is a major upgrade. It introduces two important changes from Version 1.1 (the version described in the SSQ article). First, I collected a large number (1500+) of Gini observations that are excluded from the WIID with an eye towards addressing some of the thinner spots in the SWIID’s underlying data. Second, I rewrote several parts of the missing-data algorithm. The key change is a switch from multilevel to (flat) linear regression modeling for the imputation of conversion ratios between the 21 categories of available Gini data. Given the patterns of missingness in the data, complete pooling (as occurs in a flat linear regression) proved superior to partial pooling (as occurs in multilevel modeling). The result, along with some minor improvements in coverage, is considerably smaller standard errors in the Gini index estimates, particularly in Latin America and Africa, than in Version 1.1. All SWIID users are encouraged to use these new data in their work.