My Workflow

Author: Frederick Solt

Today I spoke with the early-career researchers at the Politics of Inequality Cluster of Excellence at the University of Konstanz about my workflow. I was excited for the invitation, because I think how we do our work is important: having a good workflow helps us to be more efficient and to minimize errors, which are both things to strive for. Here’s what I talked about.

Backup

Sometime in 2007 or 2008, I had a hard drive fail in my main work machine. I bought some software to recover the information on the drive, which seemed to work perfectly. Only months (maybe even years) later did I realize that virtually all of my files for a big project had been lost. On the one hand, the paper was already forthcoming, so the years of effort weren’t entirely in vain. On the other, all that stuff was gone, and I’d never be able to build on it. Here, well over a decade later, it still makes me shudder.

So I got serious about backup. First, I use Resilio Sync to keep copies of all the files in my laptop’s Documents directory on another machine that sits in a closet at home. That machine uses Apple’s Time Machine to back up those files again to an external drive, and it uses Arq Backup to back them up once more to Amazon Glacier storage.[1]

Drives are much more robust today, but bad things (fires, floods, a lost computer bag) can still happen. Back your stuff up! There are lots of options in this space; choose one (or, like me, a couple) and set them up to run automatically. Then you can check in on those processes now and then (the weak point in mine is that the flaky power grid occasionally shuts down the closet computer) but mostly never worry about it again. Your career deserves this insurance policy.
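Checking in on a backup can itself be partly automated. The sketch below is one rough way to do it, not a description of the setup above: it walks a backup folder and reports whether anything in it has been written recently. The folder path, the one-week threshold, and the function names are all hypothetical, and the heuristic assumes the backup tool regularly writes new files (snapshots, logs) into that folder; a mirror that preserves original modification times would need a different test.

```python
import os
import time
from pathlib import Path


def newest_mtime(root):
    """Return the latest modification time (epoch seconds) of any file under root, or None."""
    latest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if latest is None or mtime > latest:
                latest = mtime
    return latest


def backup_is_stale(root, max_age_days=7, now=None):
    """True if no file under root was modified within the last max_age_days."""
    now = time.time() if now is None else now
    latest = newest_mtime(root)
    if latest is None:
        return True  # an empty or missing backup folder counts as stale
    return (now - latest) > max_age_days * 86400


if __name__ == "__main__":
    backup_dir = Path.home() / "Backups"  # hypothetical location
    if backup_is_stale(backup_dir):
        print(f"WARNING: nothing new in {backup_dir} for over a week")
    else:
        print("Backup looks fresh")
```

Run from a scheduler, a check like this would flag the case where the backup machine has quietly been off for days.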

Reference Management

Your absolute quickest workflow payoff is to adopt reference management software for bibliographies. Hand typing bibliographies takes a lot of time, it’s no fun, you’re going to make mistakes, and the computer is happy to do it for you. Perfect. Moreover, having your own library of relevant work makes it super easy to find and pull up that paper you can’t quite remember the details about. I recommend this to junior people frequently, and I’m always surprised at how reluctant many, even most, of them are, but putting a little time in on this at the front end saves a lot later.

Starting as an undergrad and then through graduate school, I used Endnote, but I moved to BibTeX shortly after finishing my Ph.D., when I started writing in LaTeX rather than Word. I have used BibDesk as my BibTeX front end since then; like most of my workflow, it is open source, but unfortunately, it is Mac-only software. JabRef is the standard on other platforms. I keep thinking about shifting to Zotero for this. Maybe someday that will happen.

Anyway, whenever I read anything I think I might use, I download its BibTeX file from the journal[2] along with the PDF. I take a second right then to double-check that all of the information is accurate (that the authors’ full first names are there rather than just an initial, etc.), and then I never have to even think of typing that stuff ever again.
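The check for initials-only first names is mechanical enough to script. What follows is only an illustrative sketch, not part of BibDesk or any BibTeX tool: the function name is made up, and it assumes the author field uses the usual "Last, First" or "First Last" forms joined by " and ".

```python
import re


def authors_with_initials(author_field):
    """Return the authors whose given names consist only of initials.

    Assumes BibTeX-style names: 'Last, First' or 'First Last', joined by ' and '.
    """
    flagged = []
    for name in re.split(r"\s+and\s+", author_field.strip()):
        if "," in name:
            given = name.split(",", 1)[1]      # text after the family name
        else:
            given = " ".join(name.split()[:-1])  # everything before the last word
        # Break the given names into tokens, dropping periods and hyphens.
        tokens = [t for t in re.split(r"[\s.\-]+", given) if t]
        # Flag when every given-name token is a single letter, e.g. "E. L."
        if tokens and all(len(t) == 1 for t in tokens):
            flagged.append(name.strip())
    return flagged


print(authors_with_initials("Kastellec, Jonathan P. and Leoni, E. L."))
```

Here the first author passes because a full first name is present, while the second is flagged for a follow-up look.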

Statistical Analysis

I was trained in Stata and was totally into it for many years (and I mean totally: I repeatedly paid the seriously big bucks for the 8MP version, and I was campus rep for the company at my university). But since maybe 2012, after three or four years of transition, I have used R more or less exclusively, and I use RStudio Desktop as an IDE. As I’ve written elsewhere, I’d say R’s advantages are that R is free and open source, it has superior graphical capabilities, “the super-helpful community, the transferable job skills you can teach your students, the free part, the cutting-edge stuff available years before it’s in Stata, the way RStudio makes it dead easy to do reproducible research through dynamic documents and version control, and, once again, the free part.” I adopted the tidyverse as it grew and teach it to my students, too—in my opinion, it really makes R easier to learn and use. Other packages that seemingly make it into everything I write are countrycode, which makes merging cross-national datasets much easier, dotwhisker, which I wrote with Hu Yue to draw regression plots along the lines proposed by Kastellec and Leoni (2007), and more recently, DCPOtools, for working with large numbers of survey datasets to estimate dynamic comparative public opinion.

Literate Programming

Literate programming refers to having all of your statistical analysis and all of your writing together in a single document that re-runs your analyses every time you compile it, so that any changes in, say, your data are automatically reflected in your figures, tables, and even descriptions in the text. I’m a huge fan: it’s super efficient and, done well, it ensures your work is always reproducible. To take advantage of this, I moved from straight LaTeX to Sweave, which embeds R code in LaTeX, and then, when it came out, switched to RMarkdown, which as the name suggests is R code in the much simpler and more intuitive Markdown language. RStudio recently released Quarto, which is a super-powered update to RMarkdown, and I guess I’ll make that switch for my next new paper.
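To make the idea concrete, here is a toy illustration in plain Python. It is a stand-in for the concept only, not how RMarkdown or Quarto work internally, and the function name and sentence template are invented for the example. The numbers in the sentence are never typed by hand; they are recomputed from the data each time the text is rendered.

```python
from statistics import mean
from string import Template


def render_report(values):
    """Recompute summary statistics and splice them into the prose."""
    stats = {
        "n": len(values),
        "avg": f"{mean(values):.2f}",
        "high": f"{max(values):.2f}",
    }
    text = Template(
        "The sample contains $n observations with a mean of $avg "
        "and a maximum of $high."
    )
    return text.substitute(stats)


# Changing the data changes the sentence; nothing is edited by hand.
print(render_report([0.25, 0.5, 0.75]))
```

Tools like RMarkdown and Quarto apply the same principle to whole papers, with figures and tables regenerated alongside the text.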

Version Control

Version control is used to track changes in documents over time, particularly documents that are code-heavy. I use git and GitHub for this purpose. Like backups, version control is a very good thing, and because all of your work is in the cloud, it facilitates collaboration, too: I work with a team spread over eleven time zones, and GitHub makes it easy to have conversations about what we are doing and to keep our work integrated.[3] RStudio makes all of this super easy; see Jenny Bryan’s Happy Git with R for how to get set up.

Large Dataset Storage

Sometimes I need to save datasets somewhere online so that they can be downloaded by my literately programmed papers. That’s when I turn to the Open Science Framework and start a public project for the paper. As long as the project is public, you can store even really big files there for free.

Slides

For a while, a decade or more ago now, I used Beamer to write my presentation slides in LaTeX, but I had always used Apple’s Keynote for teaching, and I eventually came back to the idea that Keynote was the way to go for everything. I love the fun things that Xaringan made possible for RMarkdown slides, and Quarto now offers similar capabilities. I sometimes wish for the ease of updating that comes with literate programming, but given that most of my presentation slide decks are used just once, the advantages of drag-and-drop image positioning and WYSIWYG formatting have won. For now.

Course Management

The University of Iowa is now a Canvas shop, and so I use Canvas for almost all of my teaching more or less by default. I recorded all of my lectures in Panopto, but I sometimes think I should have just put everything on YouTube to avoid being locked into that company’s format. That was a decision made in the heat of the pandemic, so it may be revisited at some point. I also have one graduate course, on Computational Methods for Comparative Politics, that I teach with GitHub Education, which I like a lot. I’d be tempted to move more of my teaching there, but Canvas’s integration with the university’s enrollment system and so on wins out.

Websites

Having a website is a crucial part of getting yourself and your work known by the broader scholarly community. My very first website is recorded for posterity at the Internet Archive’s Wayback Machine, which is also a tool I end up using pretty often to prevent link rot in scripts and course assignments. Anyway, after using Jekyll and then blogdown, I now use Quarto to write my website and GitHub Pages to publish it.
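For the link-rot use, the Wayback Machine serves archived pages at addresses of the form `https://web.archive.org/web/<timestamp>/<original URL>` and redirects to the capture closest to the timestamp given. A small helper can build such addresses for use in scripts or assignments. This is a sketch: the function name and the example URL are placeholders, and it does not check that a capture actually exists.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url, when):
    """Build a Wayback Machine address for url near the date when."""
    stamp = when.strftime("%Y%m%d")  # YYYYMMDD; the archive picks the nearest capture
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# example.com stands in for whatever page a script or syllabus links to.
print(wayback_url("https://example.com/data.csv", date(2020, 3, 15)))
```

Pointing a script at the archived address rather than the live one means it keeps working even if the original page moves or disappears.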

Preprints

When I finish a draft and it’s ready to send out for review, right at the same time I upload the paper to the Open Science Framework and submit it to SocArXiv. Having this preprint on SocArXiv makes the paper really easy for me to share (and for others to find and read and use) while it is under review. And after acceptance, it ensures that paywalls won’t block people from reading the work.

Replication Materials

Acceptance is also the time for me to get my replication materials online at the Harvard Dataverse. Many journals now require this, but even if you publish at a journal that doesn’t, it is good to do anyway: it increases the work’s visibility and its citations (King 1995). I also use the Dataverse to host the twice-annual updates to the SWIID.

Social Networking

For me, the last step is then to announce the publication on Twitter. Some people are really good at using Twitter to collect and test research ideas and intermediate products (e.g., Alice Evans)—real time and continuous peer review—and you should consider trying that, too.

References

Kastellec, Jonathan P., and Eduardo L. Leoni. 2007. “Using Graphs Instead of Tables in Political Science.” Perspectives on Politics 5(4): 755–71.
King, Gary. 1995. “Replication, Replication.” PS: Political Science and Politics 28(3): 444–52.

Footnotes

  1. All my research is on GitHub, too, but I’ll come to that in a minute.

  2. If it’s a book, I use BibDesk’s search of the U.S. Library of Congress.

  3. In fairness, we also meet weekly on Zoom for a couple of hours.