gesisdata: Reproducible Retrieval of GESIS Data Archive Datasets
Frederick Solt
2023-07-03
Source:vignettes/gesisdata-vignette.Rmd
gesisdata-vignette.Rmd
The GESIS Data Archive makes available thousands of invaluable social scientific surveys, including, among many others, the ALLBUS, the European Values Survey, the Eurobarometer, and the International Social Survey Program. Researchers taking advantage of these datasets, however, are caught in a bind. The archive’s terms and conditions bar dissemination of downloaded datasets to third parties. But to ensure that one’s work can be reproduced, assessed, and built upon by others, one must provide access to the raw data one employed.
The gesisdata
package cuts this knot by providing
programmatic, reproducible access to specified GESIS datasets from
within R for registered
users.
Setup
When used interactively, the gesis_download
function
will ask for the login information required by GESIS: the registered
user’s email, password, and the use to which the data will be put. After
that information is input once, it will be entered automatically for any
other download requests made in the same session. To change this contact
information within a session, one may set the argument
reset
to TRUE
when running
gesis_download
again, and the function will again request
the required information.
An optional, but highly recommended, setup step is to add the information that GESIS requires to your .Rprofile as in the following example:
options("gesis_email" = "[email protected]",
"gesis_password" = "password123!",
"gesis_use" = 5)
The gesis_download
function will then access the
information it needs to pass on to GESIS by default. This means that
researchers will not have to expose their info in their R scripts and
that others reproducing their results later—given that they have
registered as users with GESIS—will be able to execute those R scripts
without modification. (They will, however, need to enter their own
information either interactively or in their own .Rprofiles, a detail
that should be noted in the reproducibility materials to avoid
confusion.)
Use
The gesis_download
function (1) simulates a visit to the
GESIS log-in page, (2) enters the required information to sign in, (3)
navigates to a specified dataset and downloads the dataset’s files, and,
optionally but by default, (4) converts the dataset’s Stata-formated
files to .Rdata
format.
Datasets are specified using the file_id
argument. GESIS
uses a unique number to identify each of its datasets. This number
consistently appears alongside the dataset’s name; it is also listed as
the “Study Number,” and it can be found at the end of the dataset’s URL
as well. For the cumulative
Politbarometer, for example, the file id is ZA5100:
To reproducibly download this dataset:
gesis_download(file_id = "ZA5100",
download_dir = tempdir()) # remember to specify a directory for your download
Multiple datasets may be downloaded from the same research area in a
single command by passing a vector of ids to file_id
. The
following downloads the Politbarometer cumulative file along with
Eurobarometer 84.4 and the ISSP Role of Government V:
gesis_download(file_id = c("ZA5100", "ZA6644", "ZA6900"),
download_dir = tempdir()) # remember to specify a directory for your download
After the needed datasets (and, if available, their codebooks) are
downloaded, the datasets are by default converted to .RData
format and ready to be loaded into R using load()
or
rio::import()
.