read_ascii helps format ASCII data files downloaded from the Roper Center.

Usage

read_ascii(
  file,
  total_cards = 1,
  var_names,
  var_cards = 1,
  var_positions,
  var_widths,
  card_pattern,
  respondent_pattern
)

Arguments

file

A path to an ASCII data file.

total_cards

For multicard files, the number of cards in the file.

var_names

A string vector of variable names.

var_cards

For multicard files, a numeric vector of the cards on which var_names are recorded.

var_positions

A numeric vector of the column positions in which var_names are recorded.

var_widths

A numeric vector of the widths used to record var_names.

card_pattern

A regular expression that matches the file's card identifier; e.g., if the card number is stored in the last digit of each line, "\d$". Needed only when the file does not contain a line for every card for every respondent, or contains extra lines that correspond to no respondent.

respondent_pattern

A regular expression that matches the file's respondent identifier; e.g., if the respondent number is stored in the first four digits of each line, preceded by a space, "(?<=^\s)\d{4}". Needed only when the file does not contain a line for every card for every respondent, or contains extra lines that correspond to no respondent.
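
To illustrate patterns of the kind described for card_pattern and respondent_pattern, here is a minimal base R sketch that applies them to a single made-up line. The sample line and the object names are hypothetical, and the sketch does not show how read_ascii applies the patterns internally. Inside an R string each backslash must be doubled, and the lookbehind needs a Perl-compatible engine (perl = TRUE in base R).

```r
# A made-up line: leading space, four-digit respondent number,
# some data columns, and the card number as the last digit.
line <- " 0042 1352 9081 7"

card_pattern <- "\\d$"                   # last digit of the line
respondent_pattern <- "(?<=^\\s)\\d{4}"  # four digits after one leading space

# regexpr() locates the first match; regmatches() extracts its text.
card <- regmatches(line, regexpr(card_pattern, line, perl = TRUE))
respondent <- regmatches(line, regexpr(respondent_pattern, line, perl = TRUE))

print(card)        # "7"
print(respondent)  # "0042"
```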

Value

A data frame containing any variables specified in the var_names argument, plus a numeric respondent identifier and as many string card variables (card1, card2, ...) as specified by the total_cards argument.

Details

Many older Roper Center datasets are available only in ASCII format, which is notoriously difficult to work with. The read_ascii function simplifies extracting selected variables from these files. For single-card files, identify the names, positions, and widths of the needed variables from the codebook and pass them to the var_names, var_positions, and var_widths arguments. Multicard datasets are more complicated. In the best case, the file contains exactly one line per card per respondent; the needed variables can then be extracted by adding only the var_cards and total_cards arguments. When this condition is violated (there is not a line for every card for every respondent, or there are extra lines), the function throws an error and asks the user to specify the additional arguments card_pattern and respondent_pattern.
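
The best-case multicard layout can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the six lines are made up, the object names are hypothetical, and the code is not taken from read_ascii itself. It shows why the line count must be a multiple of total_cards and how a card number, a column position, and a width together locate one variable.

```r
# Hypothetical two-card file: 3 respondents x 2 cards = 6 lines.
lines <- c("A1 card one", "A2 card two",
           "B1 card one", "B2 card two",
           "C1 card one", "C2 card two")
total_cards <- 2

# One line per card per respondent implies the line count is a
# multiple of the number of cards.
stopifnot(length(lines) %% total_cards == 0)

# Assign each line to a respondent and a card by position in the file.
n_respondents <- length(lines) / total_cards
respondent <- rep(seq_len(n_respondents), each = total_cards)
card <- rep(seq_len(total_cards), times = n_respondents)

# Extract a variable recorded on card 2, starting at column 1, width 2.
var_card <- 2
var_position <- 1
var_width <- 2
value <- substr(lines[card == var_card],
                var_position, var_position + var_width - 1)

print(value)  # "A2" "B2" "C2"
```

If a respondent were missing a card, or the file had an extra line, the positional assignment above would drift out of step, which is the situation the card_pattern and respondent_pattern arguments are meant to handle.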

Examples
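
A minimal sketch of a single-card call follows. The data file is made up and written to a temporary path, the variable names q1 and age are hypothetical, and the sketch assumes read_ascii is provided by the ropercenter package (this page does not name the package). With a real dataset, the positions and widths would come from the codebook.

```r
# Write a tiny made-up single-card file: columns 1-4 hold an id,
# column 6 holds the answer to q1, and columns 8-9 hold age.
tmp <- tempfile(fileext = ".dat")
writeLines(c("0001 2 34",
             "0002 1 57"), tmp)

# Assumption: read_ascii comes from the ropercenter package.
# The call is skipped when that package is not installed.
if (requireNamespace("ropercenter", quietly = TRUE)) {
  survey <- ropercenter::read_ascii(
    file = tmp,
    var_names = c("q1", "age"),
    var_positions = c(6, 8),
    var_widths = c(1, 2)
  )
  print(survey)
}
```

For a multicard file in the best-case layout, the same call would additionally pass total_cards and var_cards.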