Wrapper function to download CCES/CES data from Dataverse into the current R environment using the dataverse package.

get_cces_dataverse(
  name = "cumulative",
  year_subset = NULL,
  std_index = TRUE,
  ver = ":latest",
  cache = TRUE,
  dataverse_paths = ccesMRPprep::cces_dv_ids
)

Arguments

name

The name of the dataset as defined in data(cces_dv_ids), e.g. "cumulative" or "2018".

year_subset

The year (or a vector of years) to subset to. If name is a year-specific dataset, this argument is redundant, but if name == "cumulative", the output will be the cumulative dataset subsetted to those years. This is useful when using the cumulative dataset for its harmonized variables.

std_index

Whether to standardize the unique case identifier. The identifier has different column names in different datasets; setting this to TRUE (the default) renames it to "case_id" in every dataset and also adds the year of the dataset. This way, every downloaded dataset has a unique identifier defined by the variables c("year", "case_id").

ver

Version of the Dataverse dataset to extract. Use ":latest" for the latest released version, or a concrete version such as "9.0" for reproducibility. ":draft" is not supported.

cache

Logical, whether to cache downloaded files on disk. The default, TRUE, resolves ":latest" to the current released version number before downloading, which allows the dataverse package to use its disk cache. Set to FALSE to re-download.

dataverse_paths

A dataframe where one row represents metadata for one CCES dataset. Built-in data cces_dv_ids is used as a default and should not be changed.
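
To make year_subset and std_index concrete, here is a minimal offline sketch using toy data frames in place of real downloads. It assumes that year_subset acts like a row filter on the year column, and that std_index = TRUE leaves both files with the key columns year and case_id. The columns pid3 and q_extra and all values are made up for illustration.

```r
# Toy stand-in for the cumulative file: one row per respondent-year.
cumulative <- data.frame(
  year    = c(2016, 2018, 2018, 2020),
  case_id = c(101, 201, 202, 301),
  pid3    = c("Democrat", "Republican", "Independent", "Democrat")
)

# Toy stand-in for a year-specific file after std_index = TRUE.
cc18 <- data.frame(
  year    = c(2018, 2018),
  case_id = c(201, 202),
  q_extra = c("Support", "Oppose")  # hypothetical year-specific question
)

# Roughly what year_subset = 2018 is assumed to do to the cumulative file.
cumulative_18 <- cumulative[cumulative$year %in% 2018, ]

# Attach harmonized variables to the year-specific file using the
# composite key shared by both data frames.
joined <- merge(cumulative_18, cc18, by = c("year", "case_id"))
print(joined)
```

Because the key is the pair (year, case_id) rather than case_id alone, the same join should remain safe when several years are stacked together.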

Details

This function is a simple wrapper around the dataverse package on CRAN. It downloads the dataset from Dataverse and loads it into a tibble using the reader appropriate for the file type. It also does some standardization across years, for example of the name of the case ID variable, which makes downstream processing easier. You may be interested in customizing your download following https://cran.r-project.org/web/packages/dataverse/vignettes/C-download.html, or in downloading the feather version of the CCES cumulative file, which reads much faster than the default .dta file used by this function. To clear the Dataverse disk cache, use dataverse::cache_reset().

Examples


# Read cumulative common content, subsetted to 2018. By default, this uses
# the latest released Dataverse version and keeps a local copy for next time.
if (FALSE) { # \dontrun{
 ccc <- get_cces_dataverse("cumulative", year_subset = 2018)
 } # }

# The default resolves to the latest released Dataverse version. For 2018,
# version 6.0 and version 4.0 are different raw files on Dataverse.
if (FALSE) { # \dontrun{
 cc18 <- get_cces_dataverse("2018")
 #> i Using version "6.0" of "10.7910/DVN/ZSBZ7K" (no existing cache on disk).
 #> Downloading large dataset, can take a few minutes to complete.

 # The same call uses the same versioned cache entry.
 cc18_again <- get_cces_dataverse("2018")
 #> i Using version "6.0" of "10.7910/DVN/ZSBZ7K" (using existing disk cache).

 # Version 4.0 resolves to a different file, so it is cached separately.
 cc18_v4 <- get_cces_dataverse("2018", ver = "4.0")
 #> i Using version "4.0" of "10.7910/DVN/ZSBZ7K" (no existing cache on disk).
 #> Downloading large dataset, can take a few minutes to complete.
} # }

# Example code to download a series of common content datasets and write
# them to the directory "data/input/cces/"
if (FALSE) { # \dontrun{
library(fs)
library(glue)
library(readr)

dir_create("data/input/cces")
for (d in c("cumulative", "2018")) {
  path <- glue("data/input/cces/cces_{d}.rds")
  # Skip datasets that have already been saved
  if (file_exists(path))
    next
  write_rds(get_cces_dataverse(d), path) # takes a few minutes
}
} # }
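
The skip-if-present pattern in the last example can also be wrapped in a small helper. The following is a sketch in base R only; read_or_download() is a hypothetical name and not part of this package, and the downloader is replaced by a stand-in function so the code runs offline.

```r
# Hypothetical helper: return the saved .rds file if it exists; otherwise
# call `download` once, save its result to `path`, and return it.
read_or_download <- function(path, download) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  out <- download()
  saveRDS(out, path)
  out
}

# Offline demonstration with a stand-in for the real download.
n_calls <- 0
fake_download <- function() {
  n_calls <<- n_calls + 1
  data.frame(year = 2018, case_id = 1:3)
}

path <- file.path(tempdir(), "cces_demo", "cces_2018.rds")
unlink(path)  # start from a clean state

first  <- read_or_download(path, fake_download)  # calls fake_download()
second <- read_or_download(path, fake_download)  # reads the saved file
```

In real use, the download argument would be something like function() get_cces_dataverse("2018"), so the slow download runs only when the local file is missing.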