R/get_dataverse.R
get_cces_dataverse.Rd
Wrapper function to get CCES/CES data from Dataverse into the current R environment using the dataverse package.
get_cces_dataverse(
name = "cumulative",
year_subset = NULL,
std_index = TRUE,
ver = ":latest",
cache = TRUE,
dataverse_paths = ccesMRPprep::cces_dv_ids
)
name: The name of the dataset as defined in data(cces_dv_ids), e.g. "cumulative" or "2018".
year_subset: The year (or a vector of years) to subset to. If name is a year-specific
dataset, this argument is redundant, but if name == "cumulative", then
the output will be the cumulative dataset subsetted to that year. This is useful
when using the cumulative dataset for its harmonized variables.
std_index: Whether to standardize the unique case identifier. These
have different column names in different datasets, but setting this to TRUE
(the default) will rename them all to "case_id" and also add the year of the dataset.
This way, every dataset that gets downloaded will have the unique identifier
defined by the variables c("year", "case_id").
ver: Version of the Dataverse dataset to extract. Use ":latest"
for the latest released version, or a concrete version such as "9.0"
for reproducibility. ":draft" is not supported.
cache: Logical, whether to cache downloaded files on disk. The default,
TRUE, resolves ":latest" to the current released version number
before downloading, which allows the dataverse package to use its disk
cache. Set to FALSE to re-download.
dataverse_paths: A data frame in which each row holds the metadata for one CCES dataset. The built-in data cces_dv_ids is used as the default and should not be changed.
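To make the effect of year_subset and std_index concrete, here is a minimal base-R sketch on toy data. The helper standardize_case_id() and the raw column name "V101" are hypothetical stand-ins chosen for illustration. They are not the package's internals, and the actual function may implement these steps differently.

```r
# Toy stand-in for a raw download; the column name "V101" is a hypothetical
# example of a dataset-specific case identifier.
raw <- data.frame(
  V101 = c(11L, 12L, 13L),
  vote = c("A", "B", "A")
)

# Hypothetical helper: rename the identifier to "case_id", add the year,
# and move both key columns to the front.
standardize_case_id <- function(df, id_col, year) {
  names(df)[names(df) == id_col] <- "case_id"
  df$year <- as.integer(year)
  df[, c("year", "case_id", setdiff(names(df), c("year", "case_id")))]
}

std <- standardize_case_id(raw, id_col = "V101", year = 2018)

# Toy stand-in for the cumulative file, which spans several years.
cumulative <- data.frame(
  year = c(2016L, 2018L, 2018L),
  case_id = c(1L, 11L, 12L)
)

# Keep only the requested year(s), as year_subset is described to do.
subset_2018 <- cumulative[cumulative$year %in% c(2018L), ]

# Both frames are keyed by c("year", "case_id"), so they merge directly.
merged <- merge(subset_2018, std, by = c("year", "case_id"))
```

Because every standardized dataset shares the key c("year", "case_id"), a year-specific file can be joined to the harmonized cumulative variables without any per-dataset renaming.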
This function is a simple wrapper around the dataverse package on CRAN.
It downloads the dataset from Dataverse and loads it into a tibble with the appropriate
file data type. get_cces_question also standardizes some things across years, for example
the name of the case ID variable, which simplifies downstream processing.
You may be interested in customizing your download following https://cran.r-project.org/web/packages/dataverse/vignettes/C-download.html,
or downloading the feather version of the CCES cumulative, which reads much
faster than the default .dta file in this function. To clear the Dataverse
disk cache, use dataverse::cache_reset().
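One way to picture the caching behavior described for ver and cache is to key each cached file on the dataset DOI plus a concrete version number. The sketch below is purely illustrative: resolve_version(), cache_key(), and the in-memory latest_released lookup are hypothetical names, and the dataverse package manages its disk cache internally in its own way. The value "6.0" is taken from the example output for the 2018 dataset.

```r
# Hypothetical lookup of the latest released version per DOI. A real
# implementation would ask the Dataverse server instead.
latest_released <- c("10.7910/DVN/ZSBZ7K" = "6.0")

# Hypothetical helper: turn ":latest" into a concrete version number.
resolve_version <- function(doi, ver) {
  if (identical(ver, ":draft")) stop("':draft' is not supported.")
  if (identical(ver, ":latest")) return(unname(latest_released[doi]))
  ver
}

# Hypothetical cache key: one entry per (DOI, concrete version) pair.
cache_key <- function(doi, ver = ":latest") {
  paste(doi, resolve_version(doi, ver), sep = "@")
}

doi <- "10.7910/DVN/ZSBZ7K"
key_default <- cache_key(doi)          # ":latest" is resolved first
key_v6 <- cache_key(doi, ver = "6.0")  # same entry as the default
key_v4 <- cache_key(doi, ver = "4.0")  # separate entry
```

Resolving ":latest" before building the key is what lets the default call and an explicit ver = "6.0" call share one cache entry, while ver = "4.0" gets its own.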
# Read cumulative common content, subsetted to 2018. By default, this uses
# the latest released Dataverse version and keeps a local copy for next time.
if (FALSE) { # \dontrun{
  ccc <- get_cces_dataverse("cumulative", year_subset = 2018)
} # }
# The default resolves to the latest released Dataverse version. For 2018,
# version 6.0 and version 4.0 are different raw files on Dataverse.
if (FALSE) { # \dontrun{
  cc18 <- get_cces_dataverse("2018")
  #> i Using version "6.0" of "10.7910/DVN/ZSBZ7K" (no existing cache on disk).
  #> Downloading large dataset, can take a few minutes to complete.

  # The same call uses the same versioned cache entry.
  cc18_again <- get_cces_dataverse("2018")
  #> i Using version "6.0" of "10.7910/DVN/ZSBZ7K" (using existing disk cache).

  # Version 4.0 resolves to a different file, so it is cached separately.
  cc18_v4 <- get_cces_dataverse("2018", ver = "4.0")
  #> i Using version "4.0" of "10.7910/DVN/ZSBZ7K" (no existing cache on disk).
  #> Downloading large dataset, can take a few minutes to complete.
} # }
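# Example code to read and write a series of common content datasets
# in the directory "data/input/cces/"
if (FALSE) { # \dontrun{
  library(fs)    # dir_create(), file_exists()
  library(glue)  # glue()
  library(readr) # write_rds()

  # Create the same directory that the files are written to below
  dir_create("data/input/cces")
  for (d in c("cumulative", "2018")) {
    path <- glue("data/input/cces/cces_{d}.rds")
    if (file_exists(path)) next # skip datasets already saved
    write_rds(get_cces_dataverse(d), path) # takes a few minutes
  }
} # }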