Recode CCES variables so that they can be merged with ACS variables

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key)
)

Arguments

tbl

The cumulative common content. It can be any subset, but it must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be of class haven_labelled, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example.

only_demog

Drop variables other than demographics? Defaults to FALSE.

age_key

The named vector used to bin age. Can be deframe(age5_key) (the default) or deframe(age10_key).
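The base-R sketch below is a rough illustration of what a key such as deframe(age5_key) does. It builds a hypothetical two-column key that maps single-year age to a bin code, then looks up the ages shown in the example output. The column names, the bin codes, and the assumption that the key is indexed by single-year age are illustrative only, and the real age5_key may be structured differently. The bin edges (18 to 24, 25 to 34, 35 to 44, 45 to 64, 65+) are inferred from the truncated labels in the example output. setNames() stands in for tibble::deframe(), which also takes names from the first column and values from the second.

```r
# Hypothetical key: single-year age -> bin code (1 = 18-24, 2 = 25-34,
# 3 = 35-44, 4 = 45-64, 5 = 65+). Not the package's actual age5_key.
ages <- 18:100
bin_code <- findInterval(ages, c(18, 25, 35, 45, 65))
key_df <- data.frame(age = ages, age_bin = bin_code)

# Base-R equivalent of tibble::deframe(key_df): the first column becomes
# the names, the second column the values.
age_key <- setNames(key_df$age_bin, key_df$age)

# Bin ages by indexing the named vector with character names.
# Ages outside 18:100 (or NA) would come back as NA.
age_orig <- c(36, 40, 32, 52, 25, 48, 74, 37, 20, 49)
age_binned <- unname(age_key[as.character(age_orig)])
print(age_binned)
#> [1] 3 3 2 4 2 4 5 3 1 4
```

These bin codes match the age column in the example output for the same ten respondents.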

Value

The output has the same dimensions as the input (unless only_demog = TRUE), with the following exceptions:

  • age is recoded into bins that match the ACS. The binning itself is done by a separate function, ccc_bin_age. The unbinned age is kept in age_orig.

  • educ is recoded (coarsened and relabelled) to match the ACS, and the original version is kept as educ_cces_chr. The recoding is governed by the key-value pairs in educ_key.

  • race is recoded in the same way, and the original version is kept as race_cces_chr. The recoding is governed by the key-value pairs in race_key.

  • cd is standardized so that at-large districts are numbered "01" and single-digit districts are zero-padded, e.g. "WY-01" and "CA-02".
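The cd convention above can be illustrated with a small base-R sketch. std_cd() is a hypothetical helper, not the package's implementation. It assumes that districts arrive as a state abbreviation plus a numeric district number, and that at-large districts are coded as 0. That coding is an assumption, because some sources code at-large districts as 1.

```r
# Hypothetical helper (not part of ccesMRPprep): build "ST-NN" district codes.
# Assumes at-large districts are coded as district number 0.
std_cd <- function(st, dist) {
  dist <- as.integer(dist)
  dist[which(dist == 0L)] <- 1L    # at-large districts become "01"
  sprintf("%s-%02d", st, dist)     # zero-pad single-digit districts
}

std_cd(c("WY", "CA", "TX"), c(0, 2, 18))
#> [1] "WY-01" "CA-02" "TX-18"
```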

Input Requirements

This function requires data to have the following columns:

  • A string column called st containing the two-letter state abbreviation, or a labelled variable coercible to a string.

  • Either a string column called cd holding the congressional district in the form "WY-01", OR a numeric column called dist holding the district number. cd_up can also be used for the district in the upcoming election.

  • <numeric+labelled> columns called educ (education), race, age, and gender, with values coded as in the cumulative common content.
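As a rough sketch of how these requirements could be checked before calling the function, the hypothetical check_ccc_inputs() below confirms that the demographic columns are present and that at least one district column (cd, dist, or cd_up) exists. It only checks column names. Types and labels are not validated, and the helper is not part of ccesMRPprep.

```r
# Hypothetical pre-flight check (not part of ccesMRPprep): verifies column
# names only, not their types or haven labels.
check_ccc_inputs <- function(tbl) {
  required <- c("st", "educ", "race", "age", "gender")
  missing_cols <- setdiff(required, names(tbl))
  # At least one district column must be present.
  if (!any(c("cd", "dist", "cd_up") %in% names(tbl))) {
    missing_cols <- c(missing_cols, "cd or dist")
  }
  if (length(missing_cols) > 0) {
    stop("missing required columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

ok <- data.frame(st = "WY", dist = 0, educ = 1, race = 1, age = 30, gender = 1)
check_ccc_inputs(ok)  # passes silently
```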

Examples

ccc_std_demographics(ccc_samp)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 1,000 x 22
#>     year case_id state    st    cd    marstat gender female       age age_orig
#>    <dbl> <chr>   <chr>    <chr> <chr>   <dbl>  <dbl>  <dbl> <int+lbl>    <dbl>
#>  1  2006 1005058 Michigan MI    MI-04       1      2      1 3 [35 to 4…       36
#>  2  2006 1006614 Texas    TX    TX-18       1      1      0 3 [35 to 4…       40
#>  3  2006 1009338 Califor… CA    CA-48       1      1      0 2 [25 to 3…       32
#>  4  2006 1088898 Florida  FL    FL-13       1      2      1 4 [45 to 6…       52
#>  5  2006 1090564 Pennsyl… PA    PA-19       5      2      1 2 [25 to 3…       25
#>  6  2006 1093132 South C… SC    SC-02       1      2      1 4 [45 to 6…       48
#>  7  2006 1093573 Utah     UT    UT-03       4      2      1 5 [65 year…       74
#>  8  2006 1105620 Hawaii   HI    HI-01       5      1      0 3 [35 to 4…       37
#>  9  2006 1116569 Texas    TX    TX-21       5      2      1 1 [18 to 2…       20
#> 10  2006 1117377 Ohio     OH    OH-07       3      2      1 4 [45 to 6…       49
#> # … with 990 more rows, and 12 more variables: educ_cces_chr <chr>,
#> #   educ <dbl+lbl>, race_cces_chr <chr>, race <int+lbl>, faminc <dbl>,
#> #   vv_turnout_gvm <dbl>, zipcode <chr>, county_fips <chr>, hispanic <dbl>,
#> #   newsint <dbl>, voted_pres_16 <dbl>, economy_retro <dbl>
if (FALSE) {
  # For full data (takes a while)
  library(dataverse)
  cumulative_rds <- get_cces_dataverse("cumulative")
  cumulative_std <- ccc_std_demographics(cumulative_rds)
}
if (FALSE) {
  library(dplyr)
  library(stringr)
  # Turn "HI-01" into "HI-1" to simulate a badly formatted district column
  wrong_cd_fmt <- mutate(ccc_samp, cd = str_replace_all(cd, "01", "1"))
  wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)
  # throws an error because cd is formatted the wrong way
  ccc_std_demographics(wrong_cd_fmt)
}