Recode CCES variables so that they merge to ACS variables

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)

Arguments

tbl

The cumulative common content. It can be any subset but must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be of class haven_labelled, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example. Any other file (for example, a year-specific common content) is not compatible with this function.

only_demog

Drop all variables other than demographics? Defaults to FALSE.

age_key

The key vector used to bin age. Can be deframe(age5_key) or deframe(age10_key). Defaults to deframe(ccesMRPprep::age5_key).

wh_as_hisp

Should people who identify as both White and Hispanic be coded as "Hispanic", thereby leaving all remaining "Whites" as Non-Hispanic Whites by definition? Can be NULL if you know the column hispanic is not in the data. Defaults to TRUE. For more information, see https://bit.ly/3hZ6mz4.

bh_as_hisp

Same as wh_as_hisp but for Black Hispanics. Defaults to TRUE.
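
The effect of wh_as_hisp and bh_as_hisp can be sketched in base R. This is an illustration only, not the package's implementation: the helper recode_hisp_sketch is hypothetical, and the codes (race 1 = White, 2 = Black, 3 = Hispanic; hispanic 1 = yes) are assumptions made for the toy data.

```r
# Hypothetical helper, not part of ccesMRPprep.
# Assumed codes: race 1 = White, 2 = Black, 3 = Hispanic; hispanic 1 = yes.
recode_hisp_sketch <- function(race, hispanic,
                               wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  is_hisp <- !is.na(hispanic) & hispanic == 1
  out <- race
  # White respondents who also identify as Hispanic become Hispanic
  if (isTRUE(wh_as_hisp)) out[is_hisp & !is.na(race) & race == 1] <- 3
  # Black respondents who also identify as Hispanic become Hispanic
  if (isTRUE(bh_as_hisp)) out[is_hisp & !is.na(race) & race == 2] <- 3
  out
}

toy <- data.frame(
  race     = c(1, 1, 2, 2, 4),
  hispanic = c(1, 2, 1, 2, NA)
)
toy$race_both  <- recode_hisp_sketch(toy$race, toy$hispanic)
toy$race_wonly <- recode_hisp_sketch(toy$race, toy$hispanic, bh_as_hisp = FALSE)
toy
```

With both flags on, rows 1 and 3 of the toy data move to the Hispanic category; with bh_as_hisp = FALSE, only row 1 does.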

Value

The output has the same dimensions as the input (unless only_demog = TRUE), with the following exceptions:

  • age is coded to match up with the ACS bins. The recoding occurs in a separate function, ccc_bin_age. The unbinned age is kept in age_orig.

  • educ is recoded (coarsened and relabelled) to match up with the ACS. The original version is kept as educ_cces_chr. The recoding is governed by the key-value pairs in educ_key.

  • race is recoded in the same way, with the original version kept as race_cces_chr. The recoding is governed by the key-value pairs in race_key.

  • cd is standardized so that at-large districts are given "01" and single-digit districts are padded with a leading 0, for example "WY-01" and "CA-02".
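
Two of the recodings above can be sketched in base R. Both helpers are hypothetical and are not the package's ccc_bin_age or its internal cd code. The age breaks are an assumption: the lower edges 18, 25, 35, 45, and 65 are inferred from the truncated bin labels in the example output. The sketch also assumes that an at-large district may arrive coded as 0.

```r
# Hypothetical sketch of five age bins; breaks are assumed, not read from age5_key.
bin_age_sketch <- function(age) {
  # right = FALSE makes intervals [18, 25), [25, 35), ..., [65, Inf)
  cut(age, breaks = c(18, 25, 35, 45, 65, Inf), right = FALSE, labels = FALSE)
}

# Hypothetical sketch of cd standardization from a state abbreviation and
# a numeric district number.
std_cd_sketch <- function(st, dist) {
  dist <- as.integer(dist)
  dist[!is.na(dist) & dist == 0L] <- 1L  # assumed: at-large coded as 0
  sprintf("%s-%02d", st, dist)           # zero-pad to two digits
}

ages_binned <- bin_age_sketch(c(36, 40, 32, 52, 25, 74, 20))
cds <- std_cd_sketch(c("WY", "CA", "TX"), c(0, 2, 18))
ages_binned
cds
```

Ages below 18 fall outside the assumed breaks and come back as NA in this sketch.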

Input Requirements

This function requires data to have the following columns:

  • A string column called st that is a two-letter abbreviation of the state, or a labelled variable coercible to a string.

  • A string column called cd that gives the congressional district in the form "WY-01", or a numeric column called dist that gives the district number. cd_up can also be used for the district in the upcoming election.

  • A <numeric+labelled> column for each of educ (education), race, age, and gender, with values following the cumulative content.
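
A quick pre-flight check of these requirements might look like the following. The helper check_input_sketch is hypothetical and is not exported by the package; the regular expression for cd is an assumption based on the "WY-01" form described above.

```r
# Hypothetical pre-flight check, not part of ccesMRPprep.
check_input_sketch <- function(tbl) {
  needed <- c("st", "educ", "race", "age", "gender")
  missing_cols <- setdiff(needed, names(tbl))
  # Either cd or dist must be present
  if (!any(c("cd", "dist") %in% names(tbl))) {
    missing_cols <- c(missing_cols, "cd or dist")
  }
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  # Assumed format: two capital letters, a hyphen, two digits (e.g. "WY-01")
  if ("cd" %in% names(tbl)) {
    bad <- !is.na(tbl$cd) & !grepl("^[A-Z]{2}-[0-9]{2}$", tbl$cd)
    if (any(bad)) {
      stop("cd values not of the form \"WY-01\": ",
           paste(unique(tbl$cd[bad]), collapse = ", "))
    }
  }
  invisible(TRUE)
}

ok_tbl <- data.frame(st = "HI", cd = "HI-01", educ = 2, race = 1,
                     age = 40, gender = 2)
check_input_sketch(ok_tbl)
```

Running a check like this before ccc_std_demographics gives an early, readable error for malformed district strings such as "HI-1".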

Examples

library(dplyr)

ccc_std_demographics(ccc_samp)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 1,000 × 22
#>     year case_id state          st    cd    marstat  gender female     age age_orig
#>    <dbl> <chr>   <chr>          <chr> <chr> <dbl+l> <dbl+l>  <dbl> <int+l>    <dbl>
#>  1  2006 1005058 Michigan       MI    MI-04 1 [Mar… 2 [Fem…      1 3 [35 …       36
#>  2  2006 1006614 Texas          TX    TX-18 1 [Mar… 1 [Mal…      0 3 [35 …       40
#>  3  2006 1009338 California     CA    CA-48 1 [Mar… 1 [Mal…      0 2 [25 …       32
#>  4  2006 1088898 Florida        FL    FL-13 1 [Mar… 2 [Fem…      1 4 [45 …       52
#>  5  2006 1090564 Pennsylvania   PA    PA-19 5 [Sin… 2 [Fem…      1 2 [25 …       25
#>  6  2006 1093132 South Carolina SC    SC-02 1 [Mar… 2 [Fem…      1 4 [45 …       48
#>  7  2006 1093573 Utah           UT    UT-03 4 [Wid… 2 [Fem…      1 5 [65 …       74
#>  8  2006 1105620 Hawaii         HI    HI-01 5 [Sin… 1 [Mal…      0 3 [35 …       37
#>  9  2006 1116569 Texas          TX    TX-21 5 [Sin… 2 [Fem…      1 1 [18 …       20
#> 10  2006 1117377 Ohio           OH    OH-07 3 [Div… 2 [Fem…      1 4 [45 …       49
#> # … with 990 more rows, and 12 more variables: educ_cces_chr <chr>,
#> #   educ <dbl+lbl>, race_cces_chr <chr>, race <int+lbl>, faminc <dbl+lbl>,
#> #   vv_turnout_gvm <dbl+lbl>, zipcode <chr>, county_fips <chr>,
#> #   hispanic <dbl+lbl>, newsint <dbl+lbl>, voted_pres_16 <dbl+lbl>,
#> #   economy_retro <dbl+lbl>

ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 6 × 2
#>   race                    n
#>   <int+lbl>           <int>
#> 1 1 [White]             732
#> 2 2 [Black]             108
#> 3 3 [Hispanic]           81
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41

ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 6 × 2
#>   race                    n
#>   <int+lbl>           <int>
#> 1 1 [White]             732
#> 2 2 [Black]             110
#> 3 3 [Hispanic]           79
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41

if (FALSE) {
  # For full data (takes a while)
  library(dataverse)
  cumulative_rds <- get_cces_dataverse("cumulative")
  cumulative_std <- ccc_std_demographics(cumulative_rds)
}

if (FALSE) {
  # str_replace_all comes from stringr, which is not attached above
  wrong_cd_fmt <- mutate(ccc_samp, cd = stringr::str_replace_all(cd, "01", "1"))
  wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)

  # throws error because CD is formatted the wrong way
  ccc_std_demographics(wrong_cd_fmt)
}