Recode CCES variables so that they merge to ACS variables

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)

Arguments

tbl

The cumulative common content. It can be any subset but must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be of class haven_labelled, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example. Other files (for example, year-specific common contents) are not compatible with this function.

only_demog

Drop variables besides demographics? Defaults to FALSE.

age_key

The key vector used to bin age. Can be deframe(age5_key) or deframe(age10_key).
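As a rough illustration of what binning age involves, the base R sketch below cuts raw ages into five groups. The bin edges (18, 25, 35, 45, 65) are an assumption inferred from the example output further down, and the short labels are made up for illustration; neither is read from age5_key, and this is not the code ccc_bin_age uses.

```r
# Toy ages; the values mirror a few rows of the example output.
age_orig <- c(20, 32, 36, 52, 74)

# Assumed bin edges and hypothetical labels (not taken from age5_key).
age_bin <- cut(
  age_orig,
  breaks = c(18, 25, 35, 45, 65, Inf),
  labels = c("18-24", "25-34", "35-44", "45-64", "65+"),
  right = FALSE  # intervals are closed on the left: [18, 25), [25, 35), ...
)

# Keep the unbinned value alongside the bin, as the function does with age_orig.
data.frame(age_orig = age_orig, age = age_bin)
```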

wh_as_hisp

Should people who identify as both White and Hispanic be coded as "Hispanic", so that all remaining "Whites" are Non-Hispanic Whites by definition? Can be NULL if you know the column hispanic is not in the data. Defaults to TRUE. For more information, see https://bit.ly/3hZ6mz4.

bh_as_hisp

Same as wh_as_hisp but for Black Hispanics. Defaults to TRUE.
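The effect of wh_as_hisp and bh_as_hisp can be sketched on a toy vector. The helper recode_hisp below is hypothetical and written only for illustration; it assumes race is a character vector and hispanic is a logical vector, which is simpler than the labelled columns in the cumulative file, and it is not the package's internal code.

```r
# Hypothetical helper: reassign White and/or Black respondents who also
# identify as Hispanic to the "Hispanic" category.
recode_hisp <- function(race, hispanic, wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  out <- race
  # isTRUE() treats NULL as "do not recode"
  if (isTRUE(wh_as_hisp)) out[race == "White" & hispanic] <- "Hispanic"
  if (isTRUE(bh_as_hisp)) out[race == "Black" & hispanic] <- "Hispanic"
  out
}

race     <- c("White", "White", "Black", "Black", "Asian")
hispanic <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

recode_hisp(race, hispanic)                      # both groups recoded
recode_hisp(race, hispanic, wh_as_hisp = FALSE)  # only Black Hispanics recoded
```

In this sketch, respondents of any other race are left untouched even if they identify as Hispanic, since the two arguments only concern White and Black respondents.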

Value

The output is of the same dimensions as the input (unless only_demog = TRUE) but with the following exceptions:

  • age is binned to match the ACS bins; the recoding is done by a separate function, ccc_bin_age. The unbinned age is kept in age_orig.

  • educ is recoded (coarsened and relabelled) to match the ACS; the original version is kept as educ_cces_chr. The recoding is governed by the key-value pairs in educ_key.

  • race is recoded in the same way; the original version is kept as race_cces_chr. The recoding is governed by the key-value pairs in race_key.

  • cd is standardized so that at-large districts are given "01" and single-digit districts are padded with a leading 0, e.g. "WY-01" and "CA-02".
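The cd format described in the last item can be sketched in base R. The helper std_cd is hypothetical, and the coding of at-large districts as 0 in the input is an assumption made for this example; the cumulative file may code them differently.

```r
# Hypothetical helper: build "ST-NN" district codes from a state
# abbreviation and a district number.
std_cd <- function(st, dist) {
  dist <- as.integer(dist)
  dist[!is.na(dist) & dist == 0L] <- 1L  # assumption: 0 marks an at-large district
  sprintf("%s-%02d", st, dist)           # pad single digits with a leading 0
}

std_cd(c("WY", "CA", "TX"), c(0, 2, 18))
```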

Input Requirements

This function requires data to have the following columns:

  • A string column called st that is a two-letter abbreviation of the state, or a labelled variable coercible to a string.

  • A string column called cd containing the congressional district in the form "WY-01", or a numeric column called dist containing the district number. cd_up can also be used for the district in the upcoming election.

  • <numeric+labelled> columns called educ (education), race, age, and gender, with values following the cumulative content.
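Before calling the function, it may help to check these requirements yourself. The sketch below is a hypothetical pre-flight check in base R, not part of the package: it only tests column names and the "ST-NN" pattern for cd, and does not verify that columns are labelled.

```r
# Hypothetical check: returns a character vector of problems (empty if none).
check_ccc_inputs <- function(tbl) {
  cols <- names(tbl)
  problems <- character(0)

  missing_cols <- setdiff(c("st", "educ", "race", "age", "gender"), cols)
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  if (!any(c("cd", "dist") %in% cols)) {
    problems <- c(problems, "need either a cd or a dist column")
  }

  if ("cd" %in% cols) {
    cd <- as.character(tbl$cd)
    # Two capital letters, a hyphen, two digits, e.g. "WY-01"
    bad <- !is.na(cd) & !grepl("^[A-Z]{2}-[0-9]{2}$", cd)
    if (any(bad)) {
      problems <- c(problems, paste("badly formatted cd:", paste(unique(cd[bad]), collapse = ", ")))
    }
  }

  problems
}

toy <- data.frame(
  st = c("HI", "HI"), cd = c("HI-01", "HI-1"),
  educ = c(2, 3), race = c(1, 4), age = c(37, 52), gender = c(1, 2)
)
check_ccc_inputs(toy)
```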

Examples


library(ccesMRPprep)
library(dplyr)

 ccc_std_demographics(ccc_samp)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 1,000 × 22
#>     year case_id state       st    cd    marstat  gender female     age age_orig
#>    <dbl> <chr>   <chr>       <chr> <chr> <dbl+l> <dbl+l>  <dbl> <int+l>    <dbl>
#>  1  2006 1005058 Michigan    MI    MI-04 1 [Mar… 2 [Fem…      1 3 [35 …       36
#>  2  2006 1006614 Texas       TX    TX-18 1 [Mar… 1 [Mal…      0 3 [35 …       40
#>  3  2006 1009338 California  CA    CA-48 1 [Mar… 1 [Mal…      0 2 [25 …       32
#>  4  2006 1088898 Florida     FL    FL-13 1 [Mar… 2 [Fem…      1 4 [45 …       52
#>  5  2006 1090564 Pennsylvan… PA    PA-19 5 [Sin… 2 [Fem…      1 2 [25 …       25
#>  6  2006 1093132 South Caro… SC    SC-02 1 [Mar… 2 [Fem…      1 4 [45 …       48
#>  7  2006 1093573 Utah        UT    UT-03 4 [Wid… 2 [Fem…      1 5 [65 …       74
#>  8  2006 1105620 Hawaii      HI    HI-01 5 [Sin… 1 [Mal…      0 3 [35 …       37
#>  9  2006 1116569 Texas       TX    TX-21 5 [Sin… 2 [Fem…      1 1 [18 …       20
#> 10  2006 1117377 Ohio        OH    OH-07 3 [Div… 2 [Fem…      1 4 [45 …       49
#> # … with 990 more rows, and 12 more variables: educ_cces_chr <chr>,
#> #   educ <dbl+lbl>, race_cces_chr <chr>, race <int+lbl>, faminc <dbl+lbl>,
#> #   vv_turnout_gvm <dbl+lbl>, zipcode <chr>, county_fips <chr>,
#> #   hispanic <dbl+lbl>, newsint <dbl+lbl>, voted_pres_16 <dbl+lbl>,
#> #   economy_retro <dbl+lbl>
 ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 6 × 2
#>                  race     n
#>             <int+lbl> <int>
#> 1 1 [White]             732
#> 2 2 [Black]             108
#> 3 3 [Hispanic]           81
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41
 ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 6 × 2
#>                  race     n
#>             <int+lbl> <int>
#> 1 1 [White]             732
#> 2 2 [Black]             110
#> 3 3 [Hispanic]           79
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41

if (FALSE) {
 # For full data (takes a while)
 library(dataverse)
 cumulative_rds <- get_cces_dataverse("cumulative")
 cumulative_std <- ccc_std_demographics(cumulative_rds)
 }

if (FALSE) {
 # stringr is not attached above, so call str_replace_all() with its namespace
 wrong_cd_fmt <- mutate(ccc_samp, cd = stringr::str_replace_all(cd, "01", "1"))
 wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)

 # throws error because CD is formatted the wrong way
 ccc_std_demographics(wrong_cd_fmt)
}