Recode CCES variables so that they can be merged with ACS variables

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)

Arguments

tbl

The cumulative common content. It can be any subset, but it must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be of class haven_labelled, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example. Other files (for example, year-specific common contents) are not compatible with this function.

only_demog

Drop variables other than demographics? Defaults to FALSE.

age_key

The vector key used to bin age. Can be deframe(age5_key) or deframe(age10_key). Defaults to deframe(ccesMRPprep::age5_key).

wh_as_hisp

Should people who identify as both White and Hispanic be coded as "Hispanic", so that all remaining "Whites" are Non-Hispanic Whites by definition? Can be NULL if you know the column hispanic is not in the data. Defaults to TRUE. For more information, see https://bit.ly/3hZ6mz4.

bh_as_hisp

Same as wh_as_hisp but for Black Hispanics. Defaults to TRUE.
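A minimal base-R sketch of what wh_as_hisp and bh_as_hisp control. It assumes the race codes printed in the examples (1 White, 2 Black, 3 Hispanic) and that hispanic == 1 means "Yes". recode_hispanic() is a hypothetical helper, not the package's implementation. In this sketch a NULL flag behaves like FALSE because of isTRUE().

```r
# Hypothetical helper (not from ccesMRPprep). Assumes race codes
# 1 = White, 2 = Black, 3 = Hispanic, and hispanic == 1 meaning "Yes".
recode_hispanic <- function(race, hispanic, wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  # Missing answers to the Hispanic question are treated as "not Hispanic"
  is_hisp <- !is.na(hispanic) & hispanic == 1
  # isTRUE() makes NULL (or NA) skip the recode, like FALSE
  if (isTRUE(wh_as_hisp)) race[race == 1 & is_hisp] <- 3  # White Hispanic -> Hispanic
  if (isTRUE(bh_as_hisp)) race[race == 2 & is_hisp] <- 3  # Black Hispanic -> Hispanic
  race
}

race     <- c(1, 1, 2, 2, 4)
hispanic <- c(1, 2, 1, 2, 1)
recode_hispanic(race, hispanic)
#> [1] 3 1 3 2 4
recode_hispanic(race, hispanic, wh_as_hisp = FALSE)
#> [1] 1 1 3 2 4
```

The examples below follow the same pattern. Setting bh_as_hisp = FALSE moves Black Hispanic respondents from Hispanic back to Black: the Black count goes from 108 to 110 and the Hispanic count from 81 to 79.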

Value

The output has the same rows as the input and, unless only_demog = TRUE, keeps all of its columns. It differs from the input in the following ways:

  • age is recoded into the ACS age bins. The recoding is performed by a separate function, ccc_bin_age. The unbinned age is preserved in age_orig.

  • educ is coarsened and relabelled into 4 categories to match the ACS. The original version is kept as educ_cces_chr. Recoding is governed by the key-value pairs in educ_key.

  • educ_3 is coarsened further into 3 categories, with a BA and a higher degree grouped into one category. This is necessary for ACS tables that do not make the distinction. Check the ACS codes beforehand to decide which education variable to use.

  • race is recoded in the same way, and the original version is kept as race_cces_chr. These recodings are governed by the key-value pairs in race_key.

  • cd is standardized so that at-large districts are numbered "01" and single-digit districts are zero-padded, e.g. "WY-01" and "CA-02".
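The key-based recodes above work like lookups in a named vector, which is the kind of object deframe() returns. The sketch below is base R and should be read as an illustration only. The age bins (18-24, 25-34, 35-44, 45-64, 65+) are inferred from the example output rather than read from age5_key. The education codes and both education keys are assumptions and are not the package's educ_key.

```r
# Sketch only: bins inferred from the printed example, keys assumed.

# Age: named vector mapping single-year age -> bin code
# (1 = 18-24, 2 = 25-34, 3 = 35-44, 4 = 45-64, 5 = 65+)
age_bins <- cut(18:100, breaks = c(17, 24, 34, 44, 64, Inf), labels = FALSE)
age_key_sketch <- setNames(age_bins, 18:100)  # names are "18", "19", ...

ages <- c(36, 40, 32, 52, 25, 48, 74, 37, 20, 49)  # age_orig in the example
unname(age_key_sketch[as.character(ages)])
#> [1] 3 3 2 4 2 4 5 3 1 4

# Education: assumed CCES codes 1-6 (no HS, HS grad, some college, 2-year,
# 4-year, post-grad) coarsened to 4 groups. educ_3 then merges 4-year and
# post-grad into a single group.
educ4_key <- c("1" = 1, "2" = 1, "3" = 2, "4" = 2, "5" = 3, "6" = 4)
educ3_key <- c("1" = 1, "2" = 2, "3" = 3, "4" = 3)

educ_cces <- 1:6
educ4 <- unname(educ4_key[as.character(educ_cces)])
educ3 <- unname(educ3_key[as.character(educ4)])
educ4
#> [1] 1 1 2 2 3 4
educ3
#> [1] 1 1 2 2 3 3
```

The sketched age bins reproduce the age column shown for the first ten rows of the first example.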

Input Requirements

This function requires data to have the following columns:

  • A string column called st that is a two-letter abbreviation of the state, or a labelled variable coercible to a string.

  • A string column called cd containing the congressional district in the form "WY-01", OR a numeric column called dist containing the district number. cd_up can also be used for the district in the upcoming election.

  • <numeric+labelled> columns called educ (education), race, age, and gender, with values coded as in the cumulative content.
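Before you call the function, you can check these requirements in base R. Both helpers below are hypothetical and are not part of ccesMRPprep. std_cd() also assumes that an at-large seat is coded 0 in dist. Verify that assumption against your data.

```r
# Hypothetical helpers, not part of ccesMRPprep.
required_ccc_cols <- c("age", "race", "educ", "gender", "st", "state", "cd")

# Names of required columns that tbl lacks (character(0) if none)
missing_ccc_cols <- function(tbl, required = required_ccc_cols) {
  setdiff(required, names(tbl))
}

# Build "ST-NN" from st and a numeric dist. dist == 0 is *assumed* to mark
# an at-large seat and is mapped to 1 so it prints as "01".
std_cd <- function(st, dist) {
  dist <- ifelse(dist == 0, 1, dist)
  sprintf("%s-%02d", st, as.integer(dist))
}

toy <- data.frame(age = c(36, 30), race = c(1, 2), educ = c(5, 3),
                  gender = c(2, 1), st = c("WY", "CA"),
                  state = c("Wyoming", "California"), dist = c(0, 2))
missing_ccc_cols(toy)
#> [1] "cd"
toy$cd <- std_cd(toy$st, toy$dist)
toy$cd
#> [1] "WY-01" "CA-02"
missing_ccc_cols(toy)
#> character(0)
```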

Examples


library(ccesMRPprep)
library(dplyr)

 ccc_std_demographics(ccc_samp)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 1,000 × 23
#>     year case_id state       st    cd    marstat gender  female age     age_orig
#>    <dbl> <chr>   <chr>       <chr> <chr> <dbl+l> <dbl+l>  <dbl> <int+l>    <dbl>
#>  1  2006 1005058 Michigan    MI    MI-04 1 [Mar… 2 [Fem…      1 3 [35 …       36
#>  2  2006 1006614 Texas       TX    TX-18 1 [Mar… 1 [Mal…      0 3 [35 …       40
#>  3  2006 1009338 California  CA    CA-48 1 [Mar… 1 [Mal…      0 2 [25 …       32
#>  4  2006 1088898 Florida     FL    FL-13 1 [Mar… 2 [Fem…      1 4 [45 …       52
#>  5  2006 1090564 Pennsylvan… PA    PA-19 5 [Sin… 2 [Fem…      1 2 [25 …       25
#>  6  2006 1093132 South Caro… SC    SC-02 1 [Mar… 2 [Fem…      1 4 [45 …       48
#>  7  2006 1093573 Utah        UT    UT-03 4 [Wid… 2 [Fem…      1 5 [65 …       74
#>  8  2006 1105620 Hawaii      HI    HI-01 5 [Sin… 1 [Mal…      0 3 [35 …       37
#>  9  2006 1116569 Texas       TX    TX-21 5 [Sin… 2 [Fem…      1 1 [18 …       20
#> 10  2006 1117377 Ohio        OH    OH-07 3 [Div… 2 [Fem…      1 4 [45 …       49
#> # ℹ 990 more rows
#> # ℹ 13 more variables: educ <dbl+lbl>, educ_cces_chr <chr>, educ_3 <dbl+lbl>,
#> #   race_cces_chr <chr>, race <int+lbl>, faminc <dbl+lbl>,
#> #   vv_turnout_gvm <dbl+lbl>, zipcode <chr>, county_fips <chr>,
#> #   hispanic <dbl+lbl>, newsint <dbl+lbl>, voted_pres_16 <dbl+lbl>,
#> #   economy_retro <dbl+lbl>
 ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 6 × 2
#>   race                    n
#>   <int+lbl>           <int>
#> 1 1 [White]             732
#> 2 2 [Black]             108
#> 3 3 [Hispanic]           81
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41
 ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig. 
#> # A tibble: 6 × 2
#>   race                    n
#>   <int+lbl>           <int>
#> 1 1 [White]             732
#> 2 2 [Black]             110
#> 3 3 [Hispanic]           79
#> 4 4 [Asian]              27
#> 5 5 [Native American]    11
#> 6 6 [All Other]          41

if (FALSE) {
 # For full data (takes a while)
 library(dataverse)
 cumulative_rds <- get_cces_dataverse("cumulative")
 cumulative_std <- ccc_std_demographics(cumulative_rds)
 }

if (FALSE) {
 library(stringr)
 # strip the zero padding, e.g. "HI-01" becomes "HI-1"
 wrong_cd_fmt <- mutate(ccc_samp, cd = str_replace_all(cd, "01", "1"))
 wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)

 # throws error because CD is formatted the wrong way
 ccc_std_demographics(wrong_cd_fmt)
}
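The last example fails because cd no longer has the two-digit form. Below is a minimal base-R check for that form. It assumes the expected pattern is two capital letters, a hyphen, and exactly two digits. is_std_cd() is hypothetical, and the function's own validation may differ.

```r
# Hypothetical check (not the package's own validation): TRUE when cd looks
# like "ST-NN", i.e. two capital letters, a hyphen, and two digits.
is_std_cd <- function(cd) grepl("^[A-Z]{2}-[0-9]{2}$", cd)

is_std_cd(c("HI-01", "HI-1", "WY-01"))
#> [1]  TRUE FALSE  TRUE
```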