R/cces_std-for-acs.R
ccc_std_demographics.Rd
Recode CCES variables so that they merge to ACS variables
ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)
tbl
The cumulative common content. It can be any subset but must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be of class haven_labelled, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example. Any other file (for example, a year-specific common content) is not compatible with this function.
only_demog
Drop variables besides demographics? Defaults to FALSE.
age_key
The vector key used to bin age. Can be deframe(age5_key) or deframe(age10_key).
wh_as_hisp
Should people who identify as both White and Hispanic be coded as "Hispanic", thereby leaving all remaining "Whites" as non-Hispanic Whites by definition? Can be NULL if you know the column hispanic is not in the data. Defaults to TRUE. For more information, see https://bit.ly/3hZ6mz4.
bh_as_hisp
Same as wh_as_hisp, but for Black Hispanics. Defaults to TRUE.
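To make the effect of the two Hispanic flags concrete, the base-R sketch below applies the same kind of rule to toy data. It is an illustration only, not the package's implementation: the helper recode_hispanic(), the character race values, and the logical hispanic flag are hypothetical stand-ins for the labelled columns in the cumulative file.

```r
# Hypothetical helper: illustrates the recoding rule, not the package's code.
recode_hispanic <- function(race, hispanic, wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  out <- race
  # isTRUE() treats NULL as "do not recode", mirroring the NULL option
  if (isTRUE(wh_as_hisp)) out[race == "White" & hispanic] <- "Hispanic"
  if (isTRUE(bh_as_hisp)) out[race == "Black" & hispanic] <- "Hispanic"
  out
}

# Toy data (assumed coding): race as character, hispanic as a logical flag
race     <- c("White", "White", "Black", "Black", "Asian")
hispanic <- c(TRUE,    FALSE,   TRUE,    FALSE,   FALSE)

both_on   <- recode_hispanic(race, hispanic)
white_off <- recode_hispanic(race, hispanic, wh_as_hisp = FALSE)
```

With both flags on, every respondent left in "White" or "Black" is non-Hispanic by construction; turning a flag off leaves Hispanic respondents of that race in their original race category.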
The output has the same dimensions as the input (unless only_demog = TRUE), with the following exceptions:
age is coded to match the ACS bins; the recoding is done by a separate function, ccc_bin_age. The unbinned age is kept in age_orig.
educ is coarsened and relabelled into 4 categories to match the ACS (the original version is kept as educ_cces_chr). Recoding is governed by the key-value pairs in educ_key.
educ_3 is further coarsened to 3 categories, grouping a BA and a higher degree into one category. This is necessary for some ACS tables that do not make the distinction. Decide which education variable to use beforehand, after looking at the ACS codes.
race is recoded in the same way (the original version is kept as race_cces_chr). These recodings are governed by the key-value pairs in race_key.
cd is standardized so that at-large districts are coded as "01" and single-digit districts are zero-padded, e.g. "WY-01" and "CA-02".
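The cd format described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual code: the helper format_cd() is hypothetical, and it assumes that an at-large district arrives with district number 0.

```r
# Hypothetical helper: builds "ST-NN" strings from a state abbreviation
# and a numeric district. Not part of ccesMRPprep.
format_cd <- function(st, dist) {
  dist <- as.integer(dist)
  dist[dist == 0L] <- 1L        # assumption: 0 marks an at-large district
  sprintf("%s-%02d", st, dist)  # zero-pad the district to two digits
}

cd_example <- format_cd(c("WY", "CA", "TX"), c(0, 2, 18))
cd_example
#> [1] "WY-01" "CA-02" "TX-18"
```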
This function requires data to have the following columns:
A string column called st that is a two-letter abbreviation of the state, or a labelled variable coercible to a string.
A string column called cd that holds the congressional district in the form "WY-01", OR a numeric column called dist that holds the numeric district number. cd_up can also be used for the district in the upcoming election.
<numeric+labelled> columns called educ for education, race for race, age for age, and gender for gender, with values following the cumulative common content.
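Before calling the function on a custom subset, it can help to confirm that these columns are present. The check below is a hedged sketch based only on the list above; check_required_cols() is a hypothetical helper, not part of ccesMRPprep, and it does not verify column types or labels.

```r
# Hypothetical pre-flight check based on the column list above.
check_required_cols <- function(tbl) {
  always  <- c("st", "educ", "race", "age", "gender")
  missing <- setdiff(always, names(tbl))
  # The district may be supplied either as cd or as dist
  if (!any(c("cd", "dist") %in% names(tbl))) {
    missing <- c(missing, "cd or dist")
  }
  missing  # character(0) means nothing is missing
}

ok  <- data.frame(st = "WY", cd = "WY-01", educ = 2, race = 1, age = 40, gender = 1)
bad <- data.frame(st = "WY", educ = 2)

missing_ok  <- check_required_cols(ok)
missing_bad <- check_required_cols(bad)
```

Here missing_ok is empty, while missing_bad lists race, age, gender, and the district column.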
library(dplyr)
ccc_std_demographics(ccc_samp)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 1,000 × 23
#> year case_id state st cd marstat gender female age age_orig
#> <dbl> <chr> <chr> <chr> <chr> <dbl+l> <dbl+l> <dbl> <int+l> <dbl>
#> 1 2006 1005058 Michigan MI MI-04 1 [Mar… 2 [Fem… 1 3 [35 … 36
#> 2 2006 1006614 Texas TX TX-18 1 [Mar… 1 [Mal… 0 3 [35 … 40
#> 3 2006 1009338 California CA CA-48 1 [Mar… 1 [Mal… 0 2 [25 … 32
#> 4 2006 1088898 Florida FL FL-13 1 [Mar… 2 [Fem… 1 4 [45 … 52
#> 5 2006 1090564 Pennsylvan… PA PA-19 5 [Sin… 2 [Fem… 1 2 [25 … 25
#> 6 2006 1093132 South Caro… SC SC-02 1 [Mar… 2 [Fem… 1 4 [45 … 48
#> 7 2006 1093573 Utah UT UT-03 4 [Wid… 2 [Fem… 1 5 [65 … 74
#> 8 2006 1105620 Hawaii HI HI-01 5 [Sin… 1 [Mal… 0 3 [35 … 37
#> 9 2006 1116569 Texas TX TX-21 5 [Sin… 2 [Fem… 1 1 [18 … 20
#> 10 2006 1117377 Ohio OH OH-07 3 [Div… 2 [Fem… 1 4 [45 … 49
#> # ℹ 990 more rows
#> # ℹ 13 more variables: educ <dbl+lbl>, educ_cces_chr <chr>, educ_3 <dbl+lbl>,
#> # race_cces_chr <chr>, race <int+lbl>, faminc <dbl+lbl>,
#> # vv_turnout_gvm <dbl+lbl>, zipcode <chr>, county_fips <chr>,
#> # hispanic <dbl+lbl>, newsint <dbl+lbl>, voted_pres_16 <dbl+lbl>,
#> # economy_retro <dbl+lbl>
ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 6 × 2
#> race n
#> <int+lbl> <int>
#> 1 1 [White] 732
#> 2 2 [Black] 108
#> 3 3 [Hispanic] 81
#> 4 4 [Asian] 27
#> 5 5 [Native American] 11
#> 6 6 [All Other] 41
ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)
#> age variable modified to bins. Original age variable is now in age_orig.
#> # A tibble: 6 × 2
#> race n
#> <int+lbl> <int>
#> 1 1 [White] 732
#> 2 2 [Black] 110
#> 3 3 [Hispanic] 79
#> 4 4 [Asian] 27
#> 5 5 [Native American] 11
#> 6 6 [All Other] 41
if (FALSE) { # \dontrun{
# For full data (takes a while)
library(dataverse)
cumulative_rds <- get_cces_dataverse("cumulative")
cumulative_std <- ccc_std_demographics(cumulative_rds)
} # }
if (FALSE) { # \dontrun{
# Deliberately break the cd format (e.g. "HI-01" becomes "HI-1")
wrong_cd_fmt <- mutate(ccc_samp, cd = stringr::str_replace_all(cd, "01", "1"))
wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)
# throws an error because cd is formatted the wrong way
ccc_std_demographics(wrong_cd_fmt)
} # }