Currently only compatible with questions of type "yesno".
build_counts(
  formula,
  data,
  keep_vars = NULL,
  name_ones_as = "yes",
  name_trls_as = "n_response",
  multiple_qIDs = FALSE,
  verbose = TRUE
)
formula: The model formula used to fit the multilevel regression model. It should be of the form y ~ x1 + x2 + (1|x3), where y is a binary variable and only categorical variables are used in the random-effects terms.
data: A cleaned CCES dataset, e.g. from ccc_std_demographics, which is then combined with outcome and contextual data in cces_join_slim. By default the LHS outcome is expected to be named response, and the dataset must contain that variable. This variable must be binary, or a character vector that yesno_to_binary can coerce into a binary variable.
keep_vars: Variables that will be kept as cell variables, regardless of whether they are specified in the formula. Input as a character vector.
name_ones_as: What to name the variable that represents the number of successes in the binomial.
name_trls_as: What to name the variable that represents the number of trials in the binomial.
multiple_qIDs: Does the data contain multiple outcomes in long form, and therefore require the counts to be built for each outcome? Defaults to FALSE.
verbose: Show warning messages? Defaults to TRUE.
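The arguments above describe collapsing respondent-level rows into cells defined by the formula's right-hand-side variables. The following is a minimal sketch of that idea using dplyr on hypothetical toy data. It is not the package's implementation: the toy data frame, the use of all.vars() to pull variable names from the formula, and the handling of missing outcomes are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical respondent-level data (not CCES data); y is already binary
toy <- tibble(
  educ = c("HS or Less", "HS or Less", "College", "College", "College"),
  cd   = c("AL-02", "AL-02", "AL-02", "CA-06", "CA-06"),
  y    = c(1, 0, 1, NA, 1)
)

form <- y ~ educ + (1 | cd)

# all.vars() returns every variable named in the formula, LHS first
outcome   <- all.vars(form)[1]   # "y"
cell_vars <- all.vars(form)[-1]  # "educ" "cd"

# Assumed logic: drop missing outcomes, then count trials and successes per cell
cells <- toy %>%
  filter(!is.na(.data[[outcome]])) %>%
  group_by(across(all_of(cell_vars))) %>%
  summarize(
    n_response = n(),                    # trials: non-missing responses
    yes        = sum(.data[[outcome]]),  # successes: responses equal to 1
    .groups    = "drop"
  )

cells
```

In this sketch the column names n_response and yes are hard-coded to match the defaults of name_trls_as and name_ones_as.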
A dataframe of cells. The following variables have fixed names and will be assumed by ccesMRPrun::fit_brms_binomial:
yes: the number of successes observed in the cell
n_response: the number of non-missing responses, representing the number of trials.
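To illustrate why the output carries a successes column and a trials column, here is a small sketch that fits a binomial model to cell counts with base R's glm(). This is a stand-in for illustration only and is not ccesMRPrun::fit_brms_binomial; the cell counts below are made up. In brms, the analogous response specification would be written yes | trials(n_response).

```r
# Hypothetical cell-level counts in the shape build_counts returns
cells <- data.frame(
  educ       = c("HS or Less", "Some College", "4-Year", "Post-Grad"),
  n_response = c(10, 12, 8, 5),
  yes        = c(4, 6, 5, 4)
)

# Binomial regression on counts: the response is (successes, failures)
fit <- glm(
  cbind(yes, n_response - yes) ~ 1,
  family = binomial,
  data   = cells
)

# With an intercept only, the fitted probability is the pooled proportion
p_hat <- unname(plogis(coef(fit)))
p_hat
sum(cells$yes) / sum(cells$n_response)
```

Fitting on cells rather than individual rows gives the same likelihood for a binomial model while keeping the data much smaller.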
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
ccc_samp_std <- ccc_samp %>%
  mutate(y = sample(c("For", "Against"), size = n(), replace = TRUE)) %>%
  ccc_std_demographics()
#> age variable modified to bins. Original age variable is now in age_orig.
ccc_samp_out <- build_counts(y ~ age + gender + educ + (1|cd),
                             ccc_samp_std)
ccc_samp_out
#> # A tibble: 945 × 6
#> age gender educ cd n_response yes
#> <fct> <fct> <fct> <chr> <int> <dbl>
#> 1 18 to 24 years Male HS or Less CA-39 1 1
#> 2 18 to 24 years Male HS or Less FL-25 1 0
#> 3 18 to 24 years Male HS or Less GA-10 1 1
#> 4 18 to 24 years Male HS or Less IL-07 1 0
#> 5 18 to 24 years Male HS or Less LA-01 1 0
#> 6 18 to 24 years Male HS or Less MA-05 1 1
#> 7 18 to 24 years Male HS or Less NC-09 1 1
#> 8 18 to 24 years Male HS or Less NC-11 1 1
#> 9 18 to 24 years Male HS or Less PA-11 1 1
#> 10 18 to 24 years Male Some College IA-02 1 1
#> # ℹ 935 more rows
# alternative options
build_counts(y ~ educ + (1|cd), ccc_samp_std,
             name_ones_as = "success", name_trls_as = "trials")
#> # A tibble: 716 × 4
#> educ cd trials success
#> <fct> <chr> <int> <dbl>
#> 1 HS or Less AK-01 1 0
#> 2 HS or Less AL-02 3 2
#> 3 HS or Less AL-03 2 0
#> 4 HS or Less AL-05 1 0
#> 5 HS or Less AR-02 2 1
#> 6 HS or Less AR-03 2 2
#> 7 HS or Less AZ-04 1 1
#> 8 HS or Less AZ-06 1 0
#> 9 HS or Less AZ-08 1 0
#> 10 HS or Less CA-06 2 1
#> # ℹ 706 more rows
build_counts(y ~ educ + (1|cd), ccc_samp_std,
             keep_vars = "state")
#> # A tibble: 716 × 5
#> educ cd state n_response yes
#> <fct> <chr> <chr> <int> <dbl>
#> 1 HS or Less AK-01 Alaska 1 0
#> 2 HS or Less AL-02 Alabama 3 2
#> 3 HS or Less AL-03 Alabama 2 0
#> 4 HS or Less AL-05 Alabama 1 0
#> 5 HS or Less AR-02 Arkansas 2 1
#> 6 HS or Less AR-03 Arkansas 2 2
#> 7 HS or Less AZ-04 Arizona 1 1
#> 8 HS or Less AZ-06 Arizona 1 0
#> 9 HS or Less AZ-08 Arizona 1 0
#> 10 HS or Less CA-06 California 2 1
#> # ℹ 706 more rows
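The examples above all use a single outcome. For the multiple_qIDs = TRUE case, where several outcomes are stacked in long form, the counts would presumably be built once per outcome. The sketch below shows that idea with dplyr on hypothetical data; the column name qID and the grouping logic are assumptions, not the package's confirmed behavior.

```r
library(dplyr)

# Hypothetical long-form data: one row per respondent per question
long <- tibble(
  qID      = c("q1", "q1", "q1", "q2", "q2", "q2"),
  educ     = c("HS or Less", "HS or Less", "College",
               "HS or Less", "College", "College"),
  response = c(1, 0, 1, 1, 1, 0)
)

# Assumed logic: add the question identifier to the cell-defining variables
cells_by_q <- long %>%
  group_by(qID, educ) %>%
  summarize(
    n_response = n(),            # trials per question-by-cell
    yes        = sum(response),  # successes per question-by-cell
    .groups    = "drop"
  )

cells_by_q
```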