R/take-draws_poststrat.R
poststrat_draws.Rd
Get MCMC draws of post-stratified estimate of demog x cd cells
poststrat_draws(
model,
poststrat_tgt,
orig_data = NULL,
question_lbl = attr(orig_data, "question"),
area_var = "cd",
count_var = "count",
calibrate = FALSE,
calib_area_to = NULL,
calib_to_var = NULL,
calib_join_var = NULL,
dtplyr = FALSE,
new_levels = FALSE
)
Stan model from fit_brms()
The poststratification target. It must contain the column count, which is treated as the number of trials in the binomial model.
Original survey data. Defaults to NULL, but if supplied it is used to (1) subset the poststratification target to only the areas in the survey, and (2) label the question outcome.
A character string that indicates the outcome, e.g. a shorthand for the outcome variable. This is useful when you want to preserve the outcome or description of multiple models.
A character string for the variable name(s) for area to group and aggregate by. That is, the area of interest in MRP. Defaults to "cd".
A character string for the variable name for the population count in the poststrat_tgt dataframe. This will be renamed as if it is a trial count in the model. Defaults to "count".
Whether to adjust each cell's post hoc estimates so that they add up to a pre-specified, user-supplied value. Logical, defaulting to FALSE. See the calib_area_to argument.
A dataset with area-level correct values to calibrate to in the last column. It should contain the variables set in calib_join_var and calib_to_var. See posthoc_error() for details.
The variable to calibrate to, e.g. the voteshare
The variable that defines the level of the calibration dataframe that can be joined, e.g. the area
Whether to use a data.table/dtplyr backend for processing for slightly faster dataframe wrangling. Currently does not apply to anything within the function.
If there are new levels in the poststrat table that do not have coefficients in the survey data, should there be an extrapolation or assignment to 0s? The answer should almost always be No in MRP.
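To make the calibrate, calib_area_to, and calib_to_var arguments more concrete, here is a minimal base R sketch of one common way to calibrate cell estimates to a known area-level value: shift every cell by the same amount on the logit scale until the population-weighted mean matches the target. This is an illustration under stated assumptions, not a description of what posthoc_error() does internally. The names p_cell, n_cell, target, and shifted_mean are hypothetical.

```r
# Hypothetical inputs for a single area and a single MCMC draw
p_cell <- c(0.35, 0.45, 0.55, 0.65)   # cell-level probabilities
n_cell <- c(1000, 3000, 4000, 2000)   # population counts per cell
target <- 0.48                        # known area-level value to calibrate to

# Population-weighted mean after shifting every cell by `delta` on the logit scale
shifted_mean <- function(delta) {
  sum(plogis(qlogis(p_cell) + delta) * n_cell) / sum(n_cell)
}

# Solve for the shift that makes the weighted mean equal the target
delta <- uniroot(function(d) shifted_mean(d) - target,
                 interval = c(-5, 5), tol = 1e-10)$root

# Calibrated cell estimates and the area-level value they imply
p_calibrated <- plogis(qlogis(p_cell) + delta)
implied <- sum(p_calibrated * n_cell) / sum(n_cell)
implied
```

A single shift per area keeps the ordering of cells intact and keeps every calibrated estimate strictly between 0 and 1, which a simple additive correction on the probability scale would not guarantee.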
A tidy dataset with qID x cd x iter rows, where qID is the number of questions (outcomes), cd is the number of geographies, and iter is the number of iterations estimated in the MCMC model. The demographic cells within a district are averaged across, and an MRP estimate is computed.
It contains the columns
iter: The iteration index
cd: The geography
qID: The question
p_mrp: The proportion of success, estimated by MRP.
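The aggregation described above can be sketched in base R on toy data. This assumes that "averaged across" means a population-weighted mean of cell-level probabilities within each area and iteration; the column p_cell and the toy values are hypothetical and are not produced by the package.

```r
# Toy cell-level draws: 2 areas x 2 demographic cells x 3 iterations.
# `p_cell` stands in for the model's predicted probability in each cell.
cells <- expand.grid(
  cd = c("GA-01", "GA-02"),
  female = c(0, 1),
  iter = 1:3,
  stringsAsFactors = FALSE
)
cells$count  <- ifelse(cells$female == 1, 600, 400)
cells$p_cell <- ifelse(cells$female == 1, 0.55, 0.45) + 0.01 * cells$iter

# Population-weighted mean of cell probabilities within each area and draw
groups <- split(cells, list(cells$cd, cells$iter), drop = TRUE)
mrp <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    cd = g$cd[1],
    iter = g$iter[1],
    p_mrp = sum(g$p_cell * g$count) / sum(g$count)
  )
}))
rownames(mrp) <- NULL
mrp
```

The result has one row per area and iteration (2 x 3 = 6 rows here), mirroring the cd, iter, and p_mrp columns shown in the example output.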
class(fit_GA) # brms object
#> [1] "brmsfit"
head(acs_GA) # dataset
#> # A tibble: 6 × 8
#> year cd female educ age count clinton_vote clinton_vote_2p…
#> <dbl> <chr> <int> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 2016 GA-01 0 HS or Less 18 to 24 … 1082 0.409 0.420
#> 2 2016 GA-01 0 HS or Less 18 to 24 … 6603 0.409 0.420
#> 3 2016 GA-01 0 HS or Less 18 to 24 … 16412 0.409 0.420
#> 4 2016 GA-01 0 Some College 18 to 24 … 14732 0.409 0.420
#> 5 2016 GA-01 0 Some College 18 to 24 … 1214 0.409 0.420
#> 6 2016 GA-01 0 4-Year 18 to 24 … 1810 0.409 0.420
drw_GA <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, area_var = "cd")
drw_GA
#> # A tibble: 56,000 × 3
#> cd iter p_mrp
#> <chr> <dbl> <dbl>
#> 1 GA-01 1 0.496
#> 2 GA-01 2 0.484
#> 3 GA-01 3 0.469
#> 4 GA-01 4 0.447
#> 5 GA-01 5 0.414
#> 6 GA-01 6 0.490
#> 7 GA-01 7 0.429
#> 8 GA-01 8 0.491
#> 9 GA-01 9 0.427
#> 10 GA-01 10 0.435
#> # … with 55,990 more rows
if (FALSE) {
# 1. get MRP estimates by CD, while calibrating the overall cd results to
# the election
## Each takes about 75 secs
drw_GA_fix <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
calib_area_to = elec_GA,
calib_join_var = "cd",
calib_to_var = "clinton_vote_2pty")
# to get MRP estimates by CD and sex, while calibrating the overall
# cd result to the election
drw_GA_sex <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
calib_area_to = select(elec_GA, cd, clinton_vote_2pty),
area_var = c("cd", "female"),
calib_join_var = "cd",
calib_to_var = "clinton_vote_2pty")
## take some examples
samp_ests <- drw_GA_sex %>% filter(cd == "GA-01", iter %in% 1:5) %>% arrange(iter)
## Gender balance in poststratification target is 48.7 - 51.3
sex_wt <- acs_GA %>%
filter(cd == "GA-01") %>%
count(cd, clinton_vote_2pty, female, wt = count) %>%
mutate(frac = n/sum(n))
## In all iterations, the MRP estimates should add up to the calibration target
samp_ests %>%
left_join(sex_wt, by = c("cd", "female")) %>%
group_by(cd, iter, clinton_vote_2pty) %>%
summarize(implied_vote = sum(p_mrp*frac) / sum(frac))
}
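# A common next step is to collapse the draws into a point estimate and an
# interval for each area. The sketch below uses base R on simulated stand-in
# data, so it runs without a fitted model. The data frame `drw` and the summary
# column names (p_mrp_est, p_mrp_lo, p_mrp_hi) are hypothetical.

```r
set.seed(1)
# Stand-in for the output of poststrat_draws(): one row per area and iteration
drw <- data.frame(
  cd = rep(c("GA-01", "GA-02"), each = 1000),
  iter = rep(1:1000, times = 2),
  p_mrp = c(rnorm(1000, 0.45, 0.02), rnorm(1000, 0.60, 0.02))
)

# Posterior mean and 95% interval of the MRP estimate for each area
summ <- do.call(rbind, lapply(split(drw, drw$cd), function(d) {
  data.frame(
    cd = d$cd[1],
    p_mrp_est = mean(d$p_mrp),
    p_mrp_lo = unname(quantile(d$p_mrp, 0.025)),
    p_mrp_hi = unname(quantile(d$p_mrp, 0.975))
  )
}))
rownames(summ) <- NULL
summ
```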