Get MCMC draws of post-stratified estimate of demog x cd cells

poststrat_draws(
  model,
  poststrat_tgt,
  orig_data = NULL,
  question_lbl = attr(orig_data, "question"),
  area_var = "cd",
  count_var = "count",
  calibrate = FALSE,
  calib_area_to = NULL,
  calib_to_var = NULL,
  calib_join_var = NULL,
  dtplyr = FALSE,
  new_levels = FALSE
)

Arguments

model: A fitted Stan model (a brmsfit object) from fit_brms.
poststrat_tgt: The poststratification target. It must contain the column count, which is treated as the number of trials in the binomial model.
orig_data: The original survey data. Defaults to NULL. If supplied, it is used to (1) subset the poststratification target to areas that appear in the survey, and (2) label the question outcome.
question_lbl: A character string that indicates the outcome, e.g. a shorthand for the outcome variable. This is useful when you want to preserve the outcome or description of multiple models.
area_var: A character string for the variable name(s) for area to group and aggregate by. That is, the area of interest in MRP. Defaults to "cd"
count_var: A character string for the variable name for the population count in the poststrat_tgt dataframe. This will be renamed as if it is a trial count in the model. Defaults to "count".
calibrate: Should each cell's estimates be adjusted post hoc so that they add up to a pre-specified, user-supplied value? Logical, defaulting to FALSE. See the calib_area_to argument.
calib_area_to: A dataset of known area-level values to calibrate to, with those values in the last column. It should contain the variables set in calib_join_var and calib_to_var. See posthoc_error() for details.
calib_to_var: The variable to calibrate to, e.g. the voteshare.
calib_join_var: The variable used to join the calibration dataframe to the estimates, e.g. the area.
dtplyr: Whether to use a data.table/dtplyr backend for processing for slightly faster dataframe wrangling. Currently does not apply to anything within the function.
new_levels: If the poststratification table contains levels that have no coefficients estimated from the survey data, should their effects be extrapolated or set to 0? In MRP this should almost always be FALSE.
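
To make the idea behind calibrate concrete, here is a minimal sketch of one common way to calibrate cell estimates to a known area-level value: shift every cell by the same amount on the logit scale until the count-weighted mean hits the target. Whether poststrat_draws() and posthoc_error() do exactly this is an assumption; the data and the names p, count, target, and shifted_mean are hypothetical.

```r
# Hypothetical cells in one area: predicted proportions and population counts.
p      <- c(0.30, 0.45, 0.60)
count  <- c(500, 300, 200)
target <- 0.50  # assumed known area-level value, e.g. an election result

# Count-weighted area estimate after shifting every cell by delta on the logit scale.
shifted_mean <- function(delta) {
  weighted.mean(plogis(qlogis(p) + delta), w = count)
}

# Solve for the single shift that makes the area estimate equal the target.
delta <- uniroot(function(d) shifted_mean(d) - target,
                 interval = c(-10, 10), tol = 1e-10)$root

# Calibrated cell estimates: same ordering across cells, new area-level mean.
p_calib <- plogis(qlogis(p) + delta)
```

A shift on the logit scale keeps every calibrated value inside (0, 1) and preserves the ordering of cells, which a simple additive correction on the probability scale would not guarantee.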

Value

A tidy dataset with qID x cd x iter rows, where qID is the number of questions (outcomes), cd is the number of geographies, and iter is the number of iterations estimated in the MCMC model. Within each district, the estimates for the demographic cells are averaged, weighted by population count, to produce an MRP estimate. It contains the columns

iter: The iteration (draw) number
cd: The geography
qID: The question
p_mrp_est: The proportion of success, estimated by MRP.
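
The aggregation described above can be sketched in a few lines of base R. This is an illustration of count-weighted poststratification on made-up numbers, not the function's internal code; cells, cell_draws, and p_mrp are hypothetical names.

```r
# Hypothetical poststratification table: two areas, two demographic cells each.
cells <- data.frame(
  cd    = c("GA-01", "GA-01", "GA-02", "GA-02"),
  count = c(100, 300, 200, 200)
)

# cell_draws[i, j]: predicted proportion for cell j in MCMC draw i (made-up values).
cell_draws <- rbind(
  c(0.40, 0.60, 0.30, 0.50),
  c(0.50, 0.50, 0.20, 0.60),
  c(0.20, 0.40, 0.50, 0.70)
)

# Expected successes per cell and draw (cells x draws); count recycles down each column.
weighted <- t(cell_draws) * cells$count

# Sum within area, then divide by the area's total count.
num   <- rowsum(weighted, cells$cd)      # areas x draws
den   <- rowsum(cells$count, cells$cd)   # areas x 1
p_mrp <- num / as.vector(den)            # one row per area, one column per draw
```

Each column of p_mrp is one posterior draw of the area-level estimates; reshaping it to long form gives one row per area and iteration, as in the returned dataset.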

Examples

class(fit_GA) # brms object
#> [1] "brmsfit"
head(acs_GA) # dataset
#> # A tibble: 6 × 8
#>    year cd    female educ         age        count clinton_vote clinton_vote_2p…
#>   <dbl> <chr>  <int> <fct>        <fct>      <dbl>        <dbl>            <dbl>
#> 1  2016 GA-01      0 HS or Less   18 to 24 …  1082        0.409            0.420
#> 2  2016 GA-01      0 HS or Less   18 to 24 …  6603        0.409            0.420
#> 3  2016 GA-01      0 HS or Less   18 to 24 … 16412        0.409            0.420
#> 4  2016 GA-01      0 Some College 18 to 24 … 14732        0.409            0.420
#> 5  2016 GA-01      0 Some College 18 to 24 …  1214        0.409            0.420
#> 6  2016 GA-01      0 4-Year       18 to 24 …  1810        0.409            0.420

drw_GA <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, area_var = "cd")
drw_GA
#> # A tibble: 56,000 × 3
#>    cd     iter p_mrp
#>    <chr> <dbl> <dbl>
#>  1 GA-01     1 0.496
#>  2 GA-01     2 0.484
#>  3 GA-01     3 0.469
#>  4 GA-01     4 0.447
#>  5 GA-01     5 0.414
#>  6 GA-01     6 0.490
#>  7 GA-01     7 0.429
#>  8 GA-01     8 0.491
#>  9 GA-01     9 0.427
#> 10 GA-01    10 0.435
#> # … with 55,990 more rows

if (FALSE)  {

library(dplyr)  # provides %>%, filter(), select(), count(), and the other verbs used below

# 1. get MRP estimates by CD, while calibrating the overall cd results to
# the election
## Each takes about 75 secs
drw_GA_fix <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
                              calib_area_to = elec_GA,
                              calib_join_var = "cd",
                              calib_to_var = "clinton_vote_2pty")

# 2. get MRP estimates by CD and sex, while calibrating the overall
# cd result to the election
drw_GA_sex <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
                              calib_area_to = select(elec_GA, cd, clinton_vote_2pty),
                              area_var = c("cd", "female"),
                              calib_join_var = "cd",
                              calib_to_var = "clinton_vote_2pty")


## take some examples
samp_ests <- drw_GA_sex %>% filter(cd == "GA-01", iter %in% 1:5) %>% arrange(iter)

## Gender balance in poststratification target is 48.7 - 51.3
sex_wt <- acs_GA %>%
  filter(cd == "GA-01") %>%
  count(cd, clinton_vote_2pty, female, wt = count) %>%
  mutate(frac = n/sum(n))

## In all iterations, the MRP estimates should add up to the calibration target
samp_ests %>%
  left_join(sex_wt, by = c("cd", "female")) %>%
  group_by(cd, iter, clinton_vote_2pty) %>%
  summarize(implied_vote = sum(p_mrp*frac) / sum(frac))

}
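
Once the draws are in hand, a typical next step is to collapse them into a posterior mean and interval for each area. The sketch below uses a small made-up data frame, drw, standing in for the function's output, with the column names shown in the printed example (cd, iter, p_mrp); the 5% and 95% quantiles are an arbitrary choice for illustration.

```r
# Hypothetical stand-in for the output of poststrat_draws(): 2 areas x 4 draws.
drw <- data.frame(
  cd    = rep(c("GA-01", "GA-02"), each = 4),
  iter  = rep(1:4, times = 2),
  p_mrp = c(0.50, 0.48, 0.46, 0.44, 0.30, 0.34, 0.32, 0.36)
)

# Summarize the draws within each area: posterior mean and a 90% interval.
summ <- do.call(rbind, lapply(split(drw, drw$cd), function(d) {
  data.frame(
    cd         = d$cd[1],
    p_mrp_mean = mean(d$p_mrp),
    p_mrp_q05  = quantile(d$p_mrp, 0.05, names = FALSE),
    p_mrp_q95  = quantile(d$p_mrp, 0.95, names = FALSE)
  )
}))
summ
```

Summarizing across draws, rather than averaging cell predictions first, keeps the posterior uncertainty of each area-level estimate intact.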