Get MCMC draws of poststratified estimates of demographic x cd cells

poststrat_draws(
  model,
  poststrat_tgt,
  orig_data = NULL,
  question_lbl = attr(orig_data, "question"),
  area_var = "cd",
  count_var = "count",
  calibrate = FALSE,
  calib_area_to = NULL,
  calib_to_var = NULL,
  calib_join_var = NULL,
  dtplyr = FALSE,
  new_levels = FALSE
)

Arguments

model

A fitted Stan model (a brmsfit object) from fit_brms.

poststrat_tgt

The poststratification target. It must contain the column count, which is treated as the number of trials in the binomial model.
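As a rough sketch of the expected shape, a poststratification target has one row per area-by-demographic cell plus a count column. The records, the weight column, and the object names below are made up for illustration; base R is used so the snippet stands alone.

```r
# Hypothetical individual-level population records (illustration only)
pop <- data.frame(
  cd     = c("GA-01", "GA-01", "GA-01", "GA-02", "GA-02"),
  female = c(0L, 0L, 1L, 1L, 1L),
  educ   = c("HS or Less", "HS or Less", "4-Year", "4-Year", "4-Year"),
  weight = c(100, 150, 200, 50, 75)
)

# Collapse to one row per cell; the summed weight becomes the cell's count
poststrat_toy <- aggregate(weight ~ cd + female + educ, data = pop, FUN = sum)
names(poststrat_toy)[names(poststrat_toy) == "weight"] <- "count"
poststrat_toy
```

A table of this shape, with real population counts, is what would be passed as poststrat_tgt.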

orig_data

The original survey data. This defaults to NULL, but if supplied it is used to (1) subset the poststratification target to only the areas that appear in the survey, and (2) label the question outcome.

question_lbl

A character string that indicates the outcome, e.g. a shorthand for the outcome variable. This is useful when you want to preserve the outcome or description of multiple models.

area_var

A character string for the variable name(s) of the area to group and aggregate by, that is, the area of interest in MRP. Defaults to "cd".

count_var

A character string for the variable name for the population count in the poststrat_tgt dataframe. This will be renamed as if it is a trial count in the model. Defaults to "count".
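To illustrate why the population count can play the role of a binomial trial count, here is a small sketch with made-up cells. It uses expected successes (p times count) rather than simulated binomial outcomes, which is a simplification assumed for this example and not necessarily what the function does internally.

```r
# Hypothetical cells with a predicted success probability for one draw
cells <- data.frame(
  cd    = c("GA-01", "GA-01", "GA-02", "GA-02"),
  p     = c(0.40, 0.60, 0.30, 0.50),
  count = c(100, 300, 200, 200)
)

# Expected successes per cell if count is the number of binomial trials
cells$n_yes <- cells$p * cells$count

# Area-level estimate: total expected successes over total trials
area <- aggregate(cbind(n_yes, count) ~ cd, data = cells, FUN = sum)
area$p_area <- area$n_yes / area$count
area
```

Cells with larger counts pull the area-level estimate toward their own probability, which is the poststratification step.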

calibrate

Should each cell's estimates be adjusted post hoc so that they add up to a pre-specified, user-supplied value? Logical, defaulting to FALSE. See the calib_area_to argument.

calib_area_to

A dataset of area-level correct values to calibrate to, with those values in the last column. It should contain the variables set in calib_join_var and calib_to_var. See posthoc_error() for details.

calib_to_var

The variable to calibrate to, e.g. the vote share.

calib_join_var

The variable that defines the level of the calibration data frame and on which it can be joined, e.g. the area.
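The idea behind calibration can be sketched on toy numbers. This page does not spell out the adjustment rule (it points to posthoc_error() for details), so the uniform logit-scale shift below is an assumption chosen for illustration and may differ from the package's method. All values and object names are hypothetical.

```r
# Hypothetical cell-level estimates for one area and one posterior draw
cells <- data.frame(
  p     = c(0.35, 0.45, 0.60),   # estimated proportion in each cell
  count = c(1000, 3000, 2000)    # population count of each cell
)
target <- 0.52                   # known area-level value to calibrate to

# Area-level estimate implied by shifting every cell by delta on the logit scale
implied <- function(delta, p, n) {
  sum(plogis(qlogis(p) + delta) * n) / sum(n)
}

# Find the shift at which the implied area-level estimate equals the target
fit <- uniroot(function(d) implied(d, cells$p, cells$count) - target,
               interval = c(-10, 10), tol = 1e-10)
cells$p_calib <- plogis(qlogis(cells$p) + fit$root)

# Count-weighted average of the calibrated cells (should match the target)
weighted.mean(cells$p_calib, cells$count)
```

Shifting on the logit scale keeps every calibrated proportion strictly between 0 and 1 while preserving the ordering of the cells.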

dtplyr

Whether to use a data.table/dtplyr backend for processing for slightly faster dataframe wrangling. Currently does not apply to anything within the function.

new_levels

If the poststratification table contains new levels that have no coefficients estimated from the survey data, should the function extrapolate to them or assign them 0s? In MRP the answer should almost always be no (FALSE).

Value

A tidy dataset with qID x cd x iter number of rows, where qID is the number of questions (outcomes), cd is the number of geographies, and iter is the number of iterations estimated in the MCMC model. The demographic cells within a district are averaged across, and an MRP estimate is computed. It contains the columns:

iter

The iteration number, i.e. the index of the MCMC draw

cd

The geography

qID

The question

p_mrp

The proportion of success, estimated by MRP.
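Because the result has one row per geography and iteration, point estimates and intervals come from summarizing over iter. The sketch below uses simulated draws in the same shape as the example output further down (columns cd, iter, p_mrp); the values and object names are hypothetical.

```r
# Hypothetical draws in the shape of the function's output (simulated values)
set.seed(1)
draws <- data.frame(
  cd    = rep(c("GA-01", "GA-02"), each = 1000),
  iter  = rep(1:1000, times = 2),
  p_mrp = c(rbeta(1000, 45, 55), rbeta(1000, 60, 40))
)

# Posterior mean and 90% interval for each geography
summ <- do.call(rbind, lapply(split(draws, draws$cd), function(d) {
  data.frame(
    cd     = d$cd[1],
    p_mean = mean(d$p_mrp),
    q05    = unname(quantile(d$p_mrp, 0.05)),
    q95    = unname(quantile(d$p_mrp, 0.95))
  )
}))
summ
```

Keeping the full set of draws until this last step lets uncertainty carry through to any further quantity computed from them, such as differences between geographies.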

Examples

class(fit_GA) # brms object
#> [1] "brmsfit"
head(acs_GA) # dataset
#> # A tibble: 6 × 8
#>    year cd    female educ         age        count clinton_vote clinton_vote_2p…
#>   <dbl> <chr>  <int> <fct>        <fct>      <dbl>        <dbl>            <dbl>
#> 1  2016 GA-01      0 HS or Less   18 to 24 …  1082        0.409            0.420
#> 2  2016 GA-01      0 HS or Less   18 to 24 …  6603        0.409            0.420
#> 3  2016 GA-01      0 HS or Less   18 to 24 … 16412        0.409            0.420
#> 4  2016 GA-01      0 Some College 18 to 24 … 14732        0.409            0.420
#> 5  2016 GA-01      0 Some College 18 to 24 …  1214        0.409            0.420
#> 6  2016 GA-01      0 4-Year       18 to 24 …  1810        0.409            0.420

drw_GA <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, area_var = "cd")
drw_GA
#> # A tibble: 56,000 × 3
#>    cd     iter p_mrp
#>    <chr> <dbl> <dbl>
#>  1 GA-01     1 0.496
#>  2 GA-01     2 0.484
#>  3 GA-01     3 0.469
#>  4 GA-01     4 0.447
#>  5 GA-01     5 0.414
#>  6 GA-01     6 0.490
#>  7 GA-01     7 0.429
#>  8 GA-01     8 0.491
#>  9 GA-01     9 0.427
#> 10 GA-01    10 0.435
#> # … with 55,990 more rows

if (FALSE)  {

library(dplyr) # provides select(), filter(), count(), and %>% used below

# 1. get MRP estimates by CD, while calibrating the overall cd results to
# the election
## Each takes about 75 secs
drw_GA_fix <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
                              calib_area_to = elec_GA,
                              calib_join_var = "cd",
                              calib_to_var = "clinton_vote_2pty")

# 2. get MRP estimates by CD and sex, while calibrating the overall
# cd result to the election
drw_GA_sex <- poststrat_draws(fit_GA, poststrat_tgt = acs_GA, calibrate = TRUE,
                              calib_area_to = select(elec_GA, cd, clinton_vote_2pty),
                              area_var = c("cd", "female"),
                              calib_join_var = "cd",
                              calib_to_var = "clinton_vote_2pty")


## take some examples
samp_ests <- drw_GA_sex %>% filter(cd == "GA-01", iter %in% 1:5) %>% arrange(iter)

## Gender balance in poststratification target is 48.7 - 51.3
sex_wt <- acs_GA %>%
  filter(cd == "GA-01") %>%
  count(cd, clinton_vote_2pty, female, wt = count) %>%
  mutate(frac = n/sum(n))

## In all iterations, the MRP estimates should add up to the calibration target
samp_ests %>%
  left_join(sex_wt, by = c("cd", "female")) %>%
  group_by(cd, iter, clinton_vote_2pty) %>%
  summarize(implied_vote = sum(p_mrp*frac) / sum(frac))

}