Estimates the joint distribution by assuming independence and multiplying proportions.
synth_prod(formula, poptable, newtable, area_var, count_var = "count")
formula: A representation of the aggregate imputation or "outcome" model, of the form X_{K} ~ X_1 + ... + X_{K - 1}.
poptable: The population table, collapsed in terms of counts. Must contain all variables in the RHS of formula, as well as the variables specified in area_var and count_var below.
newtable: A dataset that contains marginal counts or proportions of the outcome variable. It will be collapsed internally to get simple proportions.
area_var: A character vector giving the name of the area variable of interest (e.g. "cd").
count_var: A character string naming the variable in poptable that holds the counts. Defaults to "count".
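That is, we already know p(X_{1}, ..., X_{K - 1}, A) from poptable and a marginal p(X_{K}, A) from newtable, the additional distribution to weight to. Assuming that, within each area A, X_{K} is independent of X_{1}, ..., X_{K - 1}, we get
p(X_{1}, ..., X_{K - 1}, X_{K}, A) = p(X_{1}, ..., X_{K - 1}, A) x p(X_{K} | A),
where p(X_{K} | A) = p(X_{K}, A) / p(A) is the within-area proportion obtained by collapsing the marginal table.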
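The product rule described above can be sketched on toy data. This is only an illustration under stated assumptions, not the package's actual implementation: the tables, the column names (area, age, race, count), and the intermediate objects new_prop and joint are all hypothetical.

```r
library(dplyr)

# Hypothetical population table: counts of age by area, i.e. p(X_1, A) as counts
poptable <- data.frame(
  area  = c("A", "A", "B", "B"),
  age   = c("young", "old", "young", "old"),
  count = c(60, 40, 30, 70)
)

# Hypothetical marginal table for the new variable (race) by area
newtable <- data.frame(
  area  = c("A", "A", "B", "B"),
  race  = c("x", "y", "x", "y"),
  count = c(25, 75, 50, 50)
)

# Collapse the marginal table to within-area proportions, p(X_K | A)
new_prop <- newtable %>%
  group_by(area) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup() %>%
  select(area, race, prop)

# Pair every population cell with every level of the new variable in the
# same area (base merge() performs this many-to-many match on area), then
# multiply the cell count by the proportion
joint <- merge(poptable, new_prop, by = "area") %>%
  mutate(count = count * prop) %>%
  select(area, age, race, count)

# Example cell: young / x in area A is 60 * 0.25 = 15
joint
```

Because the proportions sum to one within each area, the estimated joint counts add back up to the original population counts in every cell of poptable.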
See synth_mlogit() for a more nuanced model that uses survey data as the basis of the joint estimation.
library(dplyr)
library(ccesMRPprep)
# Suppose we know the distribution of (age x female) and the distribution
# of (race), by CD, but we don't know the joint of the two.
race_target <- count(acs_race_NY, cd, race, wt = count, name = "count")
pop_prod <- synth_prod(race ~ age + female,
                       poptable = acs_race_NY,
                       newtable = race_target,
                       area_var = "cd")
# In this example, we know the true joint. Does it match?
pop_val <- left_join(pop_prod,
                     count(acs_race_NY, cd, age, female, race, wt = count, name = "count"),
                     by = c("cd", "age", "female", "race"),
                     suffix = c("_est", "_truth"))
# AOC's district in the Bronx
pop_val %>%
  filter(cd == "NY-14", age == "35 to 44 years", female == 0) %>%
  select(cd, race, count_est, count_truth)
#> # A tibble: 6 × 4
#>   cd    race            count_est count_truth
#>   <chr> <fct>               <dbl>       <dbl>
#> 1 NY-14 White              15269.       11312
#> 2 NY-14 Black               7444.        6578
#> 3 NY-14 Hispanic           30824.       34668
#> 4 NY-14 Asian              11730.       11373
#> 5 NY-14 Native American        0            0
#> 6 NY-14 All Other          16904.       18239