Imputes cells with a balancing constraint, using Yamauchi's algorithm.

synth_bmlogit(
  formula,
  microdata,
  poptable,
  fix_to,
  fix_by_area = any(area_var %in% colnames(fix_to)),
  area_var,
  count_var = "count",
  tol = 0.05
)

Arguments

formula

A representation of the aggregate imputation or "outcome" model, of the form X_{K} ~ X_1 + ... + X_{K - 1}.
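As a rough illustration of the formula's two sides, the sketch below splits a formula of this form into its outcome (LHS) and covariates (RHS) using base R. The formula is the one from the Examples; the names `outcome_var` and `covariates` are hypothetical, and this is not necessarily how synth_bmlogit parses its input internally.

```r
# Formula in the documented form X_K ~ X_1 + ... + X_{K-1}
f <- educ ~ race + age + female

# f[[2]] is the left-hand side, f[[3]] is the right-hand side
outcome_var <- all.vars(f[[2]])  # variable to impute; must be in microdata
covariates  <- all.vars(f[[3]])  # must be in both microdata and poptable

outcome_var
covariates
```

A check like this can help confirm, before fitting, that microdata and poptable carry the columns the formula expects.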

microdata

The survey table on which the multinomial model will be fit. Must contain all variables in the LHS and RHS of formula.

poptable

The population table, collapsed in terms of counts. Must contain all variables in the RHS of formula, as well as the variables specified in area_var and count_var below.
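One way to build a table of this shape from individual-level records is sketched below with base R. The data values and the names `people`, `poptable_toy`, and `missing_cols` are made up for illustration; real population tables (such as acs_race_NY in the Examples) may be prepared differently.

```r
# Toy individual-level data (hypothetical values)
people <- data.frame(
  cd     = c("NY-01", "NY-01", "NY-01", "NY-02", "NY-02"),
  race   = c("White", "White", "Black", "White", "White"),
  female = c(0L, 0L, 1L, 1L, 1L)
)
people$count <- 1

# Collapse to counts: one row per combination of area and covariates
poptable_toy <- aggregate(count ~ cd + race + female, data = people, FUN = sum)

# Check that the RHS variables, the area variable, and the count variable exist
required <- c("race", "female", "cd", "count")
missing_cols <- setdiff(required, colnames(poptable_toy))

poptable_toy
missing_cols
```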

fix_to

A dataset with only marginal counts or proportions of the outcome in question, by each area. Proportions will be corrected so that the margins of the synthetic joint will match these, with a simple ratio.
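To make the "simple ratio" idea concrete, the following sketch rescales a toy joint distribution for a single area so that its outcome margin matches a target. All numbers and object names are hypothetical, and this is only one reading of the ratio correction described above, not the package's actual code.

```r
# Toy synthetic joint for one area:
# rows are covariate cells X, columns are outcome levels Z
joint <- matrix(
  c(0.20, 0.10,
    0.30, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(cell = c("x1", "x2"), educ = c("HS or Less", "4-Year"))
)

# Hypothetical target margin of the outcome in this area (as supplied in fix_to)
target <- c("HS or Less" = 0.60, "4-Year" = 0.40)

# Ratio of target margin to current margin, one value per outcome level
ratio <- target / colSums(joint)

# Scale each column by its ratio so the outcome margin matches the target
corrected <- sweep(joint, 2, ratio, `*`)

colSums(corrected)
```

In this toy version only the outcome margin is forced to match; the row (covariate) margin shifts slightly as a side effect.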

fix_by_area

logical, whether to fix to targets area by area. Defaults to TRUE if area_var is a variable in fix_to. If FALSE, collapses the input to a single target.
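The sketch below shows how the documented default evaluates, and what collapsing area-level targets to a single target could look like. The counts are taken from the first rows of educ_target in the Examples; the object names are hypothetical, and summing counts over areas is an assumption about what "collapses" means here.

```r
# A few rows of area-level targets (values from the educ_target output)
fix_to_toy <- data.frame(
  cd    = c("NY-01", "NY-01", "NY-02", "NY-02"),
  educ  = c("HS or Less", "Some College", "HS or Less", "Some College"),
  count = c(202298, 169556, 231614, 157090)
)
area_var <- "cd"

# Mirrors the documented default: TRUE when area_var is a column of fix_to
fix_by_area <- any(area_var %in% colnames(fix_to_toy))

# Assumed collapse when fix_by_area = FALSE: sum counts over areas
single_target <- aggregate(count ~ educ, data = fix_to_toy, FUN = sum)

fix_by_area
single_target
```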

area_var

A character vector naming the variable(s) that identify the area of interest, e.g. "cd" in the example below.

count_var

A character string naming the variable in poptable that holds the count. Defaults to "count".

tol

Tolerance for balance. Defaults to 0.05.

Source

Soichiro Yamauchi and Shiro Kuriwaki (2021). bmlogit: Multinomial logit with balancing constraints. R package version 0.0.3.

Examples

library(dplyr)

# can take a few minutes if fix_by_area = TRUE (the default)
educ_target <- count(acs_educ_NY, cd, educ, wt = count, name = "count")
educ_target
#> # A tibble: 108 × 3
#>    cd    educ          count
#>    <chr> <fct>         <dbl>
#>  1 NY-01 HS or Less   202298
#>  2 NY-01 Some College 169556
#>  3 NY-01 4-Year       111561
#>  4 NY-01 Post-Grad     83255
#>  5 NY-02 HS or Less   231614
#>  6 NY-02 Some College 157090
#>  7 NY-02 4-Year        99763
#>  8 NY-02 Post-Grad     71094
#>  9 NY-03 HS or Less   138929
#> 10 NY-03 Some College 127003
#> # … with 98 more rows

acs_race_NY
#> # A tibble: 4,320 × 6
#>     year cd    female race  age               count
#>    <dbl> <chr>  <int> <fct> <fct>             <dbl>
#>  1  2018 NY-01      0 Black 18 to 24 years      826
#>  2  2018 NY-01      0 Black 18 to 24 years     1828
#>  3  2018 NY-01      0 Black 25 to 34 years     1127
#>  4  2018 NY-01      0 Black 25 to 34 years     1298
#>  5  2018 NY-01      0 Black 35 to 44 years     2696
#>  6  2018 NY-01      0 Black 45 to 64 years     3779
#>  7  2018 NY-01      0 Black 45 to 64 years     2153
#>  8  2018 NY-01      0 Black 65 years and over   781
#>  9  2018 NY-01      0 Black 65 years and over   624
#> 10  2018 NY-01      0 Black 65 years and over   133
#> # … with 4,310 more rows

pop_syn <- synth_bmlogit(educ ~ race + age + female,
                         microdata = cc18_NY,
                         fix_to = educ_target,
                         poptable = acs_race_NY,
                         area_var = "cd")
pop_syn
#> # A tibble: 6,480 × 9
#>    cd    race  age            female    prX educ       prZ_givenX    prXZ  count
#>    <chr> <fct> <fct>           <int>  <dbl> <fct>           <dbl>   <dbl>  <dbl>
#>  1 NY-01 White 18 to 24 years      0 0.0358 HS or Less     0.294  1.05e-2  6060.
#>  2 NY-01 White 18 to 24 years      0 0.0358 Some Coll…     0.504  1.80e-2 10378.
#>  3 NY-01 White 18 to 24 years      0 0.0358 4-Year         0.173  6.18e-3  3559.
#>  4 NY-01 White 18 to 24 years      0 0.0358 Post-Grad      0.0295 1.05e-3   608.
#>  5 NY-01 White 18 to 24 years      1 0.0355 HS or Less     0.373  1.33e-2  7646.
#>  6 NY-01 White 18 to 24 years      1 0.0355 Some Coll…     0.459  1.63e-2  9397.
#>  7 NY-01 White 18 to 24 years      1 0.0355 4-Year         0.147  5.22e-3  3011.
#>  8 NY-01 White 18 to 24 years      1 0.0355 Post-Grad      0.0209 7.42e-4   428.
#>  9 NY-01 White 25 to 34 years      0 0.0476 HS or Less     0.180  8.58e-3  4946.
#> 10 NY-01 White 25 to 34 years      0 0.0476 Some Coll…     0.195  9.30e-3  5360.
#> # … with 6,470 more rows
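
The column names in the output suggest that prXZ is the joint probability obtained as prX times prZ_givenX. The sketch below checks that reading against the first four printed rows; the values are copied from the rounded output above, so agreement is only approximate, and the interpretation is an inference from the printout rather than a documented guarantee.

```r
# Values copied from rows 1-4 of the printed pop_syn output (rounded)
prX        <- 0.0358
prZ_givenX <- c(0.294, 0.504, 0.173, 0.0295)
prXZ_shown <- c(1.05e-2, 1.80e-2, 6.18e-3, 1.05e-3)

# Assumed relationship: joint = marginal * conditional
prXZ_implied <- prX * prZ_givenX

# Relative difference between implied and printed values
rel_diff <- abs(prXZ_implied - prXZ_shown) / prXZ_shown
rel_diff
```

The relative differences are small enough to be explained by rounding in the printed values.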