Imputes the cells of a joint distribution under a balancing constraint, using the algorithm implemented in Yamauchi's bmlogit package.

synth_bmlogit(
  formula,
  microdata,
  poptable,
  fix_to,
  fix_by_area = any(area_var %in% colnames(fix_to)),
  area_var,
  count_var = "count",
  tol = 0.05
)

Source

Soichiro Yamauchi and Shiro Kuriwaki (2021). bmlogit: Multinomial logit with balancing constraints. R package version 0.0.3.

Arguments

formula

A representation of the aggregate imputation or "outcome" model, of the form X_{K} ~ X_1 + ... + X_{K - 1}.

microdata

The survey table that the multinomial model will be fit to. Must contain all variables in the LHS and RHS of formula.

poptable

The population table, collapsed in terms of counts. Must contain all variables in the RHS of formula, as well as the variables specified in area_var and count_var below.

fix_to

A dataset containing only the marginal counts or proportions of the outcome in question, for each area. The estimated proportions are corrected with a simple ratio so that the margins of the synthetic joint distribution match these targets.

fix_by_area

logical, whether to fix to targets area by area. Defaults to TRUE if area_var is a variable in fix_to. If FALSE, collapses the input to a single target.

area_var

A character vector naming the variable(s) that identify the area of interest. These must be present in poptable.

count_var

A character string naming the variable in poptable that holds the count. Defaults to "count".

tol

Tolerance for balance. Defaults to 0.05.
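
As a sketch of how the fix_by_area default resolves, and of what collapsing to a single target can mean, the snippet below uses a small made-up target table (toy_target and its values are hypothetical, not package data). It evaluates the default expression from the usage above, then sums the counts over areas with dplyr. This illustrates the idea only; it is not necessarily how synth_bmlogit collapses its input internally.

```r
library(dplyr)

# Hypothetical target table: outcome counts by area (values are made up)
toy_target <- tibble(
  cd    = c("A", "A", "B", "B"),
  educ  = c("HS or Less", "4-Year", "HS or Less", "4-Year"),
  count = c(60, 40, 30, 70)
)

area_var <- "cd"

# Default from the usage: TRUE when the area variable is a column of fix_to
fix_by_area_default <- any(area_var %in% colnames(toy_target))

# One way to collapse area-level targets into a single target:
# sum the counts over areas, keeping only the outcome variable
single_target <- count(toy_target, educ, wt = count, name = "count")
```

Here fix_by_area_default is TRUE because "cd" is a column of toy_target, and single_target has one row per level of educ.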

Examples


library(dplyr)
library(ccesMRPprep)

# The synth_bmlogit() call further down can take a few minutes when
# fix_by_area = TRUE, which is the default here because "cd" is a column of educ_target
educ_target <- count(acs_educ_NY, cd, educ, wt = count, name = "count")

educ_target
#> # A tibble: 108 × 3
#>    cd    educ          count
#>    <chr> <fct>         <dbl>
#>  1 NY-01 HS or Less   202298
#>  2 NY-01 Some College 169556
#>  3 NY-01 4-Year       111561
#>  4 NY-01 Post-Grad     83255
#>  5 NY-02 HS or Less   231614
#>  6 NY-02 Some College 157090
#>  7 NY-02 4-Year        99763
#>  8 NY-02 Post-Grad     71094
#>  9 NY-03 HS or Less   138929
#> 10 NY-03 Some College 127003
#> # … with 98 more rows
acs_race_NY
#> # A tibble: 4,320 × 6
#>     year cd    female race  age               count
#>    <dbl> <chr>  <int> <fct> <fct>             <dbl>
#>  1  2018 NY-01      0 Black 18 to 24 years      826
#>  2  2018 NY-01      0 Black 18 to 24 years     1828
#>  3  2018 NY-01      0 Black 25 to 34 years     1127
#>  4  2018 NY-01      0 Black 25 to 34 years     1298
#>  5  2018 NY-01      0 Black 35 to 44 years     2696
#>  6  2018 NY-01      0 Black 45 to 64 years     3779
#>  7  2018 NY-01      0 Black 45 to 64 years     2153
#>  8  2018 NY-01      0 Black 65 years and over   781
#>  9  2018 NY-01      0 Black 65 years and over   624
#> 10  2018 NY-01      0 Black 65 years and over   133
#> # … with 4,310 more rows

pop_syn <- synth_bmlogit(educ ~ race + age + female,
                         microdata = cc18_NY,
                         fix_to = educ_target,
                         poptable = acs_race_NY,
                         area_var = "cd")
pop_syn
#> # A tibble: 6,480 × 9
#>    cd    race  age            female    prX educ       prZ_givenX    prXZ  count
#>    <chr> <fct> <fct>           <int>  <dbl> <fct>           <dbl>   <dbl>  <dbl>
#>  1 NY-01 White 18 to 24 years      0 0.0358 HS or Less     0.293  1.05e-2  6047.
#>  2 NY-01 White 18 to 24 years      0 0.0358 Some Coll…     0.490  1.75e-2 10100.
#>  3 NY-01 White 18 to 24 years      0 0.0358 4-Year         0.188  6.72e-3  3871.
#>  4 NY-01 White 18 to 24 years      0 0.0358 Post-Grad      0.0284 1.02e-3   586.
#>  5 NY-01 White 18 to 24 years      1 0.0355 HS or Less     0.385  1.37e-2  7877.
#>  6 NY-01 White 18 to 24 years      1 0.0355 Some Coll…     0.449  1.60e-2  9197.
#>  7 NY-01 White 18 to 24 years      1 0.0355 4-Year         0.146  5.18e-3  2987.
#>  8 NY-01 White 18 to 24 years      1 0.0355 Post-Grad      0.0205 7.28e-4   420.
#>  9 NY-01 White 25 to 34 years      0 0.0476 HS or Less     0.184  8.77e-3  5052.
#> 10 NY-01 White 25 to 34 years      0 0.0476 Some Coll…     0.191  9.07e-3  5228.
#> # … with 6,470 more rows
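
One way to check a result like the one above is to compare the outcome margins of the synthetic table with the target, area by area. The sketch below does this on small made-up tables. The helper area_shares() and the toy_syn and toy_target tables are hypothetical and not part of any package. It compares shares rather than raw counts, in case the population table and the target table do not sum to the same area totals.

```r
library(dplyr)

# Hypothetical helper: share of each outcome level within each area
area_shares <- function(tbl, area, outcome, count = "count") {
  tbl %>%
    group_by(across(all_of(c(area, outcome)))) %>%
    summarise(n = sum(.data[[count]]), .groups = "drop") %>%
    group_by(across(all_of(area))) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}

# Made-up synthetic joint table: one area, two covariate cells, two outcomes
toy_syn <- tibble(
  cd     = c("A", "A", "A", "A"),
  female = c(0, 0, 1, 1),
  educ   = c("HS or Less", "4-Year", "HS or Less", "4-Year"),
  count  = c(25, 25, 35, 15)
)

# Made-up target margins on a different total (1000 rather than 100)
toy_target <- tibble(
  cd    = c("A", "A"),
  educ  = c("HS or Less", "4-Year"),
  count = c(600, 400)
)

# Join synthetic and target shares and take the difference
check <- area_shares(toy_syn, "cd", "educ") %>%
  inner_join(area_shares(toy_target, "cd", "educ"),
             by = c("cd", "educ"), suffix = c("_syn", "_target")) %>%
  mutate(diff = share_syn - share_target)
```

With the objects from the example, the same comparison would join area_shares(pop_syn, "cd", "educ") with area_shares(educ_target, "cd", "educ"). The differences need not be exactly zero there; how close they come presumably depends on tol.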