Assistant Professor of Political Science, Yale University
Resident Fellow at the Institution for Social and Policy Studies
I received the 2020 Dean’s Excellence in Teaching Award at the Harvard Kennedy School of Public Policy for my teaching in econometrics and for shepherding the use of the R statistical language in its core statistics sequence. This work included creating portable screencasts of R workflows covering common topics in econometrics, causal inference, data science, and quantitative social science.
I am an RStudio certified trainer and have created several resources on statistics and data science for the social sciences that I hope are useful for other students and instructors. These include a workshop I co-designed that trains social science instructors to teach statistics and programming, my presentations on project-oriented workflow, an introduction to version control with GitHub, an introduction to Stata, and statistics notes covering Probability, Inference, and Regression written for a Masters-level statistics course (links).
Any use of my teaching material available online is welcome with attribution.
The following screencasts were designed as short guided introductions to particular statistical concepts. They are probably best used as links in problem sets that students can refer to at their own pace before they tackle harder, open-ended questions.
All code is written in R, conforms to the tidyverse style, and often uses tidyverse syntax. It uses real datasets that can be loaded quickly in any R environment (e.g. a dataset built into a package). The material is geared towards advanced undergraduates or a masters class where students already have some familiarity with probability and inference.
Contents: Package Setup, lm, LASSO, Fixed Effects, Instrumental Variables, Regression Discontinuity, Diff-in-Diff, Creating Functions, Maps in ggplot
Installing vs. loading packages, basic structure and sections of a script, function masking.
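As a minimal sketch of the install/load distinction and of function masking, the snippet below uses only packages that ship with R. The user-defined filter function and the vector x are hypothetical and exist only to trigger masking; they are not taken from the screencast.

```r
# Installing happens once per machine. It is left as a comment here so the
# sketch has no side effects; "dplyr" is only an example package name.
# install.packages("dplyr")

# Loading happens once per R session. The stats package ships with R.
library(stats)

# Hypothetical user-defined function that shares its name with stats::filter().
filter <- function(x, keep) x[keep]

x <- c(10, 20, 30, 40)

# The global environment is searched first, so the unqualified call
# uses the user-defined version.
masked_result <- filter(x, x > 15)

# The pkg::fun form bypasses masking and calls the stats version,
# which here computes a two-period trailing moving average.
explicit_result <- stats::filter(x, rep(1 / 2, 2), sides = 1)
```

Calling conflicts() in an interactive session lists the names that are currently masked.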
Running linear regression, formulas, options to the lm function.
lm, ggplot2 (putting 1-3 together).
Using cv.glmnet, ".", model matrix creation. predict with the testing dataset, generic functions.
Fixed effects syntax with lfe::felm, adjusting for clustered errors.
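The regression and fixed-effects topics above can be sketched with base R alone. This is an illustration under assumptions, not the screencast code: it uses the built-in mtcars dataset, and it swaps in lm with factor() dummies as a stand-in for lfe::felm, so it reproduces fixed-effects point estimates but does not adjust standard errors for clustering.

```r
# mtcars ships with R (datasets package), so no download is needed.
data(mtcars)

# Formula syntax: outcome ~ predictor_1 + predictor_2
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Fixed effects via group dummies: factor(cyl) adds one indicator per
# cylinder count, omitting the reference level.
# Assumption: this is a base-R stand-in for lfe::felm(), not the
# syntax used in the screencast, and its standard errors are not clustered.
fit_fe <- lm(mpg ~ wt + factor(cyl), data = mtcars)

coef(fit_lm)
coef(fit_fe)
```

summary(fit_lm) prints the usual coefficient table; with many groups, a dedicated fixed-effects routine avoids estimating every dummy explicitly.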
Instrumental variables as an omitted variable problem, using both AER::ivreg and lfe::felm. Uses the proximity to college dataset by Card (1994).
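To make the omitted-variable framing concrete, here is a rough sketch on simulated data (not the Card dataset). All variable names and coefficients are hypothetical. It runs two-stage least squares by hand with lm instead of AER::ivreg or lfe::felm, so the point estimate is valid but the second-stage standard errors are not.

```r
set.seed(1)
n <- 5000

# Simulated data; every name and coefficient here is an assumption.
confounder <- rnorm(n)  # unobserved in practice: the omitted variable
instrument <- rnorm(n)  # shifts the treatment, unrelated to the confounder
treatment <- instrument + confounder + rnorm(n)
outcome <- 2 * treatment + 3 * confounder + rnorm(n)  # true effect is 2

# OLS omits the confounder, so the treatment coefficient is biased upward.
ols_est <- coef(lm(outcome ~ treatment))[["treatment"]]

# Stage 1: predict the endogenous treatment from the instrument.
stage_1 <- lm(treatment ~ instrument)
treatment_hat <- fitted(stage_1)

# Stage 2: regress the outcome on the predicted treatment.
# The standard errors from this manual second stage are not valid.
iv_est <- coef(lm(outcome ~ treatment_hat))[["treatment_hat"]]

c(ols = ols_est, iv = iv_est)
```

In applied work a dedicated IV routine is preferable because it reports correct standard errors and first-stage diagnostics.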
Visualizing regression discontinuity, estimating coefficients with interactions, polynomials, and local linear regression.
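A small sketch of the interaction and local linear approaches, again on simulated data with hypothetical names: the cutoff at zero, the true jump of 2, and the bandwidth of 0.5 are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 2000

# Simulated data; the cutoff at 0 and the true jump of 2 are assumptions.
running <- runif(n, -1, 1)
treated <- as.numeric(running >= 0)
outcome <- 1 + 0.5 * running + 2 * treated +
  0.3 * running * treated + rnorm(n, sd = 0.5)
rd_data <- data.frame(running, treated, outcome)

# The interaction lets the slope differ on each side of the cutoff.
# The coefficient on treated is the estimated jump at running = 0.
fit_rd <- lm(outcome ~ treated * running, data = rd_data)
jump_est <- coef(fit_rd)[["treated"]]

# Local linear regression with a uniform kernel: the same model,
# restricted to observations within a (hypothetical) bandwidth.
bandwidth <- 0.5
fit_local <- lm(
  outcome ~ treated * running,
  data = rd_data,
  subset = abs(running) <= bandwidth
)
jump_local <- coef(fit_local)[["treated"]]
```

A polynomial specification follows the same pattern, for example by replacing running with poly(running, 2, raw = TRUE) in the formula.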
Time series data, long form, plotting time trends, interactions, 2 by 2 difference-in-differences, DID with fixed effects.
Thanks to Oscar Torres-Reyna for the data (http://princeton.edu/~otorres/DID101R.pdf).
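The 2 by 2 case can be sketched as follows. The data are simulated rather than taken from the dataset credited above, and the names and effect sizes are hypothetical. The point is that the interaction coefficient in a saturated regression equals the difference of the four cell-mean differences.

```r
set.seed(1)

# Simulated long-form data: 50 observations in each of the four cells.
did_data <- expand.grid(unit = 1:50, treated = c(0, 1), post = c(0, 1))
did_data$outcome <- with(
  did_data,
  1 + 0.5 * treated + 0.3 * post + 1.5 * treated * post
) + rnorm(nrow(did_data))

# Difference-in-differences by hand from the four cell means.
cell_means <- tapply(
  did_data$outcome,
  list(did_data$treated, did_data$post),
  mean
)
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# The same quantity as the interaction coefficient in a regression.
fit_did <- lm(outcome ~ treated * post, data = did_data)
did_lm <- coef(fit_did)[["treated:post"]]
```

With more than two periods or groups, the two indicators are typically replaced by unit and time fixed effects.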
Arguments, body, and return statement. Also see the function basics tutorial for background.
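As a brief illustration of those three parts, the hypothetical function below computes the standard error of a mean. The function name and its na_rm argument are made up for this sketch.

```r
# Arguments: x is a numeric vector; na_rm has a default value.
standard_error <- function(x, na_rm = FALSE) {
  # Body: optionally drop missing values, then compute the statistic.
  if (na_rm) {
    x <- x[!is.na(x)]
  }
  se <- sd(x) / sqrt(length(x))

  # Return statement: hand the result back to the caller.
  return(se)
}

standard_error(c(2, 4, 4, 6))
standard_error(c(2, 4, 4, 6, NA), na_rm = TRUE)
```

R also returns the last evaluated expression implicitly, so return() is written out here only to make the three parts visible.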
Choropleth maps using sf objects in ggplot2, merging other variables into sf data frames.