Dear Applied Statistics Workshop Community,
Our next meeting will be on Wednesday, February 21 (12:00 EST). Ross
Mattheis presents "Spurious Mobility in Imperfectly Linked Data Trials"
(joint with Jiafeng Chen).
<When>
February 21, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Estimating intergenerational mobility often requires linking data across
multiple sources. However, mistakes in record linkage can introduce biases
in subsequent estimates. This paper re-examines the history of
intergenerational mobility in the United States with emphasis on bias from
imperfectly linked data. In particular, data corrupted by incorrect links
will typically attenuate estimates of linear estimands towards zero. When
the estimand is the intergenerational elasticity of status, this bias will
tend to exaggerate levels of mobility. We propose two complementary methods
to address bias from imperfectly linked data. Building on a large
literature on Bayesian entity resolution, our first approach samples from a
convenience prior and reports the ratio of the posterior and implicit prior
distributions for the target parameter. Our second approach takes advantage
of the availability of repeated measurements and identification results in
settings with misclassified data, due to Hu (2008). Consistent with bias
from data corruption, our estimates suggest that levels of mobility in the
U.S. were lower than previously believed, with conventional estimates of
the father-son elasticity of occupation status 10% to 40% lower than our
estimates. The gap between ours and conventional estimates is largest in
the mid-nineteenth century and declines in more recent years, resulting in
relatively stable levels of mobility over the period.
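To see why random false links pull a linear estimand toward zero, here is a
minimal simulation, not the paper's estimator: the elasticity of 0.6, the
noise level, and the 30% mismatch rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
father = rng.normal(size=n)                          # father's status score
son = 0.6 * father + rng.normal(scale=0.8, size=n)   # true elasticity = 0.6

def ols_slope(x, y):
    """Simple bivariate OLS slope."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Corrupt 30% of the links: those sons get a random, unrelated "father".
m = int(0.3 * n)
bad = rng.choice(n, size=m, replace=False)
father_linked = father.copy()
father_linked[bad] = rng.normal(size=m)   # a false match carries no signal

beta_true = ols_slope(father, son)           # close to 0.6
beta_linked = ols_slope(father_linked, son)  # attenuated toward ~0.7 * 0.6
```

With a 30% mismatch rate the linked-data slope shrinks by roughly that
fraction, which is exactly the direction of bias the abstract describes:
attenuated elasticities exaggerate mobility.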
<2023-2024 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on Wednesday, February 14 (12:00 EST). Teppei
Yamamoto presents "Using Covariates to Improve Inference in the
Preference-Incorporating Choice and Assignment (PICA) Design for Randomized
Controlled Trials" (joint with Adam Kaplan).
<When>
February 14, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
A key challenge in randomized controlled trials (RCTs) is to ensure
external validity so that findings from a study can inform real-world
policy decisions, where individual decision-makers may self-select into
different treatments based on their own preferences about the treatment
options. If the effects of treatments depend on subjects' treatment
preferences, the average treatment effects (ATEs) estimated in a standard
RCT will be biased for the conditional ATEs among those who actually prefer
to take the treatment. Knox et al. (2019) proposed a new experimental
design, later named the preference-incorporating choice and assignment
(PICA) design (de Benedictis-Kessner et al., 2019), which employs double
randomization to estimate the ATE conditional on treatment choice. In this
paper, we extend the PICA design to incorporate subjects' pre-treatment
characteristics which might confound effect heterogeneity even after
conditioning on their stated preferences. This extension not only relaxes
the key identification assumption in the original design to address
possible bias but also potentially improves precision in the estimates.
After establishing nonparametric identification results, we propose both
frequentist and Bayesian approaches for inference and study their
finite-sample performance via Monte Carlo simulations. We illustrate the
proposed method with an empirical application to media exposure experiments.
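A toy simulation of why preference-conditional effects matter, with all
numbers invented and only the forced-randomization arm sketched (this is
not the PICA estimator, just the effect heterogeneity it targets):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
pref = rng.binomial(1, 0.4, size=n)    # stated preference for the treatment
treat = rng.binomial(1, 0.5, size=n)   # randomized assignment (forced arm)
# Hypothetical outcome: the effect is 2.0 for preferers, 0.5 for everyone else
y = 1.0 + treat * (0.5 + 1.5 * pref) + rng.normal(size=n)

# The overall ATE blends the two groups (~1.1), while the ATE among those
# who actually prefer the treatment is much larger (~2.0).
ate_overall = y[treat == 1].mean() - y[treat == 0].mean()
ate_preferers = (y[(treat == 1) & (pref == 1)].mean()
                 - y[(treat == 0) & (pref == 1)].mean())
```

When real-world takers self-select like the preferers here, the standard
RCT estimate is biased for the policy-relevant conditional effect.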
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our next meeting will be on Wednesday, February 7 (12:00 EST). Elisabeth
Paulson presents "Improving Refugee Resettlement Outcomes with Optimization."
<When>
February 7, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Every year, tens of thousands of refugees and asylum seekers are resettled
in host countries across the world. In many host countries, newcomers are
assigned to a specific locality (e.g., city) upon arrival by a resettlement
agency. This assignment decision has a profound long-term impact on
integration outcomes. The high-level goal of this line of work is to
improve these outcomes through prediction and optimization algorithms.
We will describe two new assignment algorithms that dynamically match
refugees and asylum seekers to geographic localities within a host country.
The first---currently implemented in a multi-year pilot in
Switzerland---achieves near-optimal expected employment (and improves upon
the status quo procedure by about 40%). However, it can result in an
imbalanced allocation to the localities over time, which creates
undesirable workload inefficiencies for resettlement agencies. To address
this problem, the second algorithm---currently being deployed in the
US---balances the goal of improving outcomes with the desire for a balanced
allocation over time. We will also discuss extensions of these methods that
improve predictive performance in the face of non-stationarity, and enhance
robustness and fairness across demographic groups.
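The matching step can be caricatured as a static assignment problem, a
simplification of the talk's dynamic algorithms: the 6x6 matrix of
predicted employment probabilities and the arrival-order status quo below
are invented for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n = 6                                  # hypothetical: 6 cases, 6 one-slot localities
p = rng.uniform(0.1, 0.9, size=(n, n)) # predicted employment probability p[i, j]

# Maximize total predicted employment (negate because the solver minimizes)
rows, cols = linear_sum_assignment(-p)
optimized = p[rows, cols].sum()

# A naive status quo: assign case i to locality i (e.g., arrival order)
status_quo = np.trace(p)
```

Real deployments must assign arrivals online and balance locality
workloads over time, which is precisely what the two algorithms in the
talk add on top of this static picture.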
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our next meeting of the spring semester will be on January 31 (12:00 EST).
Sooahn Shin presents "Measuring Issue Specific Ideal Points from Roll Call
Votes."
<When>
January 31, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Ideal points are widely used to measure the ideology and policy preferences
of political actors, from voters and politicians to sovereign states. Yet,
the lingering challenge is to measure ideal points specific to a single
issue area. Scholars who wish to measure preferences in a specific area of
interest often resort to subsetting the voting data, resulting in the loss
of valuable information and making comparisons across different issue
areas ambiguous. To address this, I introduce IssueIRT, a hierarchical
Item Response Theory (IRT) model that uses roll-call votes and their
issue labels to estimate an issue-specific axis: a left-to-right
continuum of positions on that issue. This
approach first estimates multidimensional ideal points using all available
voting data, which are then projected onto issue-specific axes to generate
single-dimensional, issue-specific ideal points. Furthermore, I develop a
measure of issue similarity to compare the alignment of different issue
areas on a unified left-to-right spectrum. I demonstrate that IssueIRT
effectively captures issue-specific voting behaviors through simulations
and a validation study that measures sectionalism in the US House of
Representatives during the 1890s gold standard era. Finally, I show that
polarization in Congress has markedly increased across 32 separate issues
from 1979 to 2023. IssueIRT is implemented in issueirt, an open-source R
package.
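The projection step can be sketched in a few lines. This is only the
geometric idea, not the IssueIRT model itself; the 2-D ideal points and
the two named axes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.normal(size=(100, 2))   # 2-D ideal points for 100 legislators

def unit(v):
    """Normalize an axis to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

econ_axis = unit([1.0, 0.4])        # hypothetical issue-specific axes
trade_axis = unit([0.8, 0.6])

# Project multidimensional ideal points onto each axis to obtain
# single-dimensional, issue-specific ideal points
econ_points = theta @ econ_axis
trade_points = theta @ trade_axis

# "Issue similarity" as the cosine of the angle between the two axes
similarity = float(econ_axis @ trade_axis)
```

Nearby axes (cosine near 1) indicate issues that divide legislators along
nearly the same left-right line.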
Best,
Jialu
Dear Applied Statistics Workshop Community,
Welcome back! Our first meeting of the spring semester will be on January
24 (12:00 EST). Hans Demetrio Gaebler presents "Overcoming Statistical
Challenges in Detecting Discrimination."
<When>
January 24, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Outcome tests are a long-standing and widely used approach to detecting
discrimination in lending, hiring, policing, and beyond. For example, if
White loan recipients are found to default more often than racial minority
recipients, the outcome test would suggest that lenders impose a double
standard, preferentially lending to riskier White loan applicants. Despite
their popularity, outcome tests have long been known to be statistically
flawed, sometimes even suggesting discrimination against the group that in
reality received preferential treatment. We propose two methods for
remedying these statistical shortcomings. First, we show that a twist on
standard outcome tests leads to surprisingly strong statistical guarantees.
Our test is provably correct under a simple non-parametric assumption that
we show — both empirically and theoretically — likely holds in many common
scenarios. One limitation of this test is that it is, in some cases,
inconclusive. In light of this, we introduce an alternative test of
discrimination — which we call risk-adjusted regression — that can handle a
broader range of cases, but which requires a richer set of covariates. This
latter approach sheds light on the connection between statistical and legal
understandings of discrimination.
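A toy simulation of the classic failure mode (infra-marginality) that
makes naive outcome tests misleading; the risk distributions and the 0.5
threshold are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
# Two groups face the SAME lending bar (no discrimination), but their
# underlying risk distributions differ.
risk_a = rng.beta(2, 5, size=n)          # group A: concentrated at low risk
risk_b = rng.beta(2, 2, size=n)          # group B: more dispersed risk
threshold = 0.5

approved_a = risk_a < threshold
approved_b = risk_b < threshold
default_a = rng.random(n) < risk_a
default_b = rng.random(n) < risk_b

rate_a = default_a[approved_a].mean()    # default rate among approved, A
rate_b = default_b[approved_b].mean()    # higher, despite the identical bar
```

Group B's approved borrowers default more often even though both groups
face an identical threshold, so a naive outcome test would wrongly flag a
double standard.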
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our last meeting of this semester will be on November 29 (12:00 EST). Yi
Zhang presents "Individualized Policy Evaluation and Learning under
Clustered Network Interference."
<When>
November 29, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
While there now exists a large literature on policy evaluation and
learning, much of the prior work assumes that the treatment assignment of one
unit does not affect the outcome of another unit. Unfortunately, ignoring
interference may lead to biased policy evaluation and yield ineffective
learned policies. For example, treating influential individuals who have
many friends can generate positive spillover effects, thereby improving the
overall performance of an individualized treatment rule (ITR). We consider
the problem of evaluating and learning an optimal ITR under clustered
network (or partial) interference where clusters of units are sampled from
a population and units may influence one another within each cluster. Under
this model, we propose an estimator that can be used to evaluate the
empirical performance of an ITR. We show that this estimator is
substantially more efficient than the standard inverse probability
weighting estimator, which does not impose any assumption about spillover
effects. We derive a finite-sample regret bound for a learned ITR,
showing that the use of our efficient evaluation estimator leads to
improved performance of learned policies. Finally, we conduct simulation
and empirical studies to illustrate the advantages of the proposed
methodology.
The most recent draft can be found here <https://arxiv.org/abs/2311.02467>.
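For intuition, here is the standard inverse probability weighting baseline
that the paper's efficient estimator improves upon, on an entirely invented
data-generating process with a within-cluster spillover:

```python
import numpy as np

rng = np.random.default_rng(4)
G, k = 4000, 3                         # 4000 clusters of 3 units each
X = rng.normal(size=(G, k))
A = rng.binomial(1, 0.5, size=(G, k))  # independent Bernoulli(0.5) design
# Hypothetical outcome with a within-cluster spillover term
Y = X + A + 0.5 * A.mean(axis=1, keepdims=True) + rng.normal(size=(G, k))

pi = (X > 0).astype(int)               # candidate ITR: treat units with X > 0

# Without assumptions on spillovers, a cluster contributes only when its
# entire realized assignment vector matches the policy's -- hence the
# inefficiency the paper addresses.
match = (A == pi).all(axis=1)
prop = 0.5 ** k                        # P(A = pi(X) | X) under this design
V_hat = (match * Y.mean(axis=1)).sum() / (G * prop)  # estimated policy value
```

Only about one cluster in eight ever contributes here, which is why
imposing structure on spillovers can buy so much efficiency.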
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
Dear Applied Statistics Workshop Community,
We will not be meeting this Wednesday due to the holiday. Hope you have a
restful break and see you on Nov 29th for our last session this semester!
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our next meeting will be on November 15 (12:00 EST). Ashesh Rambachan
presents "From Predictive Algorithms to Automatic Generation of Anomalies"
(joint with Sendhil Mullainathan).
<When>
November 15, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Economic theories often progress through the discovery of "anomalies."
Canonical examples of anomalies include the Allais Paradox and the
Kahneman-Tversky choice experiments, which are constructed menus of
lotteries that highlighted particular flaws in expected utility theory and
spurred the development of new theories for decision-making under risk. In
this paper, we develop algorithmic procedures to automatically generate
such anomalies. Our algorithmic procedures take as inputs an existing
theory and data it seeks to explain, and then generate examples on which we
would likely observe violations of our existing theory if we were to
collect data. As an illustration, we produce anomalies for expected utility
theory using simulated lottery choice data from individuals who behave
according to cumulative prospect theory. Our procedures recover known
anomalies for expected utility theory in behavioral economics and discover
novel anomalies based on the probability weighting function. We conduct
incentivized experiments to collect choice data on our algorithmically
generated anomalies, finding that participants violate expected utility
theory at similar rates to the Allais Paradox and Common Ratio Effect.
While this illustration is specific, our anomaly generation procedures are
general and can be applied in any domain where there exists a formal theory
and rich data that the theory seeks to explain.
The most recent draft can be found here
<https://economics.mit.edu/sites/default/files/inline-files/mr_anomalies.pdf>.
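A brute-force caricature of the idea, not the paper's adaptive procedure:
score random menus of binary lotteries under expected utility and under a
cumulative-prospect-theory weighting, and keep the menus where the two
theories disagree. The square-root utility is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(5)
gamma = 0.61                      # Tversky-Kahneman (1992) weighting parameter

def w(p):
    """Inverse-S probability weighting from cumulative prospect theory."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

u = np.sqrt                       # a common concave utility; an assumption

# Random menus of two binary lotteries: win X with probability P, else 0
P = rng.uniform(0.01, 0.99, size=(5000, 2))
X = rng.uniform(1.0, 100.0, size=(5000, 2))

eu_pick = P[:, 0] * u(X[:, 0]) > P[:, 1] * u(X[:, 1])         # EU's choice
cpt_pick = w(P[:, 0]) * u(X[:, 0]) > w(P[:, 1]) * u(X[:, 1])  # CPT's choice

# Menus where the two theories disagree are candidate anomalies for EU
anomalies = np.flatnonzero(eu_pick != cpt_pick)
```

Because w overweights small probabilities, disagreements concentrate on
menus with extreme probabilities, echoing the Allais-style examples in the
abstract.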
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our next meeting will be on November 8 (12:00 EST). Zeyang Jia presents
"Bayesian Safe Policy Learning with Chance Constrained Optimization:
Application to Military Security Assessment during the Vietnam War."
<When>
November 8, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Algorithmic and data-driven decisions and recommendations are commonly used
in high-stakes decision-making settings such as criminal justice, medicine,
and public policy. We investigate whether it would have been possible to
improve a security assessment algorithm employed during the Vietnam War,
using outcomes measured immediately after its introduction in late 1969.
This empirical application raises several methodological challenges that
frequently arise in high-stakes algorithmic decision-making. First, before
implementing a new algorithm, it is essential to characterize and control
the risk of yielding worse outcomes than the existing algorithm. Second,
the existing algorithm is deterministic, and learning a new algorithm
requires transparent extrapolation. Third, the existing algorithm involves
discrete decision tables that are common but difficult to optimize over. To
address these challenges, we introduce the Average Conditional Risk
(ACRisk), which first quantifies the risk that a new algorithmic policy
leads to worse outcomes for subgroups of individual units and then averages
this over the distribution of subgroups. We also propose a Bayesian policy
learning framework that maximizes the posterior expected value while
controlling the posterior expected ACRisk. This framework separates the
estimation of heterogeneous treatment effects from policy optimization,
enabling flexible estimation of effects and optimization over complex
policy classes. We characterize the resulting chance-constrained
optimization problem as a constrained linear programming problem. Our
analysis shows that compared to the actual algorithm used during the
Vietnam War, the learned algorithm assesses most regions as more secure and
emphasizes economic and political factors over military factors.
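To make the ACRisk concrete, here is how it could be computed from
posterior draws of subgroup effects; this is only the risk measure, not
the paper's chance-constrained linear program, and every number (effects,
shares, the 0.30 threshold) is invented.

```python
import numpy as np

rng = np.random.default_rng(6)
D, K = 2000, 5                    # 2000 posterior draws, 5 subgroups
# Hypothetical posterior draws of each subgroup's effect of switching
# from the status quo policy to the new one
mu = np.array([0.4, 0.3, 0.1, -0.05, 0.2])
tau = rng.normal(loc=mu, scale=0.2, size=(D, K))
share = np.full(K, 1 / K)         # population share of each subgroup

value = tau.mean(axis=0) @ share          # posterior expected value
# ACRisk: probability of a worse outcome per subgroup, averaged over the
# subgroup distribution
acrisk = (tau < 0).mean(axis=0) @ share

adopt = acrisk <= 0.30            # chance constraint on the ACRisk
```

A policy can look attractive on expected value while still carrying a
non-trivial chance of harming some subgroups; the constraint caps that
averaged risk.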
Best,
Jialu
Dear Applied Statistics Workshop Community,
Our next meeting will be on November 1 (12:00 EST). Naijia Liu presents
"Synthetic Control Method with Pre-treatment Outcomes Missing" (joint
with Sooahn Shin and Soichiro Yamauchi).
<When>
November 1, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
The synthetic control method (SCM) is commonly used in social science
research to estimate treatment effects. It involves constructing a synthetic
control unit for the treated unit in observational studies. The quality of
this synthetic control unit is influenced by factors such as the number of
pretreatment periods and missing values. Many empirical datasets,
particularly those with a panel structure, often encounter issues with
missing values. This project studies the impact of missing values on SCM
and provides theoretical guidance on the potential bias. We formulate SCM
with missing data from a vertical regression perspective. In this setting,
missing values can be viewed as omitted variables. We show that the bias of
the ATT decomposes into (1) the weights of the missing units in constructing
the synthetic control and (2) the imbalance between the missing units and
the weighted observed donor units. Building on these results, we propose a
sensitivity analysis for SCM with pretreatment outcomes missing not at
random. To illustrate the method in practice, we revisit a previous study
that examines the impact of Taiwan's expulsion from the International
Monetary Fund (IMF) in 1980 on its precautionary international reserves
using the SCM.
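The vertical regression view can be sketched as a constrained least
squares fit. This is the standard SCM weight problem with one simple,
invented missingness pattern (periods dropped from the fit), not the
paper's sensitivity analysis; the donor panel and true weights are made up.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
T0, J = 30, 8                       # 30 pre-treatment periods, 8 donor units
Y_donor = rng.normal(size=(T0, J))
w_true = np.array([0.5, 0.3, 0.2] + [0.0] * 5)
y_treated = Y_donor @ w_true + rng.normal(scale=0.05, size=T0)

obs = rng.random(T0) < 0.8          # ~20% of pre-treatment periods missing

def loss(w):
    """SSR of the vertical regression, using observed periods only."""
    resid = y_treated[obs] - Y_donor[obs] @ w
    return float(resid @ resid)

# SCM weights: nonnegative and summing to one
res = minimize(loss, np.full(J, 1 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
w_hat = res.x
```

Dropping periods is harmless when outcomes are missing at random, as here;
the paper's point is that when missingness is informative, the dropped
rows act like omitted variables and bias the fitted weights.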
Best,
Jialu