Dear Applied Statistics Workshop Community,
Our next meeting will be on November 1 (12:00 EST). Naijia Liu presents
"Synthetic Control Method with Pre-treatment Outcomes Missing" (Joint work
with Sooahn Shin and Soichiro Yamauchi).
<When>
November 1, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
The synthetic control method (SCM) is commonly used in social science
research to estimate treatment effects. It involves creating a synthesized
control unit for the treated unit in observational studies. The quality of
this synthesized control unit is influenced by factors such as the number
of pretreatment periods and missing values. Empirical datasets,
particularly those with a panel structure, often contain missing values.
This project studies the impact of missing values on SCM and provides
theoretical guidance on the potential bias. We formulate SCM with missing
data from a vertical regression perspective, under which missing values can
be viewed as omitted variables. We show that the bias of the ATT decomposes
into (1) the weights the missing units receive in constructing the
synthetic control and (2) the imbalance between the missing units and the
weighted observed donor units. Building on these results, we propose a
sensitivity analysis for SCM with pretreatment outcomes missing not at
random. To illustrate the method in practice, we revisit a previous study
that examines the impact of Taiwan's expulsion from the International
Monetary Fund (IMF) in 1980 on its precautionary international reserves
using the SCM.
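For readers who want a concrete picture, a minimal sketch of the standard
SCM weighting step appears below. It is illustrative only: the donor matrix
Y0 and treated vector y1 are simulated, and the sketch is not the authors'
missing-data estimator. The weights minimize the pre-treatment fit subject
to being non-negative and summing to one:

    # Minimal synthetic control weight estimation (illustrative sketch).
    # Y0: T0 x J matrix of pre-treatment outcomes for J donor units.
    # y1: length-T0 pre-treatment outcomes for the treated unit.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    T0, J = 20, 10                    # hypothetical panel dimensions
    Y0 = rng.normal(size=(T0, J))
    y1 = Y0 @ rng.dirichlet(np.ones(J)) + rng.normal(scale=0.1, size=T0)

    def loss(w):
        # Squared distance between the treated unit and weighted donors.
        return np.sum((y1 - Y0 @ w) ** 2)

    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J         # non-negative weights summing to one
    res = minimize(loss, np.full(J, 1.0 / J), bounds=bounds,
                   constraints=cons, method="SLSQP")
    w = res.x                         # synthetic control weights

    # A donor with missing pre-treatment outcomes must be dropped from the
    # columns of Y0; in the vertical regression the dropped unit acts as an
    # omitted variable, the source of the bias the paper decomposes.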
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 25 (12:00 EST). Melody Huang presents
"Towards Credible Causal Inference under Real-World Complications:
Sensitivity Analysis for Generalizability"
<When>
October 25, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized controlled trials (RCTs) allow researchers to estimate causal
effects in an experimental sample with minimal identifying assumptions.
However, to generalize or transport a causal effect from an RCT to a target
population, researchers must adjust for a set of treatment effect
moderators. In practice, it is impossible to know whether the set of
moderators has been properly accounted for. In this talk, I propose a
two-parameter sensitivity analysis for generalizing or transporting
experimental results using weighted estimators. The paper's contributions
are twofold. First, I show that the sensitivity
parameters are scale-invariant and standardized. Unlike existing
sensitivity analyses in external validity, the proposed framework allows
researchers to simultaneously account for the bias in their estimates from
omitting a moderator, as well as potential changes to their inference.
Second, I propose several tools researchers can use to perform sensitivity
analysis: (1) graphical and numerical summaries for researchers to assess
how robust an estimated effect is to changes in magnitude as well as
statistical significance; (2) a formal benchmarking approach for
researchers to estimate potential sensitivity parameter values using
existing data; and (3) an extreme-scenario analysis. While tools for
routine sensitivity reporting have been introduced for frameworks based on
outcome modeling, such tools do not yet exist for weighted estimators. The
talk thus introduces a collection of methods that lend much-needed
interpretability to sensitivity analyses, along with a framework for
researchers to argue transparently and quantitatively for the robustness of
their estimated effects.
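As a schematic of the setting, the sketch below computes a weighted
(transported) treatment effect estimate and then sweeps a two-parameter
grid of hypothesized bias values. The simulated data, the parameter names
(r2, rho), and the functional form of the bound are placeholders, not the
talk's exact formulas:

    # Schematic: weighted estimator for transporting an RCT effect to a
    # target population, plus a two-parameter sensitivity grid.
    # The bias bound below is a hypothetical placeholder form.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    x = rng.normal(size=n)                 # moderator observed in the RCT
    t = rng.integers(0, 2, size=n)         # randomized treatment
    y = 1.0 + 0.5 * t + 0.3 * t * x + rng.normal(size=n)

    # Weights re-weight the sample toward a hypothetical target
    # population with a shifted moderator distribution.
    w = np.exp(0.4 * x)
    w = w / w.mean()

    ate_w = (np.sum(w * t * y) / np.sum(w * t)
             - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

    # r2: hypothesized share of variation in the ideal weights explained
    # by an omitted moderator; rho: its correlation with the effect.
    sd_w, sd_tau = np.std(w), 1.0          # sd_tau assumed for illustration
    for r2 in (0.1, 0.3, 0.5):
        for rho in (0.2, 0.5, 1.0):
            bias = rho * np.sqrt(r2 / (1 - r2)) * sd_w * sd_tau
            print(f"r2={r2:.1f} rho={rho:.1f}  adjusted estimate in "
                  f"[{ate_w - bias:.2f}, {ate_w + bias:.2f}]")

Sweeping such a grid shows which combinations of the two parameters would
be needed to overturn the weighted estimate.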
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 18 (12:00 EST). Dae Woong Ham presents
"Design-Based Confidence Sequences: A General Approach to Risk Mitigation
in Online Experimentation."
<When>
October 18, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized experiments have become the standard method for companies to
evaluate the performance of new products or services. In addition to
augmenting managers' decision-making, experimentation mitigates risk by
limiting the proportion of customers exposed to innovation. Since many
experiments are on customers arriving sequentially, a potential solution is
to allow managers to "peek" at the results when new data becomes
available and stop the test if the results are statistically significant.
Unfortunately, peeking invalidates the statistical guarantees for standard
statistical analysis and leads to uncontrolled type-1 error. Our paper
provides valid design-based confidence sequences: sequences of confidence
intervals with uniform type-1 error guarantees over time, constructed in an
assumption-light manner for a variety of sequential experiments. In
particular, we focus on finite-sample estimands defined on the study
participants as a direct measure of the risks incurred by companies. Our
proposed confidence
sequences are valid for a large class of experiments, including multi-arm
bandits, time series, and panel experiments. We further provide a variance
reduction technique incorporating modeling assumptions and covariates.
Finally, we demonstrate the effectiveness of our proposed approach through
a simulation study and three real-world applications from Netflix. Our
results show that by using our confidence sequence, harmful experiments
could be stopped after observing only a handful of units; for instance, an
experiment that Netflix ran on its sign-up page on 30,000 potential
customers would have been stopped by our method on the first day before 100
observations.
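For intuition on why a confidence sequence survives continuous peeking
while a fixed-n interval does not, the sketch below implements a classical
construction, Robbins' normal-mixture boundary for sub-Gaussian
observations. It is a textbook sequence, not the paper's design-based one:

    # Classical normal-mixture confidence sequence for the mean of
    # 1-sub-Gaussian observations (Robbins); illustrative only. Coverage
    # holds uniformly over time, so the experiment may be stopped the
    # first moment 0 leaves the interval.
    import numpy as np

    rng = np.random.default_rng(2)
    alpha, rho = 0.05, 1.0             # error level, mixture parameter
    effect = -0.4                      # hypothetical harmful true effect
    x = rng.normal(loc=effect, size=5000)

    s = np.cumsum(x)
    t = np.arange(1, x.size + 1)
    # Boundary from the normal-mixture martingale:
    # P(exists t: |S_t| >= u(t)) <= alpha,
    # u(t) = sqrt((t + rho) * log((t + rho) / (rho * alpha^2))).
    u = np.sqrt((t + rho) * np.log((t + rho) / (rho * alpha**2)))
    lower, upper = (s - u) / t, (s + u) / t

    stop = np.argmax(upper < 0) if np.any(upper < 0) else None
    print("stopped at n =", None if stop is None else stop + 1)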
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 11 (12:00 EST). Soichiro Yamauchi
presents "Statistical Analysis with Machine Learning Predicted Variables."
<When>
October 11, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Scholars in the social sciences are increasingly relying on machine
learning (ML) techniques to construct data from large corpora of text and
images. The ML-generated variables are subsequently utilized in statistical
analysis to address substantive questions through regression and hypothesis
testing. However, this approach can introduce substantial bias and lead to
incorrect inferences due to prediction errors during the machine learning
stage. In this paper, we present an approach that incorporates ML-generated
variables into regression analysis while ensuring consistency and
asymptotic normality. The proposed approach leverages a small-scale
human-coded sample to capture the bias in the naive estimator, without the
need for strict assumptions about the structure of prediction errors.
Furthermore, we develop diagnostic tools to assess whether additional human
coding can further reduce variance in the main analysis.
We illustrate the effectiveness of our method by revisiting a study on the
sources of election fraud with ballot image data and regression analysis.
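The core de-biasing idea, using a small human-coded sample to estimate the
prediction error of the naive estimator, can be seen in a simplified
mean-estimation analogue (simulated data; the paper treats full regression
analysis, not just the mean):

    # Simplified analogue: estimate a population proportion from
    # ML-predicted labels, correcting the naive estimate with a small
    # human-coded subsample. Sketch only, not the authors' estimator.
    import numpy as np

    rng = np.random.default_rng(3)
    N, n = 100_000, 500                 # corpus size, human-coded sample
    truth = rng.binomial(1, 0.3, size=N)   # latent true labels
    # ML predictions with systematic error (extra false positives):
    pred = np.where(rng.random(N) < 0.1, 1 - truth, truth)
    pred[rng.random(N) < 0.05] = 1

    labeled = rng.choice(N, size=n, replace=False)  # human-coded subsample

    naive = pred.mean()                 # biased by prediction error
    correction = (pred[labeled] - truth[labeled]).mean()
    debiased = naive - correction       # consistent for truth.mean()

    print(f"true {truth.mean():.3f}  naive {naive:.3f}  "
          f"debiased {debiased:.3f}")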
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 4 (12:00 EST). Michael Lingzhi Li
presents "Statistical Performance Guarantee for Selecting Those Predicted
to Benefit Most from Treatment."
<When>
October 4, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Across a wide array of disciplines, many researchers use modern machine
learning algorithms to identify a subgroup of individuals, called
exceptional responders, who are likely to benefit most from a treatment. A
common approach is to first estimate the conditional average
treatment effect (CATE) or its proxy given a set of pre-treatment
covariates and then optimize a cutoff of the resulting treatment
prioritization score to prioritize who should receive the treatment.
Unfortunately, since these estimated scores are often biased and noisy in
practice, naive reliance on them can lead to misleading inference.
Furthermore, practitioners often utilize the same set of data to optimize
the cutoff and evaluate the performance of the resulting subset, causing a
multiple testing problem. In this paper, we propose a methodology that has
a uniform statistical performance guarantee for selecting such exceptional
responders regardless of the cutoff optimization. Specifically, we develop
a uniform confidence interval for experimentally evaluating the group
average treatment effect (GATE) among the individuals whose estimated score
is at least as high as any given quantile value. This uniform confidence
interval enables researchers to utilize arbitrary methods to choose the
quantile of the estimated score, including optimizing over the lower
confidence
bound of the estimated GATE among the selected individuals. The proposed
methodology provides this statistical performance guarantee without
suffering from multiple testing problems, and also generalizes to a generic
class of statistics beyond GATE. Importantly, the validity of our
methodology depends solely on randomization of treatment and random
sampling of units and does not require modeling assumptions or resampling
methods. Consequently, our methodology is applicable to any machine
learning algorithm and is computationally efficient.
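To fix ideas, the pointwise version of the covered quantity, the GATE among
units whose score exceeds a given quantile, can be computed as below
(simulated data; the paper's uniform-over-quantiles band, its main
contribution, is not reproduced here):

    # Pointwise GATE estimates among units whose prioritization score
    # exceeds each quantile cutoff (sketch; uniform band not shown).
    import numpy as np

    rng = np.random.default_rng(4)
    n = 4000
    score = rng.normal(size=n)          # estimated CATE / priority score
    t = rng.integers(0, 2, size=n)      # randomized treatment
    y = 0.2 * np.maximum(score, 0) * t + rng.normal(size=n)

    for q in (0.5, 0.75, 0.9):
        cut = np.quantile(score, q)
        sel = score >= cut              # candidate exceptional responders
        gate = y[sel & (t == 1)].mean() - y[sel & (t == 0)].mean()
        # Neyman-style standard error for the difference in means:
        se = np.sqrt(
            y[sel & (t == 1)].var(ddof=1) / (sel & (t == 1)).sum()
            + y[sel & (t == 0)].var(ddof=1) / (sel & (t == 0)).sum())
        print(f"q={q:.2f}  GATE={gate:.3f}  pointwise 95% CI "
              f"[{gate - 1.96*se:.3f}, {gate + 1.96*se:.3f}]")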
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu