Dear Applied Statistics Workshop Community,
Our next meeting of the semester will be on April 26 (12:00 ET). Dean Knox
and Guilherme Duarte will present "Optimal Allocation of Data-Collection
Resources."
<Where>
CGIS K354
Bagged lunches are available for pick-up at 11:45 (CGIS K354).
Zoom:
https://harvard.zoom.us/j/99181972207?pwd=Ykd3ZzVZRnZCSDZqNVpCSURCNnVvQT09
<Abstract>
Complications in applied work often prevent researchers from obtaining
unique point estimates of target quantities using cheaply available data—at
best, ranges of possibilities, or sharp bounds, can be reported. To make
progress, researchers frequently collect more information by (1)
re-cleaning existing datasets, (2) gathering secondary datasets, or (3)
pursuing entirely new designs. Common examples include manually correcting
missingness, recontacting attrited units, validating proxies with
ground-truth data, finding new instrumental variables, and conducting
follow-up experiments. These auxiliary tasks are costly, forcing tradeoffs
with (4) larger samples from the original approach. Researchers'
data-collection strategies, or choices over these tasks, are often based on
convenience or intuition. In this work, we show how to provably identify
the most cost-efficient data-collection strategy for a given research
problem.
We quantify the quality of existing data using the width of the confidence
regions on the sharp bounds, which captures two sources of uncertainty:
statistical uncertainty due to finite samples of the variables measured,
and fundamental uncertainty because some variables are not measured at all.
We then show how to compute the expected information gain, defined as the
expected amount by which each data-collection task will narrow these bounds
by addressing one or both sources of uncertainty. Finally, we select the
task with the greatest information efficiency, or gain per unit cost.
Leveraging recent advances in automatic bounding (Duarte et al., 2022), we
prove that this efficiency is computable for essentially any discrete causal
system, estimand, and auxiliary data task.
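As a rough illustration (not the authors' implementation), the selection rule
amounts to ranking candidate tasks by expected gain per unit cost; the task
names, gains, and costs in this Python sketch are hypothetical:

# Illustrative sketch only: choose the data-collection task with the greatest
# information efficiency, i.e., expected narrowing of the bounds per unit cost.
# Task names, expected gains, and costs below are made-up placeholders.
tasks = {
    # task: (expected narrowing of the confidence region on the bounds, cost)
    "re-clean existing data":    (0.04, 500.0),
    "validate proxy subsample":  (0.10, 2000.0),
    "more samples, same design": (0.06, 1500.0),
}

def information_efficiency(expected_gain: float, cost: float) -> float:
    """Expected reduction in bound width per unit cost."""
    return expected_gain / cost

best_task = max(tasks, key=lambda t: information_efficiency(*tasks[t]))
print(best_task)  # "re-clean existing data" under these made-up numbers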
Based on this theoretical framework, we develop a method for optimal
adaptive allocation of data-collection resources. Users first input a
causal graph, estimand, and past data. They then specify the distributions
from which future samples can be drawn, the associated fixed and per-sample
costs, and any prior beliefs. Our method automatically derives and sequentially updates
the optimal data-collection strategy.
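To make the workflow concrete, here is a minimal, purely schematic Python
sketch of such an adaptive loop, with hypothetical placeholder functions
standing in for the actual bound and gain computations:

# Schematic skeleton of adaptive allocation; every quantity below is a
# stand-in. The real method derives expected gains from the causal graph,
# estimand, and current bounds rather than drawing them at random.
import random

random.seed(0)

COSTS = {"re-clean": 500.0, "validate proxies": 2000.0, "new samples": 1500.0}

def expected_gain(task, data):
    # Stand-in for the expected narrowing of the bounds' confidence region.
    return random.uniform(0.0, 0.1)

def collect(task, data):
    # Stand-in for carrying out the task and appending the resulting data.
    return data + [task]

def allocate(budget, data):
    while True:
        affordable = [t for t in COSTS if COSTS[t] <= budget]
        if not affordable:
            return data
        # Greedy choice by information efficiency: expected gain per unit cost.
        best = max(affordable, key=lambda t: expected_gain(t, data) / COSTS[t])
        budget -= COSTS[best]
        data = collect(best, data)

print(allocate(budget=5000.0, data=[]))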
<2022-2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/embed?src=c_3v93pav9fjkkldrbu9snbhned8…
Best,
Shusei
Dear Applied Statistics Workshop Community,
Our next meeting of the semester will be on April 19 (12:00 ET). Michela
Carlana will present "Revealing Stereotypes: Evidence from Immigrants in
Schools."
<Where>
CGIS K354
Bagged lunches are available for pick-up at 11:45 (CGIS K354).
Zoom:
https://harvard.zoom.us/j/99181972207?pwd=Ykd3ZzVZRnZCSDZqNVpCSURCNnVvQT09
<Abstract>
We study how people change their behavior after learning they are biased.
Teachers in Italian schools give lower grades to immigrant students
relative to natives with comparable ability. In two experiments, we reveal
to teachers their own bias, measured by an Implicit Association Test (IAT).
Randomizing the timing of disclosure, we find that learning one’s IAT score
before assigning end-of-term grades reduces the native-immigrant gap in
grades. IAT disclosure and generic debiasing have similar average effects,
but there is heterogeneity: teachers with more negative stereotypes do not
respond to generic debiasing but change their behavior when informed about
their own IAT.
<2022-2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/embed?src=c_3v93pav9fjkkldrbu9snbhned8…
Best,
Shusei
Dear Applied Statistics Workshop Community,
Our next meeting of the semester will be on April 12 (12:00 ET). Naoki
Egami will present "Empirical Strategies Toward External Validity:
Framework and External Robustness."
<Where>
CGIS K354
Bagged lunches are available for pick-up at 11:45 (CGIS K354).
Zoom:
https://harvard.zoom.us/j/99181972207?pwd=Ykd3ZzVZRnZCSDZqNVpCSURCNnVvQT09
<Abstract>
Over the last few decades, social scientists have developed and applied a
host of statistical methods to make valid causal inferences, a trend known as
the credibility revolution. This trend has focused primarily on internal
validity: researchers aim to estimate causal effects without bias within a
given study. However, one of the most important long-standing methodological
debates concerns external validity: how scientists can generalize causal
findings beyond a specific study. This question has a long history in the
social sciences, going back to at least the 1960s, and it has become even
more pressing as the opportunities and challenges of accumulating causal
knowledge across studies have become evident.
In this talk, I will discuss a set of empirical strategies to improve
external validity in practice. I will briefly introduce a formal framework of
external validity (Egami and Hartman, 2022; APSR) that synthesizes diverse
external validity concerns. Then, I will propose a simple new approach to
quantify the robustness of experimental results to external validity bias
(Devaux and Egami, 2022; Egami and Rothenhäusler, 2023+). In particular, I
introduce a measure of external robustness, which ranges from 0 to 1 and
represents how well causal effects estimated in one’s study can be
generalized to other populations and contexts. Researchers can estimate
this quantity using only experimental data (i.e., no additional data
collection), and users can also account for unmeasured confounders. I
discuss a debiased estimator, which is consistent and asymptotically normal
under mild rate conditions that allow for the use of machine learning
estimators. Finally, I provide default benchmarks and discuss practical
guidance on how to report external robustness in practice using the R
package “exr” (https://github.com/naoki-egami/exr).
Papers: (1) https://naokiegami.com/paper/external_full.pdf (2)
https://naokiegami.com/paper/external_robust.pdf
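The external-robustness measure itself is defined in the papers above; as
loose background on the generalization problem only, the Python sketch below
reweights an experimental effect estimate toward a hypothetical target
population. It is a textbook post-stratification exercise with simulated
data, not the estimator implemented in the exr package:

# Background illustration only: transporting an experimental ATE to a target
# population by reweighting on one observed covariate. This is NOT the
# external-robustness estimator from the talk. All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

x = rng.binomial(1, 0.3, n)                       # covariate in the experiment
t = rng.binomial(1, 0.5, n)                       # randomized treatment
y = 1.0 * t + 2.0 * t * x + rng.normal(size=n)    # effect is larger when x == 1

# Stratum-specific effect estimates within the experiment.
effect_by_x = {
    v: y[(t == 1) & (x == v)].mean() - y[(t == 0) & (x == v)].mean()
    for v in (0, 1)
}

sample_ate = sum(effect_by_x[v] * (x == v).mean() for v in (0, 1))

# Hypothetical target population where x == 1 is far more common.
target_share_x1 = 0.7
target_ate = (effect_by_x[0] * (1 - target_share_x1)
              + effect_by_x[1] * target_share_x1)

print(f"in-sample ATE:   {sample_ate:.2f}")   # roughly 1 + 2 * 0.3 = 1.6
print(f"transported ATE: {target_ate:.2f}")   # roughly 1 + 2 * 0.7 = 2.4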
<2022-2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/embed?src=c_3v93pav9fjkkldrbu9snbhned8…
Best,
Shusei
Dear Applied Statistics Workshop Community,
Our next meeting of the semester will be on April 5 (12:00 ET). Fredrik
Sävje will present "A Design-Based Riesz Representation Framework for
Randomized Experiments."
<Where>
CGIS K354
Bagged lunches are available for pick-up at 11:45 (CGIS K354).
Zoom:
https://harvard.zoom.us/j/99181972207?pwd=Ykd3ZzVZRnZCSDZqNVpCSURCNnVvQT09
<Abstract>
We describe a new design-based framework for drawing causal inference in
randomized experiments. Causal effects in the framework are defined as
linear functionals evaluated at potential outcome functions. Knowledge and
assumptions about the potential outcome functions are encoded as function
spaces. This makes the framework expressive, allowing experimenters to
formulate and investigate a wide range of causal questions. We describe a
class of estimators for estimands defined using the framework and
investigate their properties. The construction of the estimators is based
on the Riesz representation theorem. We provide necessary and sufficient
conditions for unbiasedness and consistency. Finally, we provide conditions
under which the estimators are asymptotically normal, and describe a
conservative variance estimator to facilitate the construction of
confidence intervals for the estimands.
Paper: https://arxiv.org/abs/2210.08698
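As informal background (not the construction in the paper), the Python sketch
below shows the simplest special case: under a Bernoulli(p) assignment
design, weighting observed outcomes by T/p - (1 - T)/(1 - p) yields the
familiar Horvitz-Thompson estimator of the average treatment effect, with
that weight playing the role of the representer. The data are simulated:

# Background illustration only: the Horvitz-Thompson estimator for the ATE
# under a Bernoulli(p) design, written with the weight T/p - (1 - T)/(1 - p).
# Simulated data; not the paper's code.
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 0.4

y0 = rng.normal(0.0, 1.0, n)         # potential outcomes under control
y1 = y0 + 2.0                        # constant treatment effect of 2
t = rng.binomial(1, p, n)            # Bernoulli(p) assignment
y = np.where(t == 1, y1, y0)         # observed outcomes

weights = t / p - (1 - t) / (1 - p)  # representer evaluated at each unit
ate_hat = np.mean(weights * y)       # Horvitz-Thompson estimate

print(f"true ATE: 2.00, estimate: {ate_hat:.2f}")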
<2022-2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/embed?src=c_3v93pav9fjkkldrbu9snbhned8…
Best,
Shusei