Dear Applied Statistics Workshop Community,
Our next meeting will be on November 1 (12:00 EST). Naijia Liu presents
"Synthetic Control Method with Pre-treatment Outcomes Missing" (Joint work
with Sooahn Shin and Soichiro Yamauchi).
<When>
November 1, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
The synthetic control method (SCM) is commonly used in social science
research to estimate treatment effects. It involves creating a synthesized
control unit for the treated unit in observational studies. The quality of
this synthesized control unit is influenced by factors such as the number
of pretreatment periods and missing values. Empirical datasets,
particularly those with a panel structure, often contain missing values.
This project studies the impact of missing values on SCM and provides
theoretical guidance on the potential bias. We formulate SCM with missing
data from a vertical regression perspective, under which missing values can
be viewed as omitted variables. We show that the bias of the ATT decomposes
into (1) the weights the missing units receive in constructing the
synthetic control and (2) the imbalance between the missing units and the
weighted observed donor units. Building on these results, we propose a
sensitivity analysis for SCM with pretreatment outcomes missing not at
random. To illustrate the method in practice, we revisit a previous study
that examines the impact of Taiwan's expulsion from the International
Monetary Fund (IMF) in 1980 on its precautionary international reserves
using the SCM.
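For readers who want a concrete picture, a minimal sketch of the standard
SCM weighting step appears below. It is illustrative only: the donor matrix
Y0 and treated vector y1 are simulated, and the sketch is not the authors'
missing-data estimator. The weights minimize the pre-treatment fit subject
to being non-negative and summing to one:

    # Minimal synthetic control weight estimation (illustrative sketch).
    # Y0: T0 x J matrix of pre-treatment outcomes for J donor units.
    # y1: length-T0 pre-treatment outcomes for the treated unit.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    T0, J = 20, 10                    # hypothetical panel dimensions
    Y0 = rng.normal(size=(T0, J))
    y1 = Y0 @ rng.dirichlet(np.ones(J)) + rng.normal(scale=0.1, size=T0)

    def loss(w):
        # Squared distance between the treated unit and weighted donors.
        return np.sum((y1 - Y0 @ w) ** 2)

    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J         # non-negative weights summing to one
    res = minimize(loss, np.full(J, 1.0 / J), bounds=bounds,
                   constraints=cons, method="SLSQP")
    w = res.x                         # synthetic control weights

    # A donor with missing pre-treatment outcomes must be dropped from the
    # columns of Y0; in the vertical regression the dropped unit acts as an
    # omitted variable, the source of the bias the paper decomposes.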
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 25 (12:00 EST). Melody Huang presents
"Towards Credible Causal Inference under Real-World Complications:
Sensitivity Analysis for Generalizability"
<When>
October 25, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized controlled trials (RCTs) allow researchers to estimate causal
effects in an experimental sample with minimal identifying assumptions.
However, to generalize or transport a causal effect from an RCT to a target
population, researchers must adjust for a set of treatment effect
moderators. In practice, it is impossible to know whether the set of
moderators has been properly accounted for. In this talk, I propose a
two-parameter sensitivity analysis for generalizing or transporting
experimental results using weighted estimators. The paper's contributions
are twofold. First, I show that the sensitivity
parameters are scale-invariant and standardized. Unlike existing
sensitivity analyses in external validity, the proposed framework allows
researchers to simultaneously account for the bias in their estimates from
omitting a moderator, as well as potential changes to their inference.
Second, I propose several tools researchers can use to perform sensitivity
analysis: (1) graphical and numerical summaries for researchers to assess
how robust an estimated effect is to changes in magnitude as well as
statistical significance; (2) a formal benchmarking approach for
researchers to estimate potential sensitivity parameter values using
existing data; and (3) an extreme-scenario analysis. While tools for
routine sensitivity reporting have been introduced for frameworks based on
outcome modeling, such tools do not yet exist for weighted estimators. The
talk thus introduces a collection of methods that lend much-needed
interpretability to sensitivity analyses, along with a framework for
researchers to argue transparently and quantitatively for the robustness of
their estimated effects.
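As a schematic of the setting, the sketch below computes a weighted
(transported) treatment effect estimate and then sweeps a two-parameter
grid of hypothesized bias values. The simulated data, the parameter names
(r2, rho), and the functional form of the bound are placeholders, not the
talk's exact formulas:

    # Schematic: weighted estimator for transporting an RCT effect to a
    # target population, plus a two-parameter sensitivity grid.
    # The bias bound below is a hypothetical placeholder form.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    x = rng.normal(size=n)                 # moderator observed in the RCT
    t = rng.integers(0, 2, size=n)         # randomized treatment
    y = 1.0 + 0.5 * t + 0.3 * t * x + rng.normal(size=n)

    # Weights re-weight the sample toward a hypothetical target
    # population with a shifted moderator distribution.
    w = np.exp(0.4 * x)
    w = w / w.mean()

    ate_w = (np.sum(w * t * y) / np.sum(w * t)
             - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

    # r2: hypothesized share of variation in the ideal weights explained
    # by an omitted moderator; rho: its correlation with the effect.
    sd_w, sd_tau = np.std(w), 1.0          # sd_tau assumed for illustration
    for r2 in (0.1, 0.3, 0.5):
        for rho in (0.2, 0.5, 1.0):
            bias = rho * np.sqrt(r2 / (1 - r2)) * sd_w * sd_tau
            print(f"r2={r2:.1f} rho={rho:.1f}  adjusted estimate in "
                  f"[{ate_w - bias:.2f}, {ate_w + bias:.2f}]")

Sweeping such a grid shows which combinations of the two parameters would
be needed to overturn the weighted estimate.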
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 18 (12:00 EST). Dae Woong Ham presents
"Design-Based Confidence Sequences: A General Approach to Risk Mitigation
in Online Experimentation."
<When>
October 18, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized experiments have become the standard method for companies to
evaluate the performance of new products or services. In addition to
augmenting managers' decision-making, experimentation mitigates risk by
limiting the proportion of customers exposed to innovation. Since many
experiments are on customers arriving sequentially, a potential solution is
to allow managers to "peek" at the results when new data becomes
available and stop the test if the results are statistically significant.
Unfortunately, peeking invalidates the statistical guarantees for standard
statistical analysis and leads to uncontrolled type-1 error. Our paper
provides valid design-based confidence sequences: sequences of confidence
intervals with uniform type-1 error guarantees over time, constructed in an
assumption-light manner for a variety of sequential experiments. In
particular, we focus on finite-sample estimands defined on the study
participants as a direct measure of the risks incurred by companies. Our
proposed confidence
sequences are valid for a large class of experiments, including multi-arm
bandits, time series, and panel experiments. We further provide a variance
reduction technique incorporating modeling assumptions and covariates.
Finally, we demonstrate the effectiveness of our proposed approach through
a simulation study and three real-world applications from Netflix. Our
results show that by using our confidence sequence, harmful experiments
could be stopped after observing only a handful of units; for instance, an
experiment that Netflix ran on its sign-up page on 30,000 potential
customers would have been stopped by our method on the first day before 100
observations.
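For intuition on why a confidence sequence survives continuous peeking
while a fixed-n interval does not, the sketch below implements a classical
construction, Robbins' normal-mixture boundary for sub-Gaussian
observations. It is a textbook sequence, not the paper's design-based one:

    # Classical normal-mixture confidence sequence for the mean of
    # 1-sub-Gaussian observations (Robbins); illustrative only. Coverage
    # holds uniformly over time, so the experiment may be stopped the
    # first moment 0 leaves the interval.
    import numpy as np

    rng = np.random.default_rng(2)
    alpha, rho = 0.05, 1.0             # error level, mixture parameter
    effect = -0.4                      # hypothetical harmful true effect
    x = rng.normal(loc=effect, size=5000)

    s = np.cumsum(x)
    t = np.arange(1, x.size + 1)
    # Boundary from the normal-mixture martingale:
    # P(exists t: |S_t| >= u(t)) <= alpha,
    # u(t) = sqrt((t + rho) * log((t + rho) / (rho * alpha^2))).
    u = np.sqrt((t + rho) * np.log((t + rho) / (rho * alpha**2)))
    lower, upper = (s - u) / t, (s + u) / t

    stop = np.argmax(upper < 0) if np.any(upper < 0) else None
    print("stopped at n =", None if stop is None else stop + 1)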
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 11 (12:00 EST). Soichiro Yamauchi
presents "Statistical Analysis with Machine Learning Predicted Variables."
<When>
October 11, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Scholars in the social sciences are increasingly relying on machine
learning (ML) techniques to construct data from large corpora of text and
images. The ML-generated variables are subsequently utilized in statistical
analysis to address substantive questions through regression and hypothesis
testing. However, this approach can introduce substantial bias and lead to
incorrect inferences due to prediction errors during the machine learning
stage. In this paper, we present an approach that incorporates ML-generated
variables into regression analysis while ensuring consistency and
asymptotic normality. The proposed approach leverages a small-scale
human-coded sample to capture the bias in the naive estimator, without the
need for strict assumptions about the structure of prediction errors.
Furthermore, we develop diagnostic tools to assess whether additional human
coding can further reduce variance in the main analysis.
We illustrate the effectiveness of our method by revisiting a study on the
sources of election fraud with ballot image data and regression analysis.
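The core de-biasing idea, using a small human-coded sample to estimate the
prediction error of the naive estimator, can be seen in a simplified
mean-estimation analogue (simulated data; the paper treats full regression
analysis, not just the mean):

    # Simplified analogue: estimate a population proportion from
    # ML-predicted labels, correcting the naive estimate with a small
    # human-coded subsample. Sketch only, not the authors' estimator.
    import numpy as np

    rng = np.random.default_rng(3)
    N, n = 100_000, 500                 # corpus size, human-coded sample
    truth = rng.binomial(1, 0.3, size=N)   # latent true labels
    # ML predictions with systematic error (extra false positives):
    pred = np.where(rng.random(N) < 0.1, 1 - truth, truth)
    pred[rng.random(N) < 0.05] = 1

    labeled = rng.choice(N, size=n, replace=False)  # human-coded subsample

    naive = pred.mean()                 # biased by prediction error
    correction = (pred[labeled] - truth[labeled]).mean()
    debiased = naive - correction       # consistent for truth.mean()

    print(f"true {truth.mean():.3f}  naive {naive:.3f}  "
          f"debiased {debiased:.3f}")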
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 4 (12:00 EST). Michael Lingzhi Li
presents "Statistical Performance Guarantee for Selecting Those Predicted
to Benefit Most from Treatment."
<When>
October 4, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Across a wide array of disciplines, many researchers use modern machine
learning algorithms to identify a subgroup of individuals, called
exceptional responders, who are likely to benefit most from a treatment. A
common approach is to first estimate the conditional average
treatment effect (CATE) or its proxy given a set of pre-treatment
covariates and then optimize a cutoff of the resulting treatment
prioritization score to prioritize who should receive the treatment.
Unfortunately, since these estimated scores are often biased and noisy in
practice, naive reliance on them can lead to misleading inference.
Furthermore, practitioners often utilize the same set of data to optimize
the cutoff and evaluate the performance of the resulting subset, causing a
multiple testing problem. In this paper, we propose a methodology that has
a uniform statistical performance guarantee for selecting such exceptional
responders regardless of the cutoff optimization. Specifically, we develop
a uniform confidence interval for experimentally evaluating the group
average treatment effect (GATE) among the individuals whose estimated score
is at least as high as any given quantile value. This uniform confidence
interval enables researchers to utilize arbitrary methods to choose the
quantile of the estimated score, including optimizing over the lower
confidence
bound of the estimated GATE among the selected individuals. The proposed
methodology provides this statistical performance guarantee without
suffering from multiple testing problems, and also generalizes to a generic
class of statistics beyond GATE. Importantly, the validity of our
methodology depends solely on randomization of treatment and random
sampling of units and does not require modeling assumptions or resampling
methods. Consequently, our methodology is applicable to any machine
learning algorithm and is computationally efficient.
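To fix ideas, the pointwise version of the covered quantity, the GATE among
units whose score exceeds a given quantile, can be computed as below
(simulated data; the paper's uniform-over-quantiles band, its main
contribution, is not reproduced here):

    # Pointwise GATE estimates among units whose prioritization score
    # exceeds each quantile cutoff (sketch; uniform band not shown).
    import numpy as np

    rng = np.random.default_rng(4)
    n = 4000
    score = rng.normal(size=n)          # estimated CATE / priority score
    t = rng.integers(0, 2, size=n)      # randomized treatment
    y = 0.2 * np.maximum(score, 0) * t + rng.normal(size=n)

    for q in (0.5, 0.75, 0.9):
        cut = np.quantile(score, q)
        sel = score >= cut              # candidate exceptional responders
        gate = y[sel & (t == 1)].mean() - y[sel & (t == 0)].mean()
        # Neyman-style standard error for the difference in means:
        se = np.sqrt(
            y[sel & (t == 1)].var(ddof=1) / (sel & (t == 1)).sum()
            + y[sel & (t == 0)].var(ddof=1) / (sel & (t == 0)).sum())
        print(f"q={q:.2f}  GATE={gate:.3f}  pointwise 95% CI "
              f"[{gate - 1.96*se:.3f}, {gate + 1.96*se:.3f}]")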
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu