Dear Applied Statistics Workshop Community,
Our next meeting will be on October 25 (12:00 EST). Melody Huang presents
"Towards Credible Causal Inference under Real-World Complications:
Sensitivity Analysis for Generalizability"
<When>
October 25, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized controlled trials (RCTs) allow researchers to estimate causal
effects in an experimental sample with minimal identifying assumptions.
However, to generalize or transport a causal effect from an RCT to a target
population, researchers must adjust for a set of treatment effect
moderators. In practice, it is impossible to know whether the set of
moderators has been properly accounted for. In this talk, I propose a
two-parameter sensitivity analysis for generalizing or
transporting experimental results using weighted estimators. The
contributions in the paper are two-fold. First, I show that the sensitivity
parameters are scale-invariant and standardized. Unlike existing
sensitivity analyses for external validity, the proposed framework allows
researchers to simultaneously account for the bias in their estimates from
omitting a moderator, as well as potential changes to their inference.
Second, I propose several tools researchers can use to perform sensitivity
analysis: (1) graphical and numerical summaries for researchers to assess
how robust an estimated effect is to changes in magnitude as well as
statistical significance; (2) a formal benchmarking approach for
researchers to estimate potential sensitivity parameter values using
existing data; and (3) an extreme scenario analysis. While sensitivity
tools for routine reporting have been introduced for sensitivity frameworks
for outcome modeling approaches, these tools do not yet exist for weighted
estimators. Thus, the talk introduces a collection of methods that provide
much-needed interpretability to sensitivity analyses, and a framework for
researchers to transparently and quantitatively argue for the robustness of
their estimated effects.
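For attendees less familiar with this setting, here is a minimal, purely illustrative sketch of the kind of weighted (inverse-odds) estimator the sensitivity analysis is built around. The moderator, outcome model, and densities are all simulated for illustration; this is a generic transport estimator, not the talk's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setting: one moderator X; the experimental sample over-represents
# high-X units relative to the target population, and the treatment effect is
# moderated by X (tau(x) = 1 + x), so the sample ATE and population ATE differ.
n_exp = 2000
X = rng.normal(1.0, 1.0, n_exp)            # experimental sample: X ~ N(1, 1)
T = rng.integers(0, 2, n_exp)              # randomized treatment
Y = 0.5 * X + T * (1.0 + X) + rng.normal(0.0, 1.0, n_exp)

# Inverse-odds weights f_target(x) / f_sample(x). With target X ~ N(0, 1) and
# sample X ~ N(1, 1) the ratio is exp(0.5 - x); in practice one estimates
# sample-membership odds (e.g., with logistic regression) from pooled data.
w = np.exp(0.5 - X)
w /= w.mean()

sate_hat = Y[T == 1].mean() - Y[T == 0].mean()
pate_hat = (np.sum(w * T * Y) / np.sum(w * T)
            - np.sum(w * (1 - T) * Y) / np.sum(w * (1 - T)))
print(f"unweighted sample ATE ~= {sate_hat:.2f}, weighted population ATE ~= {pate_hat:.2f}")
```

The sensitivity analysis then asks how the weighted estimate and its confidence interval would move if the weights had omitted a moderator like X.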
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 18 (12:00 EST). Dae Woong Ham presents
"Design-Based Confidence Sequences: A General Approach to Risk Mitigation
in Online Experimentation."
<When>
October 18, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Randomized experiments have become the standard method for companies to
evaluate the performance of new products or services. In addition to
augmenting managers' decision-making, experimentation mitigates risk by
limiting the proportion of customers exposed to innovation. Since many
experiments are on customers arriving sequentially, a potential solution is
to allow managers to "peek" at the results when new data becomes
available and stop the test if the results are statistically significant.
Unfortunately, peeking invalidates the statistical guarantees for standard
statistical analysis and leads to uncontrolled type-1 error. Our paper
provides valid design-based confidence sequences, sequences of confidence
intervals with uniform type-1 error guarantees over time for various
sequential experiments in an assumption-light manner. In particular, we
focus on finite-sample estimands defined on the study participants as a
direct measure of the incurred risks by companies. Our proposed confidence
sequences are valid for a large class of experiments, including multi-arm
bandits, time series, and panel experiments. We further provide a variance
reduction technique incorporating modeling assumptions and covariates.
Finally, we demonstrate the effectiveness of our proposed approach through
a simulation study and three real-world applications from Netflix. Our
results show that by using our confidence sequence, harmful experiments
could be stopped after only observing a handful of units; for instance, an
experiment that Netflix ran on its sign-up page on 30,000 potential
customers would have been stopped by our method on the first day before 100
observations.
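To see why uncorrected peeking is a problem, here is a small simulation of the phenomenon the paper's confidence sequences are designed to prevent (illustrative only; this is not the paper's method, and the batch sizes and sample sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Under the null (no treatment effect), repeatedly "peek" at a z-test after
# every batch of observations and stop at the first significant result.
n_sims, n_batches, batch = 2000, 50, 20
z_crit = 1.96  # nominal two-sided alpha = 0.05

false_positives = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, size=n_batches * batch)  # null data, mean 0
    for k in range(1, n_batches + 1):
        n = k * batch
        z = x[:n].mean() / (1.0 / np.sqrt(n))         # known sd = 1
        if abs(z) > z_crit:
            false_positives += 1
            break

rate = false_positives / n_sims
print(f"type-1 error with peeking: {rate:.3f} (nominal 0.05)")
```

With 50 peeks, the realized false-positive rate is several times the nominal 5%; a confidence sequence replaces the fixed-sample interval with one that is valid uniformly over all stopping times.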
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 11 (12:00 EST). Soichiro Yamauchi
presents "Statistical Analysis with Machine Learning Predicted Variables."
<When>
October 11, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Scholars in the social sciences are increasingly relying on machine
learning (ML) techniques to construct data from large corpora of text and
images. The ML-generated variables are subsequently utilized in statistical
analysis to address substantive questions through regression and hypothesis
testing. However, this approach can introduce substantial bias and lead to
incorrect inferences due to prediction errors during the machine learning
stage. In this paper, we present an approach that incorporates ML-generated
variables into regression analysis while ensuring consistency and
asymptotic normality. The proposed approach leverages a small-scale
human-coded sample to capture the bias in the naive estimator, without the
need for strict assumptions about the structure of prediction errors.
Furthermore, we have developed diagnostic tools to assess whether
additional human coding can further reduce variance in the main analysis.
We illustrate the effectiveness of our method by revisiting a study on the
sources of election fraud with ballot image data and regression analysis.
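The core idea, correcting a cheap but biased ML-based estimate with a small gold-standard sample, can be illustrated with a simple mean-estimation sketch. This is in the spirit of the described bias correction (and of prediction-powered inference); the talk's regression estimator is more general, and all data here are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: we want the mean of a binary label Y over N documents,
# but only observe ML predictions Yhat for all of them, plus true labels on a
# small human-coded subsample.
N, n_coded = 20000, 500
Y = rng.binomial(1, 0.3, size=N)                    # true labels (mostly unobserved)
flip = rng.random(N) < 0.15                         # 15% prediction error
Yhat = np.where(flip, 1 - Y, Y)                     # error-prone ML predictions

coded = rng.choice(N, size=n_coded, replace=False)  # human-coded subsample

naive = Yhat.mean()                                 # ignores prediction error
# Debias the naive estimate with the error observed on the coded sample.
corrected = naive + (Y[coded] - Yhat[coded]).mean()

print(f"truth={Y.mean():.3f}  naive={naive:.3f}  corrected={corrected:.3f}")
```

The naive estimate is systematically off because the prediction errors do not cancel, while the corrected estimate is unbiased at the cost of extra variance from the small coded sample, which is what the diagnostic tools for additional human coding trade off.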
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on October 4 (12:00 EST). Michael Lingzhi Li
presents "Statistical Performance Guarantee for Selecting Those Predicted
to Benefit Most from Treatment."
<When>
October 4, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Across a wide array of disciplines, many researchers use modern machine
learning algorithms to identify a subgroup of individuals, called
exceptional responders, who are likely to be helped by a treatment the
most. A common approach is to first estimate the conditional average
treatment effect (CATE) or its proxy given a set of pre-treatment
covariates and then optimize a cutoff of the resulting treatment
prioritization score to prioritize who should receive the treatment.
Unfortunately, since these estimated scores are often biased and noisy in
practice, naive reliance on them can lead to misleading inference.
Furthermore, practitioners often utilize the same set of data to optimize
the cutoff and evaluate the performance of the resulting subset, causing a
multiple testing problem. In this paper, we propose a methodology that has
a uniform statistical performance guarantee for selecting such exceptional
responders regardless of the cutoff optimization. Specifically, we develop
a uniform confidence interval for experimentally evaluating the group
average treatment effect (GATE) among the individuals whose estimated score
is at least as high as any given quantile value. This uniform confidence
interval enables researchers to utilize arbitrary methods to choose the
quantile of estimated score, including optimizing over the lower confidence
bound of the estimated GATE among the selected individuals. The proposed
methodology provides this statistical performance guarantee without
suffering from multiple testing problems, and also generalizes to a generic
class of statistics beyond GATE. Importantly, the validity of our
methodology depends solely on randomization of treatment and random
sampling of units and does not require modeling assumptions or resampling
methods. Consequently, our methodology is applicable to any machine
learning algorithm and is computationally efficient.
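A quick simulation shows the multiple-testing problem that motivates the uniform guarantee: under a null of no effect for anyone, optimizing the cutoff on the same data makes the selected subgroup look beneficial most of the time. This is illustrative only (the paper's uniform confidence interval is what corrects this), and the candidate quantiles and sample sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

n_sims, n = 1000, 400
optimistic = 0
for _ in range(n_sims):
    score = rng.normal(size=n)           # noisy prioritization score
    est = rng.normal(0.0, 1.0, n)        # unit-level effect estimates; true effect 0
    # Pick the quantile cutoff that maximizes the estimated subgroup effect.
    best = max(est[score >= np.quantile(score, q)].mean()
               for q in (0.5, 0.7, 0.9))
    optimistic += best > 0

rate = optimistic / n_sims
print(f"P(selected subgroup looks beneficial under the null) = {rate:.2f}")
```

Naively reporting a confidence interval for the chosen subgroup ignores that the cutoff was optimized; a uniform interval over all quantiles remains valid regardless of how the cutoff was picked.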
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on September 27 (12:00 EST). Tyler Simko presents
"School Desegregation by Redrawing District Boundaries."
<When>
September 27, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up inside CGIS K354.
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Schools in the United States remain heavily segregated by race and income.
Previous work demonstrates that districts can reduce segregation between
their schools with policies like redrawing attendance zones. Yet the promise of
such policies in many areas is limited by the fact that most school
segregation occurs between school districts, and not between schools in the
same district. I adapt Markov Chain Monte Carlo (MCMC) algorithms from
political redistricting methodology to redraw school district boundaries
that decrease segregation while maintaining desirable criteria like
distance to school and using only existing school facilities. Focusing on
New Jersey, where the segregation of Black and Hispanic students from White
and Asian students is among the worst in the country, I demonstrate that
redrawing school districts could eliminate nearly 40% of existing segregation
in the median New Jersey county, compared with less than 5% for redrawing
attendance zones alone. Finally, I show how my proposed methodology can be
applied to as few as two districts to reduce segregation in proposed
“mergers,” a consolidation of small districts into one large district.
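For concreteness, between-district segregation is often scored with a measure like the dissimilarity index, which a plan-drawing algorithm can evaluate for each candidate boundary configuration. The district counts below are made up, and the talk may use a different measure:

```python
# Dissimilarity index D for a set of districts: D = 0 means each district
# mirrors the overall composition; D = 1 means complete segregation.
districts = [  # (group-A students, group-B students) per district, hypothetical
    (900, 100), (850, 150), (120, 880), (80, 920), (50, 950),
]
A = sum(a for a, b in districts)   # total group-A students
B = sum(b for a, b in districts)   # total group-B students
D = 0.5 * sum(abs(a / A - b / B) for a, b in districts)
print(f"dissimilarity index D = {D:.3f}")
```

An MCMC boundary-drawing algorithm can then search over valid plans for ones that lower D while respecting constraints like travel distance and existing facilities.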
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on September 20 (12:00 EST). Larry Han presents
"Promises and Perils of Multiply Robust Federated and Transfer Learning to
Estimate Causal Effects."
<When>
September 20, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up at 11:45 (CGIS K354).
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
Federated or multi-site studies have distinct advantages over
single-site studies, including increased generalizability, the ability to
study underrepresented populations, and the opportunity to study rare
exposures and outcomes. However, these studies are challenging due to the
need to preserve the privacy of each individual's data and the
heterogeneity in their covariate distributions. We propose a novel
federated approach to derive valid causal inferences for a target
population using multi-site data. We adjust for covariate shift and
covariate mismatch between sites by developing multiply-robust and
privacy-preserving nuisance function estimation. Our methodology
incorporates transfer learning to estimate ensemble weights to combine
information from source sites. We show that these learned weights are
efficient and optimal under different scenarios. We showcase the finite
sample advantages of our approach in terms of efficiency and robustness
compared to existing approaches. Finally, we showcase the utility of our
methodology for estimating COVID-19 vaccine efficacy (Moderna vs. Pfizer)
across geographic regions, and variations in congenital heart surgery
quality across racial/ethnic groups. Our findings have implications for the
efficient allocation of scarce resources.
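A stripped-down sketch of the combination step: each site shares only a summary estimate and variance (preserving individual-level privacy), and the sites are combined with weights. Here simple inverse-variance weights stand in for the adaptively learned transfer-learning weights, and all numbers are made up:

```python
import numpy as np

# Hypothetical site-level effect estimates and their estimated variances.
est = np.array([0.42, 0.55, 0.38])
var = np.array([0.04, 0.09, 0.02])

# Inverse-variance weights: more precise sites get more weight.
w = (1.0 / var) / (1.0 / var).sum()
combined = w @ est
se = np.sqrt(1.0 / (1.0 / var).sum())
print(f"combined estimate {combined:.3f} (SE {se:.3f})")
```

The federated approach additionally adjusts the weights for covariate shift and mismatch between each source site and the target population, rather than weighting on precision alone.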
Paper 1: https://arxiv.org/abs/2112.09313
Paper 2: https://arxiv.org/abs/2203.00768
Paper 3: In progress (will update with the arXiv link soon)
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Our next meeting will be on September 13 (12:00 EST). Davide Viviano
presents "Policy Targeting under Network Interference."
<When>
September 13, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up at 11:30 (CGIS K354).
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
This paper studies the problem of optimally allocating treatments
in the presence of spillover effects, using information from a
(quasi-)experiment. I introduce a method that maximizes the sample analog
of average social welfare when spillovers occur. I construct
semi-parametric welfare estimators with known and unknown propensity scores
and cast the optimization problem into a mixed-integer linear program,
which can be solved using off-the-shelf algorithms. I derive a strong set
of guarantees on regret, i.e., the difference between the maximum
attainable welfare and the welfare evaluated at the estimated policy. The
proposed method presents attractive features for applications: (i) it does
not require network information of the target population; (ii) it exploits
heterogeneity in treatment effects for targeting individuals; (iii) it does
not rely on the correct specification of a particular structural model; and
(iv) it accommodates constraints on the policy function. An application for
targeting information on social networks illustrates the advantages of the
method.
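A toy version of the allocation problem: choose k units to treat on a small network so as to maximize estimated welfare including spillovers. For illustration the tiny problem is brute-forced; the paper casts it as a mixed-integer linear program that off-the-shelf solvers handle at scale. The welfare function, network, and numbers are all hypothetical:

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)

n, k = 8, 3
direct = rng.uniform(0.0, 1.0, n)              # estimated direct effects
A = (rng.random((n, n)) < 0.3).astype(float)   # adjacency (spillover network)
np.fill_diagonal(A, 0)
spill = 0.2                                    # spillover per treated neighbor

def welfare(treat):
    """Direct effects on treated units plus spillovers onto untreated neighbors."""
    t = np.zeros(n)
    t[list(treat)] = 1
    return direct @ t + spill * ((A @ t) * (1 - t)).sum()

best = max(itertools.combinations(range(n), k), key=welfare)
print("optimal treated set:", best, "welfare:", round(welfare(best), 3))
```

Notice that the optimal set depends on the network through the spillover term, which is why the MILP formulation and the regret guarantees are needed once n is large.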
Paper: https://dviviano.github.io/projects/main_text_NEWM_Jan2023.pdf
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
Dear Applied Statistics Workshop Community,
Welcome back! Our first meeting of the semester will be on September 6
(12:00 EST). Keyon Vafa presents "Decomposing Changes in the Gender Wage
Gap over Worker Careers."
<When>
September 6, 12:00 to 1:30 PM, EST
Lunch will be available for pick-up at 11:30 (CGIS K354).
<Where>
In-person: CGIS K354
Zoom:
https://harvard.zoom.us/j/93217566507?pwd=elBwYjRJcWhlVE5teE1VNDZoUXdjQT09
<Abstract>
A large literature in labor economics seeks to decompose gender
wage gaps into different sources, including portions explained by
cross-gender differences in education and occupation. While career
histories contain valuable information about sources of gender wage
disparities, they are too high-dimensional to include in standard
econometric techniques. This talk presents new machine learning methods for
decomposing gender wage gaps over worker careers. We develop a "foundation
model" of career trajectories to summarize worker histories with
low-dimensional representations. We show how to fine-tune the foundation
model on small survey datasets while ensuring that the representations do
not omit features of history whose exclusion would bias decompositions. On
data from the Panel Study of Income Dynamics, our method explains more of
the gender wage gap than standard techniques. Finally, we propose a new
decomposition of the change in gender wage gaps over workers' careers into
two sources: gender differences in initial characteristics and gender
differences in worker transitions. Using representations from the
foundation model, we show that early in careers, the gender wage gap
widens, driven by males transitioning to higher-paying characteristics than
females; meanwhile, later in careers, the gender wage gap narrows, driven
by female initial characteristics setting up workers for more wage growth
than those of males.
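The classical starting point here is an Oaxaca-Blinder-style decomposition on hand-chosen covariates; a simulated sketch is below. The talk's contribution is to replace the single covariate x with learned low-dimensional representations of full career histories; all numbers here are made up:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated log wages depending on one characteristic x (e.g., experience),
# with a group difference in both x and the wage-equation intercept.
n = 5000
x_m = rng.normal(1.2, 1.0, n)
x_f = rng.normal(1.0, 1.0, n)
w_m = 2.0 + 0.5 * x_m + rng.normal(0.0, 0.3, n)
w_f = 1.9 + 0.5 * x_f + rng.normal(0.0, 0.3, n)

gap = w_m.mean() - w_f.mean()
beta_f = np.polyfit(x_f, w_f, 1)[0]          # slope of the female wage equation
explained = beta_f * (x_m.mean() - x_f.mean())   # gap due to characteristics
print(f"gap={gap:.3f}  explained={explained:.3f}  unexplained={gap - explained:.3f}")
```

With rich career histories, x is too high-dimensional for this regression; the foundation-model representations compress the history while preserving the features needed for an unbiased decomposition.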
<2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/u/0?cid=Y18zdjkzcGF2OWZqa2tsZHJidTlzbm…
Best,
Jialu
--
Jialu Li
Department of Government
Harvard University
jialu_li(a)g.harvard.edu
To the Harvard Gov participants,
The Government department is seeking graduate student volunteers to fill
committee and workshop coordinator roles for the next academic year. If
you're interested in serving on a departmental committee or as a workshop
coordinator next year, please indicate your interest on this form (
https://forms.gle/aJBvpucEkwffH4E1A) by this coming Tuesday, August 1. As
you might guess, this is a relatively time-sensitive issue since speaker
series and department committees need to schedule their activities in
advance.
Best,
Shusei
Dear Applied Statistics Workshop Community,
Our next meeting of the semester will be on April 26 (12:00 EST). Dean Knox
and Guilherme Duarte will present "Optimal Allocation of Data-Collection
Resources."
<Where>
CGIS K354
Bagged lunches are available for pick-up at 11:45 (CGIS K354).
Zoom:
https://harvard.zoom.us/j/99181972207?pwd=Ykd3ZzVZRnZCSDZqNVpCSURCNnVvQT09
<Abstract>
Complications in applied work often prevent researchers from obtaining
unique point estimates of target quantities using cheaply available data—at
best, ranges of possibilities, or sharp bounds, can be reported. To make
progress, researchers frequently collect more information by (1)
re-cleaning existing datasets, (2) gathering secondary datasets, or (3)
pursuing entirely new designs. Common examples include manually correcting
missingness, recontacting attrited units, validating proxies with
ground-truth data, finding new instrumental variables, and conducting
follow-up experiments. These auxiliary tasks are costly, forcing tradeoffs
with (4) larger samples from the original approach. Researchers'
data-collection strategies, or choices over these tasks, are often based on
convenience or intuition. In this work, we show how to provably identify
the most cost-efficient data-collection strategy for a given research
problem.
We quantify the quality of existing data using the width of the confidence
regions on the sharp bounds, which captures two sources of uncertainty:
statistical uncertainty due to finite samples of the variables measured,
and fundamental uncertainty because some variables are not measured at all.
We then show how to compute the expected information gain, defined as the
expected amount by which each data-collection task will narrow these bounds
by addressing one or both sources of uncertainty. Finally, we select the
task with the greatest information efficiency, or gain per unit cost.
Leveraging recent advances in automatic bounding (Duarte et al., 2022), we
prove efficiency is computable for essentially any discrete causal system,
estimand, and auxiliary data task.
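Once expected gains and costs are in hand, the selection rule itself is simple; a schematic sketch follows, where the task names, expected gains, and costs are entirely hypothetical placeholders:

```python
# Candidate data-collection tasks with (hypothetical) expected reductions in
# the width of the confidence region on the bounds, and their dollar costs.
tasks = {
    "recontact attrited units": {"expected_gain": 0.12, "cost": 800.0},
    "validate proxy on subsample": {"expected_gain": 0.20, "cost": 1500.0},
    "enlarge original sample": {"expected_gain": 0.05, "cost": 300.0},
}

def efficiency(name):
    """Information efficiency: expected bound narrowing per unit cost."""
    t = tasks[name]
    return t["expected_gain"] / t["cost"]

best = max(tasks, key=efficiency)
print(best, f"({efficiency(best):.5f} width reduction per dollar)")
```

The hard part, which the framework automates, is computing each task's expected gain from the causal graph, estimand, and current data, and updating these quantities sequentially as new data arrive.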
Based on this theoretical framework, we develop a method for optimal
adaptive allocation of data-collection resources. Users first input a
causal graph, estimand, and past data. They then enumerate distributions
from which future samples can be drawn, fixed and per-sample costs, and any
prior beliefs. Our method automatically derives and sequentially updates
the optimal data-collection strategy.
<2022-2023 Schedule>
GOV 3009 Website:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Calendar:
https://calendar.google.com/calendar/embed?src=c_3v93pav9fjkkldrbu9snbhned8…
Best,
Shusei