Hi Everyone!
It's hard to believe that next week will be our final meeting for the semester. I have thoroughly enjoyed getting to spend my Wednesday lunches with all of you. Our final speaker will be Nathan Kallus, a PhD student in Operations Research at MIT. Nathan will be presenting some very exciting research on Regression-Robust Designs of Controlled Experiments. The abstract and a link to the paper are included below.
As usual, we will meet in CGIS K354 at 12 noon. There will be some sort of food -- we only have $140 left in the budget so I will have to be creative. Maybe I will cook something......no promises!
Tess
Abstract:
Achieving balance between experimental groups is a cornerstone of causal inference. Without balance, any observed difference may be attributed to a difference other than the treatment alone. In controlled/clinical trials, where the experimenter controls the administration of treatment, complete randomization of subjects has been the gold standard for achieving this balance because it allows for unbiased and consistent estimation and inference in the absence of any a priori knowledge or measurements. However, since estimator variance under complete randomization may be slow to converge, experimental designs that balance pre-treatment measurements (baseline covariates) are in pervasive use, including randomized block designs, pairwise-matched designs, and re-randomization. We formally argue that absolutely no balance better than complete randomization's can be achieved without partial structural knowledge about the treatment effects. Therefore, that balancing designs are in popular use, are advocated, and have been proven in practice means that some structural knowledge is in fact available to the researcher. We propose a novel framework for formulating such knowledge using functional analysis. It subsumes all of the aforementioned designs in that it recovers them as optimal under different choices of structure, thus theoretically characterizing their underlying motivations and comparative power under different assumptions and providing extensions of these to multi-arm trials. Furthermore, it suggests new optimal designs that are based on more robust nonparametric modeling and that offer extensive gains in precision and power. In certain cases we are able to argue linear convergence 1/2^O(-n) to the sample average treatment effect (as compared to the usual logarithmic convergence O(1/sqrt(n))).
We theoretically characterize the unbiasedness, variance, and consistency of any estimator arising from our framework; solve the design problem using modern optimization techniques; and develop appropriate inferential algorithms to test differences in treatments. We uncover connections to Bayesian experimental design and make extensions to dealing with non-compliance.
Pre-print available at:
http://arxiv.org/abs/1312.0531
-----------------
Tess Wise
PhD Candidate
Harvard Department of Government
http://tesswise.com
Hi All!
Our speaker this Wednesday (4/23) will be Nick Beauchamp from Northeastern University. Nick will be giving a talk entitled Predicting, Extrapolating and Interpolating State-level Polls using Twitter. The abstract is included below.
As usual, we will meet in CGIS K354 at 12 noon and lunch will be served.
Looking forward to seeing you all there!
Tess
ABSTRACT:
Predicting, Extrapolating and Interpolating State-level Polls using Twitter
Presidential, gubernatorial, and senatorial elections all require state-level polling, but even during presidential campaigns, state-level surveys remain sparse, erratically timed, and entirely neglected in uncompetitive states. Partly in response to these unmet needs in political and other domains, there have been numerous efforts to approximate various survey measures using social media data, but most of these approaches remain distinctly flawed, both methodologically and due to insufficient training data. To remedy these flaws, this paper combines 1200 state-level polls during the 2012 presidential campaign with over 100 million state-located political Tweets; models the former as a function of the latter using a new linear regularization feature-selection method; and shows via forward-in-time rolling-window out-of-sample testing that, properly modeled, the Twitter textual data tracks polling variation both across states and within states over time, predicting short-term changes in polls with greater accuracy than is possible using past polling data alone. Thus validated, these measures can be extended to unpolled states and, given the density of the Twitter data, potentially to sub-state regions and sub-day timescales. In addition, an examination of the textual features most strongly associated with changes in surveyed vote intention reveals the topics, events, and concerns associated with the rapidly shifting national debate, making this not just a measurement tool, but also of potential use for real-time campaign strategy.
Hi All!
Professor Winship asked that I distribute this as it might be of interest to many people on this list:
2014 Causal Inference Workshops: Main and Advanced
[please recirculate to others who might be interested]
Northwestern University and Duke University are holding two workshops on Research Design for Causal Inference this year. We invite you to attend either or both. Apologies for the length of this message, which covers both.
Main workshop: Monday – Friday, July 7-11, 2014 [at Northwestern]
Advanced workshop: Wednesday - Friday, August 13-15, 2014 [at Duke]
Both workshops will be taught by world-class causal inference researchers. See below for details. Registration for each is limited to 100 participants. We filled the main workshop quickly last year, so please register soon.
For information and to register: law.northwestern.edu/faculty/conferences/causalinference/
Bernie Black [Northwestern, Law School and Kellogg School of Management]
Mat McCubbins [Duke, Political Science and Law]
Main Workshop Overview: Research design for causal inference is at the heart of a “credibility revolution” in empirical research. We will cover the design of true randomized experiments and contrast them to “natural” or “quasi” experiments and to “pure observational studies,” where part of the sample is “treated” in some way, and the remainder is a control group, but the researcher controls neither the assignment of cases to treatment and control groups nor administration of the treatment. We will assess what causal inferences one can draw from a research design, threats to valid inference, and research designs that can mitigate those threats.
Most empirical methods courses survey a variety of methods. We will begin instead with the goal of causal inference, and discuss how to design research to come closer to that goal. The methods are often adapted to a particular study. Some of the methods are covered in PhD programs, but rarely in depth, and rarely with a focus on causal inference and on which methods to use with messy, real-world datasets and limited sample sizes. Each day will include a Stata “workshop” to illustrate selected methods with real data and Stata code.
Advanced Workshop Overview: The advanced workshop seeks to provide an in-depth discussion of selected topics that are beyond what we can cover in the main workshop. Principal topics for 2014 include: Day 1: Choosing estimands (the science), and how choice of estimand affects research design. Principal stratification methods (a little-known, but very powerful extension of the always-taker/never-taker/complier/defier categories developed in “causal IV”); advanced matching methods; multiple imputation of missing potential outcomes. Day 2: Simulation studies; bootstrap methods; advanced topics in regression discontinuity design. Day 3: Causal inference with panel data. Topics will include handling treatment heterogeneity, handling time dynamics, synthetic controls, marginal structural models, and standard errors.
Target audience for Main Workshop: Quantitative empirical researchers (faculty and graduate students) in social science, including law, political science, economics, many business-school areas (finance, accounting, management, marketing, etc.), medicine, sociology, education, psychology, etc. – indeed anywhere that causal inference is important.
We will assume knowledge, at the level of an upper-level college econometrics or similar course, of multivariate regression, including OLS, logit, and probit; basic probability and statistics including conditional and compound probabilities, confidence intervals, t-statistics, and standard errors; and some understanding of instrumental variables. Despite its modest prerequisites, this course should be suitable for most researchers with PhD-level training and for empirical legal scholars with reasonable but more limited training. Even for recent PhDs, there will be much that you don’t know, or don’t know as well as you should.
Target audience for Advanced Workshop: Our target audience is empirical researchers who are reasonably familiar with the basics of causal inference (from our main workshop or otherwise), and want to extend their knowledge. We will assume familiarity with the potential outcomes notation, randomization inference, difference-in-differences, regression discontinuity, panel data, and instrumental variable designs, but will not assume expertise in any of these areas.
Main workshop faculty
Justin McCrary (University of California, Berkeley, Law School)
Justin McCrary is Professor of Law, University of California, Berkeley. Principal research interests: crime and urban problems, law and economics, corporations, employment discrimination, and empirical legal studies. Web page with link to CV: http://www.econ.berkeley.edu/~jmccrary/.
Alberto Abadie (Harvard University, Kennedy School of Government)
Alberto Abadie is Professor of Public Policy at the Kennedy School of Government at Harvard University. Principal research interests: econometrics; program evaluation. Web page with link to CV: http://www.hks.harvard.edu/fs/aabadie/ . Papers on SSRN: http://ssrn.com/author=198468.
Jens Hainmueller (Stanford, Political Science)
Jens Hainmueller is Associate Professor in the Stanford Political Science Department. He also holds a courtesy appointment in the Stanford Graduate School of Business. His research interests include statistical methods, political economy, and political behavior. Web page with link to CV: http://www.stanford.edu/~jhain//
Main workshop outline
Monday-Tuesday July 7-8 (Justin McCrary)
Introduction to Modern Methods for Causal Inference
Overview of causal inference and the Rubin “potential outcomes” causal model. The “gold standard” of a randomized experiment. Treatment and control groups, and the core role of the assignment (to treatment) mechanism. Causal inference as a missing data problem, and imputation of missing potential outcomes.
Instrumental variable and regression discontinuity methods
Causal inference with instrumental variables (IV), including (i) the core, untestable need to satisfy the “only through” exclusion restriction; (ii) heterogeneous treatment effects; and (iii) intent-to-treat designs for randomized trials (or quasi-experiments) with noncompliance.
(Regression) discontinuity (RD) research designs: sharp and fuzzy designs; bandwidth choice; testing for covariate balance and manipulation of the threshold; discontinuities as substitutes for true randomization and sources of convincing instruments.
Wednesday July 9; Thursday morning July 10 (Alberto Abadie)
Observational Studies: Selection on observables
Selection on observables and common support assumptions. Subclassification, matching, and regression estimators of average treatment effects. Propensity score methods: matching and weighting. What to match on: a brief introduction to directed acyclic graphs.
Standard Errors
Robust and clustered standard errors. The bootstrap.
Thursday afternoon, July 10 – Friday morning, July 11 (Jens Hainmueller)
Difference-in-Differences, Panel Data, and Synthetic Controls
Simple two-period DiD; the “parallel changes” assumption. Leads and lags and distributed lag models. Accommodating covariates. Triple differences. Panel data methods. Synthetic controls.
Friday afternoon: Feedback on your own research
Attendees will present their own research design questions from current work in breakout sessions and receive feedback on research design. Session leaders: Bernie Black, Mat McCubbins, Jens Hainmueller. Parallel sessions as needed to meet demand.
_________________________________________________________________________
Advanced Workshop Faculty
Donald B. Rubin (Harvard University, Department of Statistics)
Donald Rubin is John L. Loeb Professor of Statistics, Harvard University. His work on the “Rubin Causal Model” is central to modern understanding of when one can and cannot infer causation from regression. Principal research interests: statistical methods for causal inference; Bayesian statistics; analysis of incomplete data. Web page, with link to CV: www.stat.harvard.edu/faculty_page.php?page=rubin.html; Wikipedia: http://en.wikipedia.org/wiki/Donald_Rubin
Jonathan N. Katz (California Institute of Technology)
Jonathan Katz is Kay Sugahara Professor of Social Sciences and Statistics at Caltech. Co-editor: Political Analysis. Principal research interests: American politics, political methodology; formal political theory. Web page with link to CV: http://jkatz.caltech.edu/.
Justin McCrary (University of California, Berkeley, Law School) [see blurb for main workshop above]
Advanced Workshop Outline
Wednesday August 13 (Don Rubin)
Choosing estimands (the science). Implications of choice of estimand for choice of method. Principal stratification. Flexible matching methods. Multiple imputation of missing potential outcomes. And whatever else Don thinks he should cover, in the allotted time.
Thursday August 14 (Justin McCrary)
Conducting simulation studies. Inference and testing using the bootstrap, including adapting bootstrap methods to your research design. Topics in regression discontinuity design: nonparametric estimation; local linear regression and density estimation; choosing bandwidth and assessing sensitivity to bandwidth choice.
Friday August 15 (Jonathan Katz)
Topics in causal inference with panel data, including time-series-cross-sectional (TSCS) data. Topics will include issues of unit heterogeneity, specification of dynamics, synthetic matching, and marginal structural models, and which standard errors to use.
Lunch talk: Advice from a journal editor on what to do (and not do) (Jonathan Katz is co-editor of Political Analysis).
________________________________________________________________________
Registration and Workshop Cost
Main workshop tuition is $850 ($500 for graduate students (PhD, SJD, or law) and post-docs). Advanced workshop tuition is $550 ($350 for graduate students and post-docs). There are additional discounts (to $350 and $200) for Northwestern or Duke-affiliated attendees. The workshop fees include all materials, a temporary Stata 13 license, breakfast, lunch, snacks, and an evening reception on the first day of each program. All amounts will increase by $50 roughly two months before the workshop (May 22 for the main workshop, but this workshop is likely to fill up before then). See website for registration deadlines and cancellation policy. We know the workshops are not cheap. We use the funds to pay our speakers and for meals and other expenses; we don’t pay ourselves.
Workshop Organizers
Bernard Black (Northwestern University, Law and Kellogg School of Management)
Bernie Black is Nicholas J. Chabraja Professor at Northwestern University, with positions in the Law School and Kellogg School of Management. Principal research interests: law and finance, international corporate governance, health law and policy; empirical legal studies. Papers on SSRN: http://ssrn.com/author=16042.
Mathew McCubbins (Duke University)
Mat McCubbins is Professor of Political Science and Law at Duke University, with positions in the Law School and the Political Science Department, and director of the Center for Law and Democracy. Principal research interests: democratic institutions, legislative organization; behavioral experiments, communication, learning and decisionmaking; statutory interpretation, administrative procedure, research design; network economics. Web page with link to CV: www.mccubbins.us. Papers on SSRN: http://ssrn.com/author=17402.
Questions about the workshops: Please email Bernie Black (bblack@northwestern.edu) or Mat McCubbins (mathew.mccubbins@duke.edu) for substantive questions or fee waiver requests, and Michael Cooper (causalinference@law.northwestern.edu) for logistics and registration.
Hi All!
Our speaker this Wednesday (4/16) at Applied Stats will be Finale Doshi, a post-doc at Harvard Medical School and the Harvard School of Engineering and Applied Sciences. Finale completed her PhD in Computer Science at MIT in 2012; her doctoral work applied Bayesian nonparametric models (which have the nice property of scaling the sophistication of learned models with the complexity of the data) to problems in reinforcement learning.
Finale will be giving a talk entitled Prediction and Interpretation with Latent Variable Models. The abstract for the talk is included below. As per usual, we will meet in CGIS K354 at 12 noon and lunch will be served.
I look forward to seeing you all there!
Tess
Prediction and Interpretation with Latent Variable Models
Latent variable models provide a powerful tool for summarizing data through a set of hidden variables. These models are generally trained to maximize prediction accuracy, and modern latent variable models now do an excellent job of finding compact summaries of the data with high predictive power. However, there are many situations in which good predictions alone are not sufficient. Whether the hidden variables have inherent value by providing insights about the data, or whether we simply wish to improve a system, understanding what the discovered hidden variables mean is an important first step.
In this talk, I will discuss one particular model, GraphSparse LDA, for discovering interpretable latent structures without sacrificing (and sometimes improving upon) prediction accuracy. The model incorporates knowledge about the relationships between observed dimensions into a probabilistic framework to find a small set of human-interpretable "concepts" that summarize the observed data. This approach allows us to recover interpretable descriptions of clinically relevant autism phenotypes from a medical dataset with thousands of dimensions.
Hi Everyone!
Our speaker next Wednesday, April 9th, will be Professor Eleanor Neff Powell from Yale University who will be presenting a talk entitled "Money in Exile: Campaign Contributions and Committee Access" (see abstract and paper below). As per usual, the talk will be held in CGIS K354 at 12 noon and lunch will be served.
I hope to see you all there!
Tess
Abstract:
Corporations and political action committees (PACs) flood congressional elections with money. Understanding why they contribute is essential for determining how money influences policy in Congress. To test theories of contributors’ motivations we exploit committee exile—the involuntary removal of committee members after a party loses a sizable number of seats, and the losses are unevenly distributed across committees. We use exile to show that business interests seek short-term access to influential legislators. Sectors regulated by the committee decrease contributions after a legislator is exiled; instead, PACs from regulated sectors direct their contributions to new committee members from the opposite party. Partisan interests, in contrast, attempt to influence electoral outcomes—boosting contributions to exiled members. Together, we provide evidence that corporations and business PACs use donations to acquire immediate access and favor—suggesting they at least anticipate that the donations will influence policy.