gov3009-l March 2017

gov3009-l@lists.fas.harvard.edu

1 participant
3 discussions

Applied Statistics 3/29/17 - Kosuke Imai

by Ban, Pamela

Hi all,

This week at the Applied Statistics workshop we will be welcoming Kosuke Imai, a Professor in the Department of Politics and Center for Statistics and Machine Learning at Princeton University. He will be presenting work entitled "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records." Please find the abstract below and on the website.

We will meet in CGIS Knafel Room 354 at noon and lunch will be provided.

Best,
Pam

Title: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Abstract: Since most social science research relies upon multiple data sources, merging data sets is an essential part of workflow for many researchers. In many situations, however, a unique identifier that unambiguously links data sets is unavailable and data sets may contain missing and inaccurate information. As a result, researchers can no longer combine data sets "by hand" without sacrificing the quality of the resulting merged data set. This problem is especially severe when merging large-scale administrative records such as voter files. The existing algorithms to automate the merging process do not scale, result in many fewer matches, and require arbitrary decisions by researchers. To overcome this challenge, we develop a fast algorithm to implement the canonical probabilistic model of record linkage for merging large data sets. Researchers can combine this model with a small amount of human coding to produce a high-quality merged data set. The proposed methodology can handle millions of observations and account for missing data and auxiliary information. We conduct simulation studies to show that our algorithm performs well in a variety of practically relevant settings. Finally, we use our methodology to merge the campaign contribution data (5 million records), the Cooperative Congressional Election Study data (50 thousand records), and the nationwide voter file (160 million records).

Applied Statistics 3/22/17 - Elizabeth Stuart

by Ban, Pamela

Hi all,

This week at the Applied Statistics workshop we will be welcoming Elizabeth Stuart, a Professor and Associate Dean for Education at Johns Hopkins School of Public Health. She will be presenting work entitled "Estimating population effects: Assessing and enhancing the generalizability of randomized trials to target populations." Please find the abstract below and on the website.

We will meet in CGIS Knafel Room 354 at noon and lunch will be provided.

Best,
Pam

Title: Estimating population effects: Assessing and enhancing the generalizability of randomized trials to target populations

Abstract: With increasing attention being paid to the relevance of studies for real-world practice (such as in education, international development, and comparative effectiveness research), there is also growing interest in external validity and assessing whether the results seen in randomized trials would hold in target populations. While randomized trials yield unbiased estimates of the effects of interventions in the sample of individuals (or physician practices or schools) in the trial, they do not necessarily inform about what the effects would be in some other, potentially somewhat different, population. While there has been increasing discussion of this limitation of traditional trials, relatively little statistical work has been done developing methods to assess or enhance the external validity of randomized trial results. This talk will first provide empirical data on the potential size of external validity bias in education research. It will then discuss design and analysis methods for assessing and increasing external validity, as well as general issues that need to be considered when thinking about external validity. The primary analysis approach discussed will be a reweighting approach that equates the sample and target population on a set of observed characteristics. Underlying assumptions, performance in simulations, and limitations will be discussed. Implications for how future studies should be designed (and what data needs to be collected) in order to enhance the ability to assess generalizability will also be discussed.

Applied Statistics 3/8/2017 - Paramveer Dhillon

by Ban, Pamela

Hi all,

This week at the Applied Statistics workshop we will be welcoming Paramveer Dhillon, a Postdoctoral Fellow at the MIT Sloan School of Management and the Initiative on Digital Economy at MIT. He will be presenting work entitled "Linear Methods for Big Data." Please find the abstract below and on the website.

We will meet in CGIS Knafel Room 354 at noon and lunch will be provided.

Best,
Pam

Title: Linear Methods for Big Data

Abstract: Statistical machine learning has seen great advances in the last decade owing to the availability of large-scale annotated datasets and significant improvements in computation hardware. Amidst this measurement revolution, it has become increasingly important to come up with statistical methods that are not only statistically efficient but also computationally efficient, i.e., they run fast. Drawing on these developments and recent advances in random matrix theory, I will present my work on building fast and theoretically sound methods for linear regression (OLS) and canonical correlation analysis (CCA). I will also describe how these methods can be used to generate linear features that give state-of-the-art performance on several natural language processing tasks.
