gov3009-l March 2018

gov3009-l@lists.fas.harvard.edu

1 participant
3 discussions

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Michelle Torres*, a graduate student in Political Science and Statistics at Washington University in St. Louis. She will be presenting work entitled *Understanding visual messages: visual framing and the Bag of Visual Words*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Understanding visual messages: visual framing and the Bag of Visual Words*

*Abstract:* How should one perform matching in observational studies when the units are text documents? The lack of randomized assignment of documents into treatment and control groups may lead to systematic differences between groups on high-dimensional and latent features of text such as topical content and sentiment. Standard balance metrics, used to measure the quality of a matching method, fail in this setting. We present a framework for matching documents that decomposes matching methods into two parts: (1) a text representation, and (2) a distance metric. We consider various methods that can be used at each step and conduct a systematic multifactor evaluation experiment using human subjects to identify the methods that dominate. We also show that our framework can be used to produce matches with higher subjective match quality than current state-of-the-art techniques. We then apply our chosen method to a substantive debate in the study of media bias using a novel data set of front page news articles from thirteen news sources. Media bias is composed of topic selection bias and presentation bias; using our matching method to control for topic selection, we find that both components contribute significantly to media bias, though some news sources rely on one component more than the other.


Applied Statistics 3/21

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Luke Miratrix*, Professor of Education at Harvard University. He will be presenting work entitled *Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality*

*Abstract:* How should one perform matching in observational studies when the units are text documents? The lack of randomized assignment of documents into treatment and control groups may lead to systematic differences between groups on high-dimensional and latent features of text such as topical content and sentiment. Standard balance metrics, used to measure the quality of a matching method, fail in this setting. We present a framework for matching documents that decomposes matching methods into two parts: (1) a text representation, and (2) a distance metric. We consider various methods that can be used at each step and conduct a systematic multifactor evaluation experiment using human subjects to identify the methods that dominate. We also show that our framework can be used to produce matches with higher subjective match quality than current state-of-the-art techniques. We then apply our chosen method to a substantive debate in the study of media bias using a novel data set of front page news articles from thirteen news sources. Media bias is composed of topic selection bias and presentation bias; using our matching method to control for topic selection, we find that both components contribute significantly to media bias, though some news sources rely on one component more than the other.


Applied Stats 3/7

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Tianxiao Shen*, a graduate student at MIT. She will be presenting work entitled *Language Style Transfer*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Language Style Transfer*

*Abstract:* Recent advances in text generation tasks such as machine translation and summarization rely on the use of massive amounts of parallel data, which is costly to collect or nonexistent in many scenarios. In this talk, I will present a novel model to perform style transfer on the basis of non-parallel text. This is an instance of a broad family of problems including machine translation, decipherment, and sentiment modification. I will talk about how we deal with the challenge of disentangling content from style, as well as the techniques we use for adversarial training over discrete samples. I will conclude with the experiments we designed, which allow qualitative and quantitative evaluation of the effectiveness of our method.

