Gov 3009 Applied Stats Workshop (10/16) - Pedro Rodriguez - gov3009-l

14 Oct 2019

Dear Workshop Community, 

Our next meeting will be on Wednesday, October 16, when Pedro Rodriguez will present
co-authored work with Arthur Spirling on "Word Embeddings: What works, what doesn’t,
and how to tell the difference for applied research". 

Abstract: We consider the properties and performance of word embeddings techniques in the
context of political science research. In particular, we explore key parameter
choices—including context window length, embedding vector dimensions and the use of
pre-trained vs locally fit variants—in terms of effects on the efficiency and quality of
inferences possible with these models. Reassuringly, with caveats, we show that results
are robust to such choices for political corpora of various sizes and in various
languages. Beyond reporting extensive technical findings, we provide a novel crowd-sourced
“Turing test”-style method for examining the relative performance of any two models that
produce substantive, text-based outputs. Encouragingly, we show that popular, easily
available pre-trained embeddings perform at a level close to, or surpassing, both human
coders and more complicated locally-fit models. For completeness, we provide best practice
advice for cases where local fitting is required.

Where: CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions). 

When: Wednesday, October 16, from 12 noon to 1:30 pm.

All are welcome. Lunch will be provided. 

Best, 
Georgie