Hi All!
Our speaker this Wednesday (4/23) will be Nick Beauchamp from Northeastern University.
Nick will be giving a talk entitled "Predicting, Extrapolating and Interpolating
State-level Polls using Twitter." The abstract is included below.
As usual, we will meet in CGIS K354 at 12 noon, and lunch will be served.
Looking forward to seeing you all there!
Tess
-----------------
Tess Wise
PhD Candidate
Harvard Department of Government
http://tesswise.com
ABSTRACT:
Predicting, Extrapolating and Interpolating State-level Polls using Twitter
Presidential, gubernatorial, and senatorial elections all require state-level polling, but
even during presidential campaigns, state-level surveys remain sparse, erratically timed,
and entirely neglected in uncompetitive states. Partly in response to these unmet needs in
political and other domains, there have been numerous efforts to approximate various
survey measures using social media data, but most of these approaches remain distinctly
flawed, both methodologically and due to insufficient training data. To remedy these
flaws, this paper combines 1200 state-level polls during the 2012 presidential campaign
with over 100 million state-located political Tweets; models the former as a function of
the latter using a new linear regularization feature-selection method; and shows via
forward-in-time rolling-window out-of-sample testing that, properly modeled, the Twitter
textual data tracks polling variation both across states and within states over time,
predicting short-term changes in polls with greater accuracy than is possible using past
polling data alone. Thus validated, these measures can be extended to unpolled states and,
given the density of the Twitter data, potentially to sub-state regions and sub-day
timescales. In addition, an examination of the textual features most strongly associated
with changes in surveyed vote intention reveals the topics, events, and concerns
associated with the rapidly shifting national debate, making this not just a measurement
tool but also a potential aid to real-time campaign strategy.
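
For readers unfamiliar with forward-in-time rolling-window out-of-sample testing, here is a
minimal Python sketch of the general idea. It is not the paper's code: the function names,
the window length, and the toy poll values are all hypothetical, and the "last value"
predictor only stands in for a past-polls-only baseline of the kind the abstract compares
against.

```python
def rolling_window_splits(n_periods, window, horizon=1):
    """Return (train, test) index lists where every test index follows its training window."""
    splits = []
    for start in range(n_periods - window - horizon + 1):
        train = list(range(start, start + window))
        test = list(range(start + window, start + window + horizon))
        splits.append((train, test))
    return splits


def mean_abs_error(series, window, predict):
    """Average absolute one-step-ahead error of `predict` over all rolling windows."""
    errors = []
    for train, test in rolling_window_splits(len(series), window):
        history = [series[i] for i in train]   # only data from before the test point
        forecast = predict(history)
        errors.append(abs(forecast - series[test[0]]))
    if not errors:
        raise ValueError("series is too short for the requested window")
    return sum(errors) / len(errors)


def last_value(history):
    """Hypothetical baseline: forecast the next poll as the most recent one."""
    return history[-1]


# Made-up poll percentages, for illustration only (not data from the paper).
toy_polls = [48.0, 49.0, 47.0, 50.0, 51.0]
splits = rolling_window_splits(len(toy_polls), window=3)
baseline_mae = mean_abs_error(toy_polls, window=3, predict=last_value)
print(splits)
print(baseline_mae)
```

A text-based model would be scored the same way, by passing a different `predict` function,
so that both models are judged only on polls they have not yet seen.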