[gov3009-l] Applied Statistics 3/29/17 - Kosuke Imai

Ban, Pamela pban at fas.harvard.edu
Mon Mar 27 09:26:22 EDT 2017


Hi all,

This week at the Applied Statistics workshop we will be welcoming Kosuke Imai, a Professor in the Department of Politics and the Center for Statistics and Machine Learning at Princeton University. He will be presenting work entitled "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records." The abstract is below and on the website.

We will meet in CGIS Knafel Room 354 at noon, and lunch will be provided.

Best,
Pam

Title: Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Abstract:
Since most social science research relies upon multiple data sources,
merging data sets is an essential part of the workflow for many
researchers. In many situations, however, a unique identifier that
unambiguously links data sets is unavailable, and data sets may contain
missing and inaccurate information. As a result, researchers can no
longer combine data sets "by hand" without sacrificing the quality of
the resulting merged data set. This problem is especially severe when
merging large-scale administrative records such as voter files.
Existing algorithms that automate the merging process do not scale,
produce far fewer matches, and require arbitrary decisions by
researchers. To overcome this challenge, we develop a fast algorithm to
implement the canonical probabilistic model of record linkage for
merging large data sets. Researchers can combine this model with a
small amount of human coding to produce a high-quality merged data set.
The proposed methodology can handle millions of observations and
account for missing data and auxiliary information. We conduct
simulation studies to show that our algorithm performs well in a
variety of practically relevant settings. Finally, we use our
methodology to merge the campaign contribution data (5 million
records), the Cooperative Congressional Election Study data (50
thousand records), and the nationwide voter file (160 million records).
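
For readers unfamiliar with probabilistic record linkage, here is a minimal sketch of the kind of model the abstract refers to, assuming the usual Fellegi-Sunter setup: each candidate pair of records is summarized by a vector of 0/1 field agreements, pairs are a mixture of matches and non-matches, and fields agree independently within each class. This is not the speaker's fast algorithm, and it ignores missing data and auxiliary information. The function names (fit_em, match_probability) and the starting values are assumptions made for illustration only.

```python
def _clip(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1.0 - eps)


def _pattern_prob(g, probs):
    """P(agreement pattern g | class), assuming fields are independent."""
    p = 1.0
    for agree, pj in zip(g, probs):
        p *= pj if agree else 1.0 - pj
    return p


def match_probability(g, lam, m, u):
    """Posterior probability that a pair with agreement pattern g is a match."""
    pm = lam * _pattern_prob(g, m)
    pu = (1.0 - lam) * _pattern_prob(g, u)
    return pm / (pm + pu)


def fit_em(gamma, n_iter=100, lam=0.1):
    """Fit the two-class mixture by EM (hypothetical helper, not a library API).

    gamma: list of tuples of 0/1, one agreement vector per record pair.
    Returns (lam, m, u): the match share, P(agree | match) per field,
    and P(agree | non-match) per field.
    """
    n, k = len(gamma), len(gamma[0])
    m = [0.9] * k  # assumed starting guess: matches mostly agree
    u = [0.1] * k  # assumed starting guess: non-matches mostly disagree
    for _ in range(n_iter):
        # E-step: posterior match probability for every pair
        post = [match_probability(g, lam, m, u) for g in gamma]
        # M-step: re-estimate the parameters as posterior-weighted averages
        total = sum(post)
        lam = _clip(total / n)
        for j in range(k):
            agree_m = sum(p * g[j] for p, g in zip(post, gamma))
            agree_u = sum((1.0 - p) * g[j] for p, g in zip(post, gamma))
            m[j] = _clip(agree_m / total)
            u[j] = _clip(agree_u / (n - total))
    return lam, m, u
```

Pairs whose match probability exceeds a chosen threshold would be treated as links, and borderline pairs are natural candidates for the small amount of human coding the abstract mentions. Comparing every pair of records this way is quadratic in the number of records, which is one reason a naive implementation does not scale to files of the size described above.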
