[gov3009-l] Applied Statistics Workshop: Jon Bischof on Wed., Nov. 7
kkashin at fas.harvard.edu
Mon Nov 5 11:35:57 EST 2012
We hope you can join us this Wednesday, November 7, 2012 for the Applied
Statistics Workshop from 12:00 to 1:30 pm. Jon Bischof, a Ph.D. candidate from
the Department of Statistics at Harvard University, will give a
presentation entitled "Summarizing Topical Content in Document Collections
with Word Frequency and Exclusivity". A light lunch will be served at 12:00 pm
and the talk will begin at 12:15 pm.
> An ongoing challenge in the analysis of document collections is how to
> summarize content in terms of a set of inferred themes that can be
> interpreted substantively in terms of topics. However, the current practice
> of summarizing themes in terms of most frequent words limits
> interpretability by ignoring the differential use of words across topics.
> We argue that words that are both frequent and exclusive to a theme are
> more effective at characterizing topical content. We consider a setting
> where professional editors have annotated documents to a collection of
> topic categories, organized into a tree, in which leaf-nodes correspond to
> the most specific topics. Each document is annotated to multiple
> categories, at different levels of the tree. We introduce Hierarchical
> Poisson Convolution (HPC) as a model to analyze annotated documents in this
> setting. The model leverages the structure among categories defined by
> professional editors to infer a clear semantic description for each topic
> in terms of words that are both frequent and exclusive. We develop a
> parallelized Hamiltonian Monte Carlo sampler that allows the inference to
> scale to millions of documents.
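For readers new to the idea, the abstract's central contrast (ranking a topic's words by frequency alone versus by frequency and exclusivity together) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HPC model or the speaker's method: it assumes a hypothetical table of per-topic word rates (`rates`), defines a word's exclusivity as its rate in one topic divided by its summed rate across all topics, and combines the two by a weighted harmonic mean of their within-topic empirical-CDF ranks. The function names, the toy data, and the weight `w` are all hypothetical.

```python
from typing import Dict, List


def ecdf_ranks(values: List[float]) -> List[float]:
    """Empirical-CDF rank of each value: the fraction of values <= it."""
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]


def frequency_exclusivity_scores(
    rates: Dict[str, List[float]], topic: int, w: float = 0.5
) -> Dict[str, float]:
    """Score each word for one topic by a weighted harmonic mean of its
    within-topic frequency rank and exclusivity rank (an assumed rule)."""
    words = list(rates)
    # Frequency: the word's rate within the chosen topic.
    freq = [rates[word][topic] for word in words]
    # Exclusivity: share of the word's total rate that falls in this topic
    # (assumes every word has a positive total rate).
    excl = [rates[word][topic] / sum(rates[word]) for word in words]
    f_rank = ecdf_ranks(freq)
    e_rank = ecdf_ranks(excl)
    # Ranks are always > 0 (each value counts itself), so no division by zero.
    return {
        word: 1.0 / (w / e + (1.0 - w) / f)
        for word, f, e in zip(words, f_rank, e_rank)
    }


# Hypothetical per-topic word rates: topic 0 ~ finance, topic 1 ~ sports.
rates = {
    "the": [0.30, 0.30],      # frequent everywhere, not exclusive
    "market": [0.25, 0.01],   # frequent and exclusive to topic 0
    "stocks": [0.05, 0.001],  # exclusive to topic 0 but rare
    "game": [0.01, 0.20],     # belongs to topic 1
}

by_frequency = max(rates, key=lambda word: rates[word][0])
scores = frequency_exclusivity_scores(rates, topic=0)
by_score = max(scores, key=scores.get)
print(by_frequency, by_score)  # the market
```

Ranking by frequency alone puts "the" first for topic 0; the combined score puts "market" first, while "the" (frequent but shared) and "stocks" (exclusive but rare) tie below it. Using a harmonic mean is one design choice: it pulls down any word that is weak on either dimension.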
An up-to-date schedule for the workshop is available at
Ph.D. Candidate in Government
E-mail: kkashin at fas.harvard.edu