Dear Applied Statistics Workshop Community,
This is a gentle reminder that our first meeting of the semester will
be at *12:10 pm (EST) tomorrow on Zoom
<https://harvard.zoom.us/j/97004196610?pwd=eGFydkF5RDRjUlk5RVcyTjV6OStUQT09>*,
where Tracy Ke <http://zke.fas.harvard.edu> (Harvard University) presents
"Learning Research Areas and Author Research Interests from Bibtex and
Citations."
*Abstract*
Given the scientific publications in a field, we are interested in using
bibtex and citation data to estimate (a) the primary research areas in this
field, (b) the research interests of individual authors (which may evolve
with time), and (c) the citation impacts of different research topics in
this field. We answer questions (a)-(b) by studying the co-citation
networks of authors. We model them by a dynamic mixed-membership model,
where each primary area is a “community”, and the author research interests
are described by the time-varying “mixed membership vectors”. We propose a
spectral algorithm for estimating these membership vectors. We answer
question (c) by jointly modeling citations and paper abstracts. We propose
the Hofmann-Stigler model, which imposes K “topic vectors” in text
abstracts, K “export scores” to model the citation impact of these topics,
and a “topic weight vector” for each paper. We propose a spectral algorithm
for parameter estimation, whose output can be used to rank topics.
We applied our methods to a data set of publications in statistics.
It covers over 83K papers in 36 journals in statistics spanning 41 years.
We discovered a “Statistics Triangle” that is connected to Bradley Efron’s
Statistics Philosophy Triangle (Efron’s triangle is subjective, but our
triangle is derived from data). We also discovered that quite a few
high-profile authors are moving towards the popular sub-area of
“High-dimensional Data Analysis”. We also found that the research topic
“Mathematical Statistics” is ranked first in terms of citation impact.
This is joint work with Pengsheng Ji, Jiashun Jin and Wanshan Li. The talk
is partially based on the paper “Co-citation and Co-authorship Networks of
Statisticians” (Journal of Business & Economic Statistics, to appear).
*When:* Wednesday, January 26 at 12:10 - 1:30 pm.
*Zoom link*:
https://harvard.zoom.us/j/97004196610?pwd=eGFydkF5RDRjUlk5RVcyTjV6OStUQT09
*Schedule of the workshop*:
https://projects.iq.harvard.edu/applied.stats.workshop-gov3009
Best,
Sooahn