Hi all,
Our next meeting will be *Wednesday February 26*, where Shusei Eshima and
Tomoya Sasaki will present research on *“Keyword Assisted Topic Models”*.
*Abstract:* For a long time, many social scientists have conducted a
content analysis by simply counting carefully selected key words and
phrases contained in documents of interest. In recent years, however,
probabilistic topic models have become increasingly popular because of
their ability to uncover topics and keywords based on the co-occurrence of
certain words. Unfortunately, applied researchers find that these models
often fail to yield topics of their interest by inadvertently creating
nonsensical topics, merging unrelated topics, or splitting a single
coherent topic. In this paper, we empirically demonstrate that providing
topic models with a small number of keywords can dramatically improve their
performance. The proposed keyword assisted topic model (keyATM) offers an
important advantage that the specification of keywords requires researchers
to label topics prior to fitting a model to the data. This contrasts with
a widespread practice of post-hoc topic interpretation and adjustments that
compromises the objectivity of empirical findings. In our applications, we
find that the keyATM provides more interpretable results, has better
document classification performance, and is more robust to the number of
topics than the standard topic models. Finally, keyATM can also model
covariate effects and time trends. An open-source software package is
freely available for implementing the proposed methodology.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, February 26 at 12 noon - 1:30 pm.
All are welcome and lunch will be provided.
Best,
Georgie
Hi all,
Our next meeting will be *Wednesday February 19*, where Asya Magazinnik
will present research on *“What Do We Learn About Voter Preferences From
Conjoint Experiments?”*
*Abstract:* Political scientists frequently interpret the results of
conjoint experiments as reflective of voter preferences. In this paper we
show that the target estimand of conjoint experiments, the AMCE, is not
well-defined in these terms. Even with individually rational experimental
subjects, unbiased estimates of the AMCE can indicate the opposite of the
true preference of the majority. To show this, we characterize the
preference aggregation rule implied by AMCE and demonstrate its several
undesirable properties. With this result we provide a method for placing
sharp bounds on the proportion of experimental subjects with a strict
preference for a given candidate-feature. We provide a testable assumption
to show when the AMCE corresponds in sign with the majority preference.
Finally, we offer a structural interpretation of the AMCE and highlight
that the problem we describe persists even when a model of voting is
imposed.
The paper can be found here
<https://scholar.princeton.edu/sites/default/files/kkocak/files/conjoint_dra…>.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, February 19 at 12 noon - 1:30 pm.
All are welcome and lunch will be provided.
Best,
Georgie
Hi all,
Our next meeting will be *Wednesday February 12*, where Adam Kapelner will
present research on *“Harmonizing Optimized Designs with Classic
Randomization in Experiments”*.
*Abstract:* There is a long debate in experimental design between the
classic randomization design of Fisher, Yates, Kempthorne, Cochran, and
those who advocate deterministic assignments based on notions of
optimality. In nonsequential trials comparing treatment and control,
covariate measurements for each subject are known in advance, and subjects
can be divided into two groups based on a criterion of imbalance. With the
advent of modern computing, this partition can be made nearly perfectly
balanced via numerical optimization, but these allocations are far from
random. These perfect allocations may endanger estimation relative to
classic randomization because unseen subject-specific characteristics can
be highly imbalanced. To demonstrate this, we consider different performance
criteria such as Efron’s worst-case analysis and our original tail
criterion of mean squared error. Under our tail criterion for the
differences-in-mean estimator, we prove asymptotically that the optimal
design must be more random than perfect balance but is less random than
completely random. Our result vindicates restricted designs that are used
regularly such as blocking and rerandomization. For a covariate-adjusted
estimator, balancing offers fewer rewards and it seems good performance is
achievable with complete randomization. Further work will provide a
procedure to find the explicit optimal design in different scenarios in
practice. Supplementary materials for this article are available online.
The paper can be found here
<https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2020.1717619#.Xj1kf…>.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, February 12 at 12 noon - 1:30 pm.
As always, all are welcome and lunch will be provided.
Best,
Georgie
Hi all,
Our next meeting will be *Wednesday February 5*, where Gary King will
present research on *“Statistically Valid Inferences from Privacy Protected
Data”*.
*Abstract:* Unprecedented quantities of data that could help social
scientists understand and ameliorate the challenges of human society are
presently locked away inside companies, governments, and other
organizations, in part because of worries about privacy violations. We
address this problem with a general-purpose data access and analysis system
with mathematical guarantees of privacy for individuals who may be
represented in the data, statistical guarantees for researchers seeking
insights from it, and protection for society from some fallacious
scientific conclusions. We build on the standard of “differential privacy”
but, unlike most such approaches, we also correct for the serious
statistical biases induced by privacy-preserving procedures, provide a
proper accounting for statistical uncertainty, and impose minimal
constraints on the choice of data analytic methods and types of quantities
estimated. Our algorithm is easy to implement, simple to use, and
computationally efficient; we also offer open source software to illustrate
all our methods.
Slides and paper here
<https://gking.harvard.edu/presentations/statistically-valid-inferences-priv…>.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, February 5 at 12 noon - 1:30 pm.
All are welcome and lunch will be provided.
Best,
Georgie