Our speaker this Wednesday (4/16) at Applied Stats will be Finale Doshi, a post-doc at Harvard Medical School and the Harvard School of Engineering and Applied Sciences. Finale completed her PhD in Computer Science at MIT in 2012; her doctoral work applied Bayesian nonparametric models (which have the nice property of scaling the sophistication of learned models with the complexity of the data) to problems in reinforcement learning.

-----------------

Tess Wise
PhD Candidate
Harvard Department of Government
http://tesswise.com

Prediction and Interpretation with Latent Variable Models

Latent variable models provide a powerful tool for summarizing data through a set of hidden variables. These models are generally trained to maximize prediction accuracy, and modern latent variable models now do an excellent job of finding compact summaries of the data with high predictive power. However, there are many situations in which good predictions alone are not sufficient. Whether the hidden variables have inherent value by providing insights about the data, or whether we simply wish to improve a system, understanding what the discovered hidden variables mean is an important first step.

In this talk, I will discuss one particular model, GraphSparse LDA, for discovering interpretable latent structures without sacrificing (and sometimes improving upon) prediction accuracy. The model incorporates knowledge about the relationships between observed dimensions into a probabilistic framework to find a small set of human-interpretable "concepts" that summarize the observed data. This approach allows us to recover interpretable descriptions of clinically relevant autism phenotypes from a medical dataset with thousands of dimensions.