Hi everyone!
Our speaker this Wednesday (9/24) at Applied Stats will be our own *Brandon
Stewart*, who will be practicing his job talk. Brandon will be giving a
talk entitled *Latent Factor Regressions for the Social Sciences*. The
abstract for the talk is included below. As per usual, we will meet in CGIS
K354 at 12 noon and lunch will be served.
I look forward to seeing you all there! Thank you!
-- Dana Higgins
*Abstract:* I present a general framework for regression in the presence of
complex dependence structures between units such as in time-series
cross-sectional data, relational/network data, and spatial data. These
types of data are challenging for standard multilevel models because they
involve multiple types of structure (e.g., temporal effects and
cross-sectional effects) which are interactive. I show that interactive
latent factor models provide a powerful modeling alternative that can
address a wide range of data types. Although related models have
previously been proposed in several different fields, inference is
typically cumbersome and slow. I introduce a class of fast variational
inference algorithms that allow for models to be fit quickly and accurately.
Hi everyone!
Our speaker this Wednesday (9/17) at Applied Stats will be *James Lloyd*, from
the University of Cambridge and the Cambridge Machine Learning Group.
James will be giving a talk entitled *The Automatic Statistician*. The
abstract for the talk is included below. As per usual, we will meet in CGIS
K354 at 12 noon and lunch will be served.
I look forward to seeing you all there! Also, check out the new website (
here <http://projects.iq.harvard.edu/applied.stats.workshop-gov3009/>) to
see the schedule for the next couple of weeks. Thank you!
-- Dana Higgins
*Abstract:* While it is becoming easier to collect and store all kinds of
data, including personal medical data, scientific data, and commercial
data, there are relatively few people trained in the statistical and
machine learning methods required to test hypotheses, make predictions, and
otherwise create interpretable knowledge from this data. The automatic
statistician project aims to build an artificial intelligence for data
science, to help people make sense of their data and to uncover challenging
research problems in automatic data analysis. I will discuss an early
version of the system which can build statistical models from an open-ended
language of models and then describe them in natural language. I will
briefly review the class of regression models which the system constructs
and how their properties allow for a modular description generation
algorithm. The talk will conclude with examples of the output of the system
and a discussion of future research directions.
Hi everyone!
Our speaker this Wednesday (9/10) at Applied Stats will be *Ryan Adams*, an
Assistant Professor of Computer Science at SEAS. His research focuses on
machine learning and computational statistics, but he is broadly interested
in questions related to artificial intelligence, computational
neuroscience, machine vision, and Bayesian nonparametrics.
Ryan will be giving a talk entitled *Accelerating Exact MCMC with Subsets
of Data*. The abstract for the talk is included below. As per usual, we
will meet in CGIS K354 at 12 noon and lunch will be served.
I look forward to seeing you all there! Also, check out the new website (
here <http://projects.iq.harvard.edu/applied.stats.workshop-gov3009/>) to
see the schedule for the first few weeks. Thank you!
-- Dana Higgins
*Abstract:*
One of the challenges of building statistical models for large data sets is
balancing the correctness of inference procedures against computational
realities. In the context of Bayesian procedures, the pain of such
computations has been particularly acute as it has appeared that algorithms
such as Markov chain Monte Carlo necessarily need to touch all of the data
at each iteration in order to arrive at a correct answer. Several recent
proposals have been made to use subsets (or "minibatches") of data to
perform MCMC in ways analogous to stochastic gradient descent.
Unfortunately, these proposals have only provided approximations, although
in some cases it has been possible to bound the error of the resulting
stationary distribution.
In this talk I will discuss two new, complementary algorithms for using
subsets of data to perform faster MCMC. In both cases, these procedures
yield stationary distributions that are exactly the desired target
posterior distribution. The first of these, "Firefly Monte Carlo", is an
auxiliary variable method that uses randomized subsets of data to achieve
valid transition operators, with connections to recent developments in
pseudo-marginal MCMC. The second approach I will discuss, parallel
predictive prefetching, uses subsets of data to parallelize Markov chain
Monte Carlo across multiple cores, while still leaving the target
distribution intact. These methods have both yielded significant gains in
wallclock performance in sampling from posterior distributions with
millions of data points.
Dear all,
I hope everyone has had a relaxing summer! I am the new graduate student
coordinator for the Applied Statistics Workshop (Gov 3009) at IQSS this
semester and would like to invite all of you to attend the workshop. The
workshop features a multidisciplinary forum for presenting research with
statistical innovations and applications. Starting with Wednesday, Sept. 3,
we will meet every Wednesday from 12-1:30 pm in CGIS-Knafel 354 (1737
Cambridge Street). As always, lunch will be provided.
Please note that you don’t have to formally enroll in the workshop to
attend. Furthermore, if you would like your name to be added to the mailing
list, please let me know.
Our first speaker is Eric Chaney from the Harvard Department of Economics.
The title of his presentation is "The Medieval Origins of Comparative
European Development: Evidence from the Basque Country." The abstract is
below.
Check out the new website to see the schedule for the first few weeks.
Thank you!
-- Dana Higgins
Abstract:
This paper investigates the present-day economic impact of medieval
republican institutions along the historical borders of the Basque Country
in Spain and France. I present evidence suggesting that medieval republican
institutions have had a lasting effect: in Spain the drop in incomes along
the Basque border is similar to that between the richest and poorest areas
of the euro zone today. Using present-day and historical data, I
investigate the mechanisms through which these medieval institutions have
had enduring effects. Although I find evidence of significant cultural
differences at the Basque border, results using institutional variation
generated by the partition of Basque regions between France and Spain cast
doubt on claims that these cultural differences are the fundamental cause
behind today's economic differences. In addition, I track the evolution of
a variety of variables in the border region back in time. While
institutional differences remain observable in the 18th century, all other
observable differences between Basque and surrounding areas vanish or
become negative by this date. When taken in unison, the results suggest the
importance of the historical emergence of republican institutions, and
their subsequent persistence, in generating within-European differences in
economic outcomes today.