gov3009-l September 2014

gov3009-l@lists.fas.harvard.edu

2 participants
4 discussions

by Dana Higgins

Hi everyone!

Our speaker this Wednesday (9/24) at Applied Stats will be our own *Brandon Stewart*, who will be practicing his job talk. Brandon will be giving a talk entitled *Latent Factor Regressions for the Social Sciences*. The abstract for the talk is included below. As usual, we will meet in CGIS K354 at 12 noon, and lunch will be served. I look forward to seeing you all there!

Thank you!
-- Dana Higgins

*Abstract:* I present a general framework for regression in the presence of complex dependence structures between units, such as in time-series cross-sectional data, relational/network data, and spatial data. These types of data are challenging for standard multilevel models because they involve multiple types of structure (e.g., temporal effects and cross-sectional effects) that are interactive. I show that interactive latent factor models provide a powerful modeling alternative that can address a wide range of data types. Although related models have previously been proposed in several different fields, inference is typically cumbersome and slow. I introduce a class of fast variational inference algorithms that allow models to be fit quickly and accurately.


Applied Stats 9/17 James Lloyd

by Dana Higgins

Hi everyone!

Our speaker this Wednesday (9/17) at Applied Stats will be *James Lloyd*, from the University of Cambridge and the Cambridge Machine Learning Group. James will be giving a talk entitled *The Automatic Statistician*. The abstract for the talk is included below. As usual, we will meet in CGIS K354 at 12 noon, and lunch will be served. I look forward to seeing you all there!

Also, check out the new website (here <http://projects.iq.harvard.edu/applied.stats.workshop-gov3009/>) to see the schedule for the next couple of weeks.

Thank you!
-- Dana Higgins

*Abstract:* While it is becoming easier to collect and store all kinds of data, including personal medical data, scientific data, and commercial data, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data. The automatic statistician project aims to build an artificial intelligence for data science, to help people make sense of their data and to uncover challenging research problems in automatic data analysis. I will discuss an early version of the system, which can build statistical models from an open-ended language of models and then describe them in natural language. I will briefly review the class of regression models that the system constructs and how their properties allow for a modular description generation algorithm. The talk will conclude with examples of the output of the system and a discussion of future research directions.


Applied Stats 9/10 Ryan Adams

by Dana Higgins

Hi everyone!

Our speaker this Wednesday (9/10) at Applied Stats will be *Ryan Adams*, an Assistant Professor of Computer Science at SEAS. His research focuses on machine learning and computational statistics, but he is broadly interested in questions related to artificial intelligence, computational neuroscience, machine vision, and Bayesian nonparametrics. Ryan will be giving a talk entitled *Accelerating Exact MCMC with Subsets of Data*. The abstract for the talk is included below. As usual, we will meet in CGIS K354 at 12 noon, and lunch will be served. I look forward to seeing you all there!

Also, check out the new website (here <http://projects.iq.harvard.edu/applied.stats.workshop-gov3009/>) to see the schedule for the first few weeks.

Thank you!
-- Dana Higgins

*Abstract:* One of the challenges of building statistical models for large data sets is balancing the correctness of inference procedures against computational realities. In the context of Bayesian procedures, the pain of such computations has been particularly acute, as it has appeared that algorithms such as Markov chain Monte Carlo necessarily need to touch all of the data at each iteration in order to arrive at a correct answer. Several recent proposals have been made to use subsets (or "minibatches") of data to perform MCMC in ways analogous to stochastic gradient descent. Unfortunately, these proposals have only provided approximations, although in some cases it has been possible to bound the error of the resulting stationary distribution. In this talk I will discuss two new, complementary algorithms for using subsets of data to perform faster MCMC. In both cases, these procedures yield stationary distributions that are exactly the desired target posterior distribution. The first of these, "Firefly Monte Carlo", is an auxiliary variable method that uses randomized subsets of data to achieve valid transition operators, with connections to recent developments in pseudo-marginal MCMC. The second approach I will discuss, parallel predictive prefetching, uses subsets of data to parallelize Markov chain Monte Carlo across multiple cores, while still leaving the target distribution intact. These methods have both yielded significant gains in wall-clock performance in sampling from posterior distributions with millions of data points.


Applied Statistics Workshop

by Dana Higgins

Dear all,

I hope everyone has had a relaxing summer! I am the new graduate student coordinator for the Applied Statistics Workshop (Gov 3009) at IQSS this semester and would like to invite all of you to attend the workshop. The workshop is a multidisciplinary forum for presenting research with statistical innovations and applications. Starting Wednesday, Sept. 3, we will meet every Wednesday from 12:00 to 1:30 pm in CGIS-Knafel 354 (1737 Cambridge Street). As always, lunch will be provided. Please note that you don’t have to formally enroll in the workshop to attend. Furthermore, if you would like your name to be added to the mailing list, please let me know.

Our first speaker is Eric Chaney from the Harvard Department of Economics. The title of his presentation is "The Medieval Origins of Comparative European Development: Evidence from the Basque Country." The abstract is below. Check out the new website to see the schedule for the first few weeks.

Thank you!
-- Dana Higgins

Abstract: This paper investigates the present-day economic impact of medieval republican institutions along the historical borders of the Basque Country in Spain and France. I present evidence suggesting that medieval republican institutions have had a lasting effect: in Spain the drop in incomes along the Basque border is similar to that between the richest and poorest areas of the euro zone today. Using present-day and historical data, I investigate the mechanisms through which these medieval institutions have had enduring effects. Although I find evidence of significant cultural differences at the Basque border, results using institutional variation generated by the partition of Basque regions between France and Spain cast doubt on claims that these cultural differences are the fundamental cause behind today's economic differences. In addition, I track the evolution of a variety of variables in the border region back in time. While institutional differences remain observable in the 18th century, all other observable differences between Basque and surrounding areas vanish or become negative by this date. Taken together, the results suggest the importance of the historical emergence of republican institutions (and their subsequent persistence) in generating within-European differences in economic outcomes today.

