gov3009-l February 2018

gov3009-l@lists.fas.harvard.edu

1 participant
4 discussions

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Isaac Kohane*, Professor of Pediatrics at Harvard Medical School. He will be presenting work entitled *Interesting early results in data science that require new statistical methods*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Interesting early results in data science that require new statistical methods*

*Abstract:* In the brave new world of biomedical data science, new sources of data emerge seemingly every year from Twitter to genomes to weather to drug habits and/or doctor preferences. I will outline several interesting and apparently impactful findings that have emerged as a result of analyses of these data, both individually and jointly. I will then follow the discussion of these early successes with an outline of significant unanswered methodological challenges requiring a systematic and sound response if further progress is to be achieved. In particular, I will focus on those challenges which I believe are of the most interest to the biostatistical community.

6 years, 1 month

Applied Statistics 2/21

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Emily Breza*, Assistant Professor of Economics at Harvard University. She will be presenting work entitled *Using Aggregated Relational Data to Feasibly Identify Network Structure without Network Data*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Using Aggregated Relational Data to Feasibly Identify Network Structure without Network Data*

*Abstract:* Social network data is often prohibitively expensive to collect, limiting empirical network research. Typical economic network mapping requires (1) enumerating a census, (2) eliciting the names of all network links for each individual, (3) matching the list of social connections to the census, and (4) repeating (1)-(3) across many networks. In settings requiring field surveys, steps (2)-(3) can be very expensive. In other network populations such as financial intermediaries or high-risk groups, proprietary data and privacy concerns may render (2)-(3) impossible. Both restrict the accessibility of high-quality networks research to investigators with considerable resources. We propose an inexpensive and feasible strategy for network elicitation using Aggregated Relational Data (ARD) – responses to questions of the form “How many of your social connections have trait k?” Our method uses ARD to recover the parameters of a general network formation model, which in turn permits the estimation of any arbitrary node- or graph-level statistic. The method works well in simulations and in matching a range of network characteristics in real-world graphs from 75 Indian villages. Moreover, we replicate the results of two field experiments that involved collecting network data. We show that the researchers would have drawn similar conclusions using ARD alone. Finally, using calculations from J-PAL fieldwork, we show that in rural India, for example, ARD surveys are 80% cheaper than full network surveys.

6 years, 2 months

Applied Statistics 2/14

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Lucas Janson*, Assistant Professor of Statistics at Harvard University. He will be presenting work entitled *Using Knockoffs to Find Important Variables with Statistical Guarantees*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Using Knockoffs to find important variables with statistical guarantees*

*Abstract:* Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-X knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimensions, including when the number of explanatory variables p exceeds the sample size n. Our approach requires the design matrix be random (independent and identically distributed rows) with a known distribution for the explanatory variables, although we show preliminary evidence that our procedure is robust to unknown/estimated distributions. As we require no knowledge/assumptions about the conditional distribution of the response, we effectively shift the burden of knowledge from the response to the explanatory variables, in contrast to the canonical model-based approach which assumes a parametric model for the response but very little about the explanatory variables. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.

6 years, 2 months

Applied Statistics 2/6

by Dana Higgins

Hi everyone!

This week at the Applied Statistics Workshop we will be welcoming *Jose Zubizarreta*, Assistant Professor of Health Care Policy at Harvard Medical School. He will be presenting work entitled *Building Representative Matched Samples in Large-Scale Observational Studies with Multivalued Treatments*. Please find the abstract below and on the Applied Stats website here <https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>. As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be provided.

See you all there!

-- Dana Higgins

*Title:* *Building Representative Matched Samples in Large-Scale Observational Studies with Multivalued Treatments*

*Abstract:* In observational studies of causal effects, matching methods are widely used to approximate the ideal study that would be conducted under controlled experimentation. In this talk, I will discuss new matching methods that use tools from modern optimization to overcome four limitations of standard matching approaches. In particular, these new matching methods (i) directly obtain flexible forms of covariate balance, as specified before matching by the investigator; (ii) produce self-weighting matched samples that are representative of target populations by design; (iii) handle multiple treatment doses without resorting to a generalization of the propensity score; and (iv) handle large data sets quickly. I will illustrate the performance of these methods in a case study about the impact of an earthquake on post-traumatic stress and standardized test scores.

6 years, 2 months
