Dear Workshop Community,
Our next meeting will be *Wednesday, November 20*, where Lucas Janson will
present research on *"Recent Advances in Model-X Knockoffs"*
*Abstract*: Two years ago in this workshop I presented my work on model-X
knockoffs, a method for high-dimensional variable selection that provides
exact (finite-sample) control of false discoveries and high power as a
result of its flexibility to leverage any and all domain knowledge and
tools from machine learning to search for signal. In this talk, I will
discuss two recent works that significantly advance the usability and
generality of model-X knockoffs. First, I will show how the original
assumption of model-X knockoffs, that the multivariate distribution of the
covariates be known exactly, can be significantly relaxed to the assumption
that only a *model* for the covariates be known, and that model can have as
many free parameters as the *product* of the sample size and dimension. No
loss in the guarantees of knockoffs is incurred by this relaxation of the
assumptions. Second, I will show how to efficiently and exactly sample
knockoffs for *any* distribution on the covariates, even if the
distribution is only known up to a normalization constant. This
dramatically expands the set of covariate distributions for which we can
apply knockoffs. This is joint work with a number of collaborators, listed
below in the full references for the two works:
*D. Huang and L. Janson. Relaxing the Assumptions of Knockoffs by
Conditioning. Annals of Statistics (to appear), 2019.*
*S. Bates, E. Candès, L. Janson, and W. Wang. Metropolized Knockoff
Sampling. 2019.*
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, November 20 at 12 noon - 1:30 pm.
All are welcome! Lunch will be provided.
Best,
Georgie
Dear Workshop Community,
Our next meeting will be *Wednesday, November 13*, where Professor Xiao-Li
Meng will present research on *"2020 Election and Privacy Protected Census:
Data Quantity vs. Quality & Privacy vs. Utility"*
*Abstract:* The year 2020 will be a busy one for statisticians and more
generally for data scientists; predictions about the 2020 US election are
already underway. Will the lessons from the 2016 US election be learned, or
will the prediction failure be repeated? How do we measure the quality of
the data we rely upon for predictions? How small are our big data when we
take their quality into account? The US Census Bureau has announced that
the data from the 2020 Census will be released under differential privacy
protection, which – in layperson’s terms – means adding some noise to the
data in order to prevent re-identification of individuals and other
privacy-related threats. Few would argue against protecting data privacy,
but what trade-offs would be acceptable between data privacy and data
utility? How much information do we lose by making data differentially
private? How should we analyze differential privacy protected data? This
talk invites the audience on a journey of deep statistical thinking
prompted by these questions, regardless of whether or not they have any
interest in US politics and the census.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, November 13 at 12 noon - 1:30 pm.
All are welcome! Lunch will be provided.
Best,
Georgie
*Many of you will be interested in the second issue of the Harvard Data
Science Review, which Professor Meng helped launch. You can find it
here: <https://hdsr.mitpress.mit.edu>.*
Dear Workshop Community,
Our next meeting will be *Wednesday, November 6*, where Nicole Pashley will
present research on *"Causal Inference for Multiple Non-Randomized
Treatment Using Fractional Factorial Designs"*
*Abstract:* We explore a framework for addressing causal questions in an
observational setting with multiple treatments. This setting involves
attempting to approximate an experiment from observational data. With
multiple treatments, this experiment would be a factorial design. However,
certain treatment combinations may be so rare that, for some combinations,
we have no measured outcomes in the observed data. We propose to
conceptualize a hypothetical fractional factorial experiment instead of a
full factorial experiment and lay out a framework for analysis in this
setting. We also connect our design-based methods to standard regression
methods. We illustrate the method and the challenges of this type of data
through an application.
*Where:* CGIS Knafel Building, Room K354 (see this link
<https://map.harvard.edu/?bld=04471&level=9> for directions).
*When:* Wednesday, November 6 at 12 noon - 1:30 pm.
All are welcome. Lunch will be provided.
Best,
Georgie