Hi everyone!
This week at the Applied Statistics Workshop we will be welcoming *Lucas
Janson*, Assistant Professor of Statistics at Harvard University. He will
be presenting work entitled *Using Knockoffs to Find Important Variables
with Statistical Guarantees*. Please find the abstract below and on the
Applied Stats website here
<https://projects.iq.harvard.edu/applied.stats.workshop-gov3009>.
As usual, we will meet at noon in CGIS Knafel Room 354 and lunch will be
provided. See you all there!
-- Dana Higgins
*Title:* *Using Knockoffs to Find Important Variables with Statistical
Guarantees*
*Abstract:* Many contemporary large-scale applications, from genomics to
advertising, involve linking a response of interest to a large set of
potential explanatory variables in a nonlinear fashion, such as when the
response is binary. Although this modeling problem has been extensively
studied, it remains unclear how to effectively select important variables
while controlling the fraction of false discoveries, even in
high-dimensional logistic regression, not to mention general
high-dimensional nonlinear models. To address such a practical problem, we
propose a new framework of model-X knockoffs, which reads from a different
perspective the knockoff procedure (Barber and Candès, 2015) originally
designed for controlling the false discovery rate in linear models. Model-X
knockoffs can deal with arbitrary (and unknown) conditional models and any
dimensions, including when the number of explanatory variables p exceeds
the sample size n. Our approach requires the design matrix be random
(independent and identically distributed rows) with a known distribution
for the explanatory variables, although we show preliminary evidence that
our procedure is robust to unknown/estimated distributions. As we require
no knowledge/assumptions about the conditional distribution of the
response, we effectively shift the burden of knowledge from the response to
the explanatory variables, in contrast to the canonical model-based
approach which assumes a parametric model for the response but very little
about the explanatory variables. To our knowledge, no other procedure
solves the controlled variable selection problem in such generality, but in
the restricted settings where competitors exist, we demonstrate the
superior power of knockoffs through simulations. Finally, we apply our
procedure to data from a case-control study of Crohn’s disease in the
United Kingdom, making twice as many discoveries as the original analysis
of the same data.