Normalizing Flows for Knockoff-free Controlled Feature Selection

Authors: Derek Hansen, Brian Manzo, Jeffrey Regier

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, FLOWSELECT consistently controls the FDR on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches do not. FLOWSELECT also demonstrates greater power on these benchmarks. Additionally, FLOWSELECT correctly infers the genetic variants associated with specific soybean traits from GWAS data.
Researcher Affiliation | Academia | Derek Hansen, Department of Statistics, University of Michigan, dereklh@umich.edu; Brian Manzo, Department of Statistics, University of Michigan, bmanzo@umich.edu; Jeffrey Regier, Department of Statistics, University of Michigan, regier@umich.edu
Pseudocode | Yes | Algorithm 1: Step 2 of the FLOWSELECT procedure for drawing K null features $X_{i,j} \mid X_{i,-j}$ for feature j at observation i. (A hedged code sketch of this sampling step appears after the table.)
Open Source Code | Yes | Software to reproduce our experiments is available at https://github.com/dereklhansen/flowselect.
Open Datasets | Yes | We use single-cell RNA sequencing (scRNA-seq) data from 10x Genomics (10x Genomics, 2017)... We tested FLOWSELECT on a dataset from the SoyNAM project (Song et al., 2017).
Dataset Splits | Yes | For each model, we use 90% of the data for training to generate null features and the remaining 10% for calculating the feature statistics.
Hardware Specification | No | The paper mentions running experiments "using a single GPU" in Section 5.4, but it does not specify the GPU model or any other hardware details.
Software Dependencies | No | The paper mentions various software components and methods (e.g., LASSO, random forest regressors, normalizing flows), but it does not provide version numbers for any of these software dependencies.
Experiment Setup | No | The paper describes the general experimental setup, such as the data split (90% training, 10% for feature statistics) and the types of models used (LASSO, random forest, neural network architecture), but it does not give specific hyperparameters such as learning rates, batch sizes, or the number of MCMC samples in the main text, deferring "Additional details of training and architecture" to Appendix E. (A sketch of how these pieces could fit together appears after the table.)
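
For orientation, here is a minimal sketch of what Step 2 (drawing null copies of one feature from the fitted flow) could look like in PyTorch. Everything here is an illustrative assumption, not the authors' exact sampler: `flow_log_prob` is a hypothetical callable exposing the flow's per-row log-density, and the symmetric random-walk proposal with a fixed `step_size` is a generic Metropolis-Hastings choice. See the repository linked above for the real implementation.

```python
import torch

def sample_null_features(flow_log_prob, X, j, K, step_size=0.5):
    """Draw K null copies of feature j for every row of X with a
    random-walk Metropolis-Hastings chain, holding all other
    features fixed. `flow_log_prob` maps an (n, d) tensor to
    per-row log-densities under the fitted flow (assumed API)."""
    x = X.clone()
    cur_lp = flow_log_prob(x)  # log q(x) for each row, shape (n,)
    draws = []
    for _ in range(K):
        prop = x.clone()
        prop[:, j] = prop[:, j] + step_size * torch.randn(x.shape[0])
        prop_lp = flow_log_prob(prop)
        # Only coordinate j changes, so the joint-density ratio equals
        # the conditional-density ratio q(x_j' | x_-j) / q(x_j | x_-j).
        accept = torch.rand(x.shape[0]).log() < (prop_lp - cur_lp)
        x[accept] = prop[accept]
        cur_lp = torch.where(accept, prop_lp, cur_lp)
        draws.append(x[:, j].clone())
    return torch.stack(draws, dim=1)  # (n, K) null copies of feature j
```

Because consecutive MCMC draws are correlated, in practice one would discard a burn-in and thin the chain before treating the K columns as independent null copies.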
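Similarly, a hedged sketch of how the held-out split, a LASSO-based feature statistic, and FDR control could fit together. The add-one p-value and the Benjamini-Hochberg step are standard conditional-randomization-test ingredients; the function name `crt_pvalue`, the `lasso_alpha` value, and the 10% FDR target are assumptions for illustration, not settings taken from the paper's Appendix E.

```python
import numpy as np
from sklearn.linear_model import Lasso
from statsmodels.stats.multitest import multipletests

def crt_pvalue(X, y, j, null_draws, lasso_alpha=0.01):
    """Add-one CRT p-value for feature j: rank the observed LASSO
    coefficient magnitude against the same statistic recomputed with
    feature j replaced by each null copy. `null_draws` is an (n, K)
    array, e.g. from the sampler sketched above."""
    def stat(X_):
        return abs(Lasso(alpha=lasso_alpha).fit(X_, y).coef_[j])

    t_obs = stat(X)
    t_null = np.empty(null_draws.shape[1])
    for k in range(null_draws.shape[1]):
        X_swap = X.copy()
        X_swap[:, j] = null_draws[:, k]
        t_null[k] = stat(X_swap)
    # Add-one correction keeps the p-value valid at finite K.
    return (1 + np.sum(t_null >= t_obs)) / (1 + len(t_null))

# Per the table: the flow is trained on 90% of the rows, and feature
# statistics are computed on the held-out 10% (X_test, y_test below
# are hypothetical names for that held-out block).
# pvals = np.array([crt_pvalue(X_test, y_test, j, nulls[j]) for j in range(d)])
# selected = multipletests(pvals, alpha=0.10, method="fdr_bh")[0]
```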