Normalizing Flows for Knockoff-free Controlled Feature Selection
Authors: Derek Hansen, Brian Manzo, Jeffrey Regier
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, FLOWSELECT consistently controls the FDR on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches do not. FLOWSELECT also demonstrates greater power on these benchmarks. Additionally, FLOWSELECT correctly infers the genetic variants associated with specific soybean traits from GWAS data. |
| Researcher Affiliation | Academia | Derek Hansen, Department of Statistics, University of Michigan (dereklh@umich.edu); Brian Manzo, Department of Statistics, University of Michigan (bmanzo@umich.edu); Jeffrey Regier, Department of Statistics, University of Michigan (regier@umich.edu) |
| Pseudocode | Yes | Algorithm 1: Step 2 of the FLOWSELECT procedure for drawing K null features X_{i,j} | X_{i,-j} for feature j at observation i. |
| Open Source Code | Yes | Software to reproduce our experiments is available at https://github.com/dereklhansen/flowselect. |
| Open Datasets | Yes | We use single-cell RNA sequencing (scRNA-seq) data from 10x Genomics (10x Genomics, 2017)... We tested FLOWSELECT on a dataset from the SoyNAM project (Song et al., 2017) |
| Dataset Splits | Yes | For each model, we use 90% of the data for training to generate null features and the remaining 10% for calculating the feature statistics. |
| Hardware Specification | No | The paper mentions running experiments "using a single GPU" in Section 5.4, but it does not specify the model or any other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions various software components and methods used (e.g., LASSO, random forest regressor, normalizing flows), but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | No | The paper describes the general experimental setup, such as the data split (90% training, 10% for feature statistics) and the types of models used (LASSO, random forest, neural network architecture), but it does not provide specific hyperparameters such as learning rates, batch sizes, or the number of MCMC samples in the main text. It defers "Additional details of training and architecture" to Appendix E. |
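The pseudocode row above refers to FLOWSELECT's core step: drawing K null samples of feature j conditional on the remaining features, using MCMC against the joint density learned by a normalizing flow. A minimal sketch of that idea follows, assuming the fitted flow exposes a joint log-density; a toy correlated Gaussian stands in for the flow here, and the function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def sample_nulls_mh(log_density, x_row, j, K, n_steps=50, prop_scale=0.5, rng=None):
    """Draw K null samples of feature j conditional on the remaining features,
    via random-walk Metropolis-Hastings on the joint density.

    `log_density` stands in for a fitted flow's log p(x): any callable
    returning the joint log-density of a single observation vector.
    """
    rng = np.random.default_rng(rng)
    x = x_row.astype(float).copy()
    nulls = []
    for _ in range(K):
        for _ in range(n_steps):
            prop = x.copy()
            prop[j] = x[j] + prop_scale * rng.standard_normal()
            # Accept/reject on the joint density; only coordinate j moves,
            # so the chain targets p(x_j | x_{-j}) up to a constant.
            if np.log(rng.uniform()) < log_density(prop) - log_density(x):
                x = prop
        nulls.append(x[j])
    return np.array(nulls)

# Toy stand-in for a trained flow: a correlated bivariate Gaussian.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
log_p = lambda x: -0.5 * x @ prec @ x

# Null draws of feature 0 given feature 1 fixed at 1.0.
draws = sample_nulls_mh(log_p, np.array([0.0, 1.0]), j=0, K=100, rng=0)
```

For this Gaussian, the conditional law of x_0 given x_1 = 1 is N(0.8, 0.36), so the empirical mean of the draws should land near 0.8; in FLOWSELECT the null draws are instead compared against the observed feature to form per-feature p-values.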