Normalizing Flows for Knockoff-free Controlled Feature Selection
Authors: Derek Hansen, Brian Manzo, Jeffrey Regier
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, FLOWSELECT consistently controls the FDR on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches do not. FLOWSELECT also demonstrates greater power on these benchmarks. Additionally, FLOWSELECT correctly infers the genetic variants associated with specific soybean traits from GWAS data. |
| Researcher Affiliation | Academia | Derek Hansen, Department of Statistics, University of Michigan (dereklh@umich.edu); Brian Manzo, Department of Statistics, University of Michigan (bmanzo@umich.edu); Jeffrey Regier, Department of Statistics, University of Michigan (regier@umich.edu) |
| Pseudocode | Yes | Algorithm 1: Step 2 of the FLOWSELECT procedure for drawing K null features X_{i,j} | X_{i,-j} for feature j at observation i. |
| Open Source Code | Yes | Software to reproduce our experiments is available at https://github.com/dereklhansen/flowselect. |
| Open Datasets | Yes | We use single-cell RNA sequencing (scRNA-seq) data from 10x Genomics (10x Genomics, 2017)... We tested FLOWSELECT on a dataset from the SoyNAM project (Song et al., 2017) |
| Dataset Splits | Yes | For each model, we use 90% of the data for training to generate null features and the remaining 10% for calculating the feature statistics. |
| Hardware Specification | No | The paper mentions running experiments "using a single GPU" in Section 5.4, but it does not specify the model or any other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions various software components and methods used (e.g., LASSO, random forest regressor, normalizing flows), but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | No | The paper describes the general experimental setup, such as the data split (90% training, 10% for feature statistics) and the types of models used (LASSO, random forest, neural network architecture), but it does not provide specific hyperparameters such as learning rates, batch sizes, or the number of MCMC samples in the main text. It defers "Additional details of training and architecture" to Appendix E. |
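The pseudocode row above refers to FLOWSELECT's core step: drawing K null samples of feature j conditional on the remaining features, using MCMC against the joint density learned by a normalizing flow. A minimal sketch of that idea follows, assuming the fitted flow exposes a joint log-density; a toy correlated Gaussian stands in for the flow here, and the function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def sample_nulls_mh(log_density, x_row, j, K, n_steps=50, prop_scale=0.5, rng=None):
    """Draw K null samples of feature j conditional on the remaining features,
    via random-walk Metropolis-Hastings on the joint density.

    `log_density` stands in for a fitted flow's log p(x): any callable
    returning the joint log-density of a single observation vector.
    """
    rng = np.random.default_rng(rng)
    x = x_row.astype(float).copy()
    nulls = []
    for _ in range(K):
        for _ in range(n_steps):
            prop = x.copy()
            prop[j] = x[j] + prop_scale * rng.standard_normal()
            # Accept/reject on the joint density; only coordinate j moves,
            # so the chain targets p(x_j | x_{-j}) up to a constant.
            if np.log(rng.uniform()) < log_density(prop) - log_density(x):
                x = prop
        nulls.append(x[j])
    return np.array(nulls)

# Toy stand-in for a trained flow: a correlated bivariate Gaussian.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
log_p = lambda x: -0.5 * x @ prec @ x

# Null draws of feature 0 given feature 1 fixed at 1.0.
draws = sample_nulls_mh(log_p, np.array([0.0, 1.0]), j=0, K=100, rng=0)
```

For this Gaussian, the conditional law of x_0 given x_1 = 1 is N(0.8, 0.36), so the empirical mean of the draws should land near 0.8; in FLOWSELECT the null draws are instead compared against the observed feature to form per-feature p-values.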