DeepPINK: reproducible feature selection in deep neural networks

Authors: Yang Lu, Yingying Fan, Jinchi Lv, William Stafford Noble

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we apply DeepPINK (Deep feature selection using Paired-Input Nonlinear Knockoffs) to both simulated and real data sets to demonstrate its empirical utility.
Researcher Affiliation | Academia | Yang Young Lu, Department of Genome Sciences, University of Washington, Seattle, WA 98195, ylu465@uw.edu; Yingying Fan, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, fanyingy@marshall.usc.edu; Jinchi Lv, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, jinchilv@marshall.usc.edu; William Stafford Noble, Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, william-noble@uw.edu
Pseudocode | No | The paper describes the DeepPINK architecture and its components but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code and data will be available here: github.com/younglululu/DeepPINK.
Open Datasets | Yes | We use synthetic data to compare the performance of DeepPINK to existing methods in the literature. We also apply DeepPINK to two real data sets to demonstrate its empirical utility. We first apply DeepPINK to the task of identifying mutations associated with drug resistance in HIV-1 [32]. We use a cross-sectional study of n = 98 healthy volunteers to investigate the dietary effect on the human gut microbiome [12, 26, 45].
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam [24]' for training, but it does not provide specific ancillary software details such as library or solver names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | In this work, we use an MLP with two hidden layers, each containing p neurons. We use L1 regularization in the MLP with the regularization parameter set to O(√(log p / n)). We use Adam [24] to train the deep learning model with respect to the mean squared error loss, using an initial learning rate of 0.001 and batch size 10.
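
To make the quoted setup concrete, below is a minimal sketch of that MLP in Python. The excerpt does not name a framework, so the use of tensorflow.keras, the helper name build_mlp, the input layout (original features concatenated with their knockoff copies), and the exact regularization constant are illustrative assumptions rather than the authors' implementation.

    # Minimal sketch (assumed framework: tensorflow.keras) of the MLP described
    # in the Experiment Setup row: two hidden layers of p neurons each, an L1
    # penalty on the order of sqrt(log p / n), Adam with learning rate 0.001,
    # and mean squared error loss.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, regularizers

    def build_mlp(p, n):
        l1 = np.sqrt(np.log(p) / n)  # O(sqrt(log p / n)); the constant is an assumption
        model = tf.keras.Sequential([
            layers.Input(shape=(2 * p,)),  # assumed input: p features plus p knockoffs
            layers.Dense(p, activation="relu", kernel_regularizer=regularizers.l1(l1)),
            layers.Dense(p, activation="relu", kernel_regularizer=regularizers.l1(l1)),
            layers.Dense(1),  # scalar response for the regression setting
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
        return model

    # Usage with assumed arrays X_aug (n x 2p) and y (length n); batch size 10 as stated.
    # model = build_mlp(p=X_aug.shape[1] // 2, n=X_aug.shape[0])
    # model.fit(X_aug, y, batch_size=10, epochs=100)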