Self-Supervision Enhanced Feature Selection with Correlated Gates
Authors: Changhee Lee, Fergus Imrie, Mihaela van der Schaar
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple real-world datasets including clinical and omics demonstrate that our model discovers relevant features that provide superior prediction performance compared to the state-of-the-art benchmarks in practical scenarios where there is often limited labeled data and high correlations among features. |
| Researcher Affiliation | Academia | Changhee Lee (Chung-Ang University, Korea) changheelee@cau.ac.kr; Fergus Imrie (UCLA, USA) imrie@ucla.edu; Mihaela van der Schaar (University of Cambridge, UK; UCLA, USA; Alan Turing Institute, UK) mv472@cam.ac.uk |
| Pseudocode | Yes | A PSEUDO-CODE OF SEFS. SEFS is trained via a two-step training procedure. We provide pseudo-code for the Self-Supervision Phase in Algorithm 1 and for the Supervision Phase in Algorithm 2. (The correlated gate sampling at the core of Algorithm 1 is sketched in code after the table.) |
| Open Source Code | Yes | The source code of SEFS and the synthetic dataset are provided in the Supplementary Material and are also available at https://github.com/chl8856/SEFS. |
| Open Datasets | Yes | Experiments on multiple real-world datasets including clinical and omics... The UK Cystic Fibrosis registry (UKCF) records annual follow-ups for 6,754 adults over the period 2008–2015 (https://www.cysticfibrosis.org.uk/)... proteomic measurements from the Cancer Cell Line Encyclopedia (CCLE, Barretina et al. (2012))... distinguishing sub-populations of T-cells... from purified populations of peripheral blood monocytes (PBMCs) based on transcriptomic measurements (https://support.10xgenomics.com/single-cell-gene-expression/datasets) |
| Dataset Splits | Yes | For all methods, we use 20% of the overall training set as the validation set, which is then kept unseen when training the feature selection methods with the chosen hyper-parameters. |
| Hardware Specification | Yes | The specification of the machine is: CPU Intel Core i7-8700K, GPU NVIDIA GeForce GTX 1080 Ti, and RAM 64GB DDR4 |
| Software Dependencies | No | The paper mentions 'Adam' as the optimization algorithm and 'ReLU' as the non-linearity. It also notes 'Implemented using Python package scikit-learn' for Lasso, Tree, and agglomerative clustering, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The hyper-parameters of SEFS and those of the benchmarks are chosen via a grid search (the temperature-relaxed gates and the grid-search protocol are sketched in code below). For all methods, we use 20% of the overall training set as the validation set... Table S.1: Hyper-parameters of SEFS includes Learning rate {0.0001, 0.001, 0.01}, No. of hidden units {10, 30, 50, 100}, No. of layers {1, 2, 3}, No. of nodes {10, 30, 50, 100, 300, 500}, Coeff. α {0.01, 0.1, 1.0, 10, 100}, Coeff. π {0.2, 0.4, 0.6, 0.8}, Temperature τ 1.0, Dropout 0.3, Coeff. β {0.01, 0.1, 1.0, 10, 100}. |
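
The correlated gates that give the method its name are generated in the self-supervision phase (Algorithm 1 in the Pseudocode row). Below is a minimal sketch of that step, assuming a Gaussian copula over the empirical feature correlation matrix; the name `sample_correlated_gates`, the toy data, and the marginal-resampling mask are illustrative choices, not the authors' released code.

```python
import numpy as np
from scipy.stats import norm

def sample_correlated_gates(corr, pi, n_samples, rng):
    """Gaussian-copula sampling of correlated Bernoulli gates: draw
    z ~ N(0, corr), push each coordinate through the standard normal
    CDF to get correlated Uniform(0, 1) variables, then threshold at
    the selection probability pi."""
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = norm.cdf(z)                        # marginally Uniform(0, 1)
    return (u < pi).astype(np.float32)     # marginally Bernoulli(pi)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
x[:, 1] = 0.9 * x[:, 0] + 0.1 * x[:, 1]    # two strongly correlated features
corr = np.corrcoef(x, rowvar=False)

m = sample_correlated_gates(corr, pi=0.5, n_samples=4, rng=rng)

# Masked input for the pretext tasks: gated-out features are replaced
# with draws from their empirical marginal distributions.
idx = rng.integers(0, len(x), size=m.shape)
x_bar = x[idx, np.arange(x.shape[1])]      # per-feature marginal resamples
x_tilde = m * x[:4] + (1 - m) * x_bar
```

Because the copula inherits the feature correlation structure, strongly correlated features tend to be gated in and out together, which is what lets the self-supervised encoder learn from realistic masking patterns.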
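Table S.1's Temperature τ refers to the continuous relaxation of the Bernoulli gates used in the supervision phase so that the selection probabilities can be learned by gradient descent. The sketch below shows a generic binary-concrete relaxation under that assumption; `relaxed_bernoulli` is a hypothetical name, and the exact parameterization in the paper may differ.

```python
import numpy as np

def relaxed_bernoulli(logits, tau, rng):
    """Binary-concrete relaxation of Bernoulli gates with temperature
    tau: add Logistic(0, 1) noise to the logits, then squash with a
    tempered sigmoid to get soft, differentiable gates in (0, 1)."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log(1 - u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / tau))

rng = np.random.default_rng(0)
pi = np.array([0.9, 0.5, 0.1])             # feature selection probabilities
gates = relaxed_bernoulli(np.log(pi / (1 - pi)), tau=1.0, rng=rng)
# As tau -> 0, the soft gates approach hard 0/1 draws from Bernoulli(pi).
```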
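Finally, the Experiment Setup and Dataset Splits rows describe hyper-parameter selection by grid search with a 20% validation holdout. The sketch below illustrates that protocol on toy data; `train_and_eval` is a hypothetical placeholder for fitting SEFS with one configuration and returning its validation loss, and the grid mirrors only a subset of Table S.1.

```python
import itertools
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))             # toy stand-in data
y = rng.integers(0, 2, size=500)

# Hold out 20% of the overall training set as the validation set.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0)

# Hypothetical grid mirroring a subset of Table S.1.
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "alpha": [0.01, 0.1, 1.0, 10, 100],
    "beta": [0.01, 0.1, 1.0, 10, 100],
    "pi": [0.2, 0.4, 0.6, 0.8],
}

def train_and_eval(cfg):
    """Placeholder: fit SEFS with one configuration on (X_tr, y_tr)
    and return its loss on (X_val, y_val)."""
    return rng.random()

# Exhaustively evaluate every configuration and keep the best one.
best_cfg = min((dict(zip(grid, values))
                for values in itertools.product(*grid.values())),
               key=train_and_eval)
```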