Self-Supervision Enhanced Feature Selection with Correlated Gates

Authors: Changhee Lee, Fergus Imrie, Mihaela van der Schaar

ICLR 2022 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on multiple real-world datasets including clinical and omics demonstrate that our model discovers relevant features that provide superior prediction performance compared to the state-of-the-art benchmarks in practical scenarios where there is often limited labeled data and high correlations among features."
Researcher Affiliation | Academia | Changhee Lee, Chung-Ang University, Korea (changheelee@cau.ac.kr); Fergus Imrie, UCLA, USA (imrie@ucla.edu); Mihaela van der Schaar, University of Cambridge, UK / UCLA, USA / Alan Turing Institute, UK (mv472@cam.ac.uk)
Pseudocode | Yes | "A PSEUDO-CODE OF SEFS. SEFS is trained via a two-step training procedure. We provide pseudo-code for the Self-Supervision Phase in Algorithm 1 and for the Supervision Phase in Algorithm 2." (A hedged sketch of the correlated gate sampling underlying the self-supervision phase appears after this table.)
Open Source Code | Yes | "The source code of SEFS and the synthetic dataset are provided in the Supplementary Material and are also available at https://github.com/chl8856/SEFS."
Open Datasets | Yes | "Experiments on multiple real-world datasets including clinical and omics..." The UK Cystic Fibrosis registry (UKCF, https://www.cysticfibrosis.org.uk/) records annual follow-ups for 6,754 adults over the period 2008–2015; proteomic measurements from the Cancer Cell Line Encyclopedia (CCLE, Barretina et al. (2012)); distinguishing sub-populations of T-cells... from purified populations of peripheral blood monocytes (PBMCs, https://support.10xgenomics.com/single-cell-gene-expression/datasets) based on transcriptomic measurements.
Dataset Splits | Yes | "For all methods, we use 20% of the overall training set as the validation set, which will then be unseen for training feature selection methods with chosen hyper-parameters." (A minimal holdout sketch appears after this table.)
Hardware Specification | Yes | "The specification of the machine is: CPU Intel Core i7-8700K, GPU NVIDIA GeForce GTX 1080Ti, and RAM 64GB DDR4."
Software Dependencies | No | The paper mentions Adam as the optimization algorithm and ReLU as the non-linearity, and notes that Lasso, Tree, and agglomerative clustering were "implemented using Python package scikit-learn", but it does not provide version numbers for any of these software components.
Experiment Setup | Yes | "The hyper-parameters of SEFS and those of the benchmarks are chosen via a grid search. For all methods, we use 20% of the overall training set as the validation set..." Table S.1 (hyper-parameters of SEFS): learning rate {0.0001, 0.001, 0.01}; no. of hidden units {10, 30, 50, 100}; no. of layers {1, 2, 3}; no. of nodes {10, 30, 50, 100, 300, 500}; coeff. α {0.01, 0.1, 1.0, 10, 100}; coeff. π {0.2, 0.4, 0.6, 0.8}; temperature τ = 1.0; dropout = 0.3; coeff. β {0.01, 0.1, 1.0, 10, 100}. (This grid is written out programmatically in the last sketch after this table.)
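
The "correlated gates" in the paper's title refer to masking features with binary gate vectors whose marginals are Bernoulli(π) but whose dependence structure mirrors the empirical feature correlations. Below is a minimal sketch of such gate sampling via a Gaussian copula, assuming an empirical correlation matrix; the function and variable names (`sample_correlated_gates`, `corr`, `pi`) are illustrative and not taken from the authors' code.

```python
# Minimal sketch: correlated Bernoulli gate sampling via a Gaussian copula.
# Assumption: gates are drawn from a multivariate Bernoulli whose dependence
# follows the feature correlation matrix, as the paper's title suggests.
import numpy as np
from scipy.stats import norm

def sample_correlated_gates(corr, pi, n_samples, rng=None):
    """Draw gate vectors with Bernoulli(pi) marginals whose dependence
    follows the d x d feature correlation matrix `corr`."""
    rng = np.random.default_rng(rng)
    d = corr.shape[0]
    # Correlated standard normals: z ~ N(0, corr)
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    # Standard normal CDF maps them to correlated Uniform(0, 1) variables
    u = norm.cdf(z)
    # Thresholding at pi yields marginal Bernoulli(pi) gates
    return (u < pi).astype(np.float32)

# Example: mask features with probability 0.4 each, respecting correlations
X = np.random.randn(1000, 5)                      # placeholder data
corr = np.corrcoef(X, rowvar=False)               # empirical correlations
gates = sample_correlated_gates(corr, pi=0.4, n_samples=8, rng=0)
```

Thresholding the copula uniforms at π preserves the Bernoulli(π) marginals while inducing dependence between gates that tracks the dependence between features.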
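
The reported 20% validation split can be reproduced with a standard holdout; the data below are placeholders, not the paper's loaders, and the random seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for one of the paper's datasets
X_train = np.random.randn(500, 20)
y_train = np.random.randint(0, 2, size=500)

# Hold out 20% of the overall training set as the validation set
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0)
```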
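
For reference, the Table S.1 search space can be written out as a plain grid and enumerated. The key names are illustrative; the paper does not specify how its grid search was implemented.

```python
import itertools

# Hyper-parameter grid transcribed from Table S.1 (key names are our own)
grid = {
    "learning_rate":  [1e-4, 1e-3, 1e-2],
    "n_hidden_units": [10, 30, 50, 100],
    "n_layers":       [1, 2, 3],
    "n_nodes":        [10, 30, 50, 100, 300, 500],
    "alpha":          [0.01, 0.1, 1.0, 10, 100],
    "pi":             [0.2, 0.4, 0.6, 0.8],
    "temperature":    [1.0],   # fixed in Table S.1
    "dropout":        [0.3],   # fixed in Table S.1
    "beta":           [0.01, 0.1, 1.0, 10, 100],
}

# Enumerate every candidate configuration in the grid
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
print(len(configs))  # total number of settings to evaluate on the validation set
```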