Self-Supervision Enhanced Feature Selection with Correlated Gates
Authors: Changhee Lee, Fergus Imrie, Mihaela van der Schaar
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple real-world datasets including clinical and omics demonstrate that our model discovers relevant features that provide superior prediction performance compared to the state-of-the-art benchmarks in practical scenarios where there is often limited labeled data and high correlations among features. |
| Researcher Affiliation | Academia | Changhee Lee (Chung-Ang University, Korea) changheelee@cau.ac.kr; Fergus Imrie (UCLA, USA) imrie@ucla.edu; Mihaela van der Schaar (University of Cambridge, UK; UCLA, USA; Alan Turing Institute, UK) mv472@cam.ac.uk |
| Pseudocode | Yes | A PSEUDO-CODE OF SEFS. SEFS is trained via a two-step training procedure. We provide pseudo-code for the Self-Supervision Phase in Algorithm 1 and for the Supervision Phase in Algorithm 2. (The correlated gate sampling at the core of Algorithm 1 is sketched in code after the table.) |
| Open Source Code | Yes | The source code of SEFS and the synthetic dataset are provided in the Supplementary Material and are also available at https://github.com/chl8856/SEFS. |
| Open Datasets | Yes | Experiments on multiple real-world datasets including clinical and omics... The UK Cystic Fibrosis registry (UKCF) records annual follow-ups for 6,754 adults over the period 2008–2015 (https://www.cysticfibrosis.org.uk/)... proteomic measurements from the Cancer Cell Line Encyclopedia (CCLE, Barretina et al. (2012))... distinguishing sub-populations of T-cells... from purified populations of peripheral blood monocytes (PBMCs) based on transcriptomic measurements (https://support.10xgenomics.com/single-cell-gene-expression/datasets) |
| Dataset Splits | Yes | For all methods, we use 20% of the overall training set as the validation set, which is then kept unseen when training the feature selection methods with the chosen hyper-parameters. |
| Hardware Specification | Yes | The specification of the machine is: CPU Intel Core i7-8700K, GPU NVIDIA GeForce GTX 1080 Ti, and RAM 64GB DDR4 |
| Software Dependencies | No | The paper mentions 'Adam' as the optimization algorithm and 'ReLU' as the non-linearity. It also notes 'Implemented using Python package scikit-learn' for Lasso, Tree, and agglomerative clustering, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The hyper-parameters of SEFS and those of the benchmarks are chosen via a grid search (the temperature-relaxed gates and the grid-search protocol are sketched in code below). For all methods, we use 20% of the overall training set as the validation set... Table S.1: Hyper-parameters of SEFS includes Learning rate {0.0001, 0.001, 0.01}, No. of hidden units {10, 30, 50, 100}, No. of layers {1, 2, 3}, No. of nodes {10, 30, 50, 100, 300, 500}, Coeff. α {0.01, 0.1, 1.0, 10, 100}, Coeff. π {0.2, 0.4, 0.6, 0.8}, Temperature τ 1.0, Dropout 0.3, Coeff. β {0.01, 0.1, 1.0, 10, 100}. |
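
The correlated gates that give the method its name are generated in the self-supervision phase (Algorithm 1 in the Pseudocode row). Below is a minimal sketch of that step, assuming a Gaussian copula over the empirical feature correlation matrix; the name `sample_correlated_gates`, the toy data, and the marginal-resampling mask are illustrative choices, not the authors' released code.

```python
import numpy as np
from scipy.stats import norm

def sample_correlated_gates(corr, pi, n_samples, rng):
    """Gaussian-copula sampling of correlated Bernoulli gates: draw
    z ~ N(0, corr), push each coordinate through the standard normal
    CDF to get correlated Uniform(0, 1) variables, then threshold at
    the selection probability pi."""
    d = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = norm.cdf(z)                        # marginally Uniform(0, 1)
    return (u < pi).astype(np.float32)     # marginally Bernoulli(pi)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
x[:, 1] = 0.9 * x[:, 0] + 0.1 * x[:, 1]    # two strongly correlated features
corr = np.corrcoef(x, rowvar=False)

m = sample_correlated_gates(corr, pi=0.5, n_samples=4, rng=rng)

# Masked input for the pretext tasks: gated-out features are replaced
# with draws from their empirical marginal distributions.
idx = rng.integers(0, len(x), size=m.shape)
x_bar = x[idx, np.arange(x.shape[1])]      # per-feature marginal resamples
x_tilde = m * x[:4] + (1 - m) * x_bar
```

Because the copula inherits the feature correlation structure, strongly correlated features tend to be gated in and out together, which is what lets the self-supervised encoder learn from realistic masking patterns.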
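Table S.1's Temperature τ refers to the continuous relaxation of the Bernoulli gates used in the supervision phase so that the selection probabilities can be learned by gradient descent. The sketch below shows a generic binary-concrete relaxation under that assumption; `relaxed_bernoulli` is a hypothetical name, and the exact parameterization in the paper may differ.

```python
import numpy as np

def relaxed_bernoulli(logits, tau, rng):
    """Binary-concrete relaxation of Bernoulli gates with temperature
    tau: add Logistic(0, 1) noise to the logits, then squash with a
    tempered sigmoid to get soft, differentiable gates in (0, 1)."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log(1 - u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / tau))

rng = np.random.default_rng(0)
pi = np.array([0.9, 0.5, 0.1])             # feature selection probabilities
gates = relaxed_bernoulli(np.log(pi / (1 - pi)), tau=1.0, rng=rng)
# As tau -> 0, the soft gates approach hard 0/1 draws from Bernoulli(pi).
```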
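Finally, the Experiment Setup and Dataset Splits rows describe hyper-parameter selection by grid search with a 20% validation holdout. The sketch below illustrates that protocol on toy data; `train_and_eval` is a hypothetical placeholder for fitting SEFS with one configuration and returning its validation loss, and the grid mirrors only a subset of Table S.1.

```python
import itertools
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))             # toy stand-in data
y = rng.integers(0, 2, size=500)

# Hold out 20% of the overall training set as the validation set.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0)

# Hypothetical grid mirroring a subset of Table S.1.
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "alpha": [0.01, 0.1, 1.0, 10, 100],
    "beta": [0.01, 0.1, 1.0, 10, 100],
    "pi": [0.2, 0.4, 0.6, 0.8],
}

def train_and_eval(cfg):
    """Placeholder: fit SEFS with one configuration on (X_tr, y_tr)
    and return its loss on (X_val, y_val)."""
    return rng.random()

# Exhaustively evaluate every configuration and keep the best one.
best_cfg = min((dict(zip(grid, values))
                for values in itertools.product(*grid.values())),
               key=train_and_eval)
```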