Few-Sample Feature Selection via Feature Manifold Learning

Authors: David Cohen, Tal Shnitzer, Yuval Kluger, Ronen Talmon

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We showcase the efficacy of our method on illustrative examples and several benchmarks, where our method demonstrates higher accuracy in selecting the informative features compared to competing methods. In addition, we show that our FS leads to improved classification and better generalization when applied to test data.
Researcher Affiliation | Academia | 1: Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel; 2: CSAIL, Massachusetts Institute of Technology, Cambridge, USA; 3: Department of Pathology, Yale School of Medicine, New Haven, CT 06511, USA; 4: Applied Mathematics Program, Yale University, New Haven, CT 06511, USA.
Pseudocode | Yes | Algorithm 1: ManiFeSt Score; Algorithm 2: ManiFeSt Score for SPSD Matrices
Open Source Code | Yes | The code implementing ManiFeSt, along with the script for reproducing the illustrative example, is available on GitHub: https://github.com/DavidCohen2/ManiFeSt
Open Datasets | Yes | We illustrate our approach using MNIST (Deng, 2012); we test ManiFeSt on the Madelon synthetic dataset (Guyon et al., 2008) from the NIPS 2003 feature selection challenge; we test a dataset of colon cancer gene expression samples (Alon et al., 1999).
Dataset Splits | Yes | In all experiments, the data is split into train and test sets with nested cross-validation. The nested train set is divided using 10-fold cross-validation for all datasets. The optimization of SVM hyperparameters is performed over validation sets to improve the generalization of the model.
Hardware Specification | Yes | All the experiments were performed using Python on a standard PC with an Intel i7-12700 CPU and 64GB of RAM, without GPUs.
Software Dependencies | No | Software components such as Python, the scikit-learn package, the skfeature repository, and the pandas package are mentioned, but specific version numbers for these dependencies are not provided.
Experiment Setup | Yes | For the SVM hyperparameter tuning, we follow Hsu et al. (2003): we use an RBF kernel and perform a grid search over the penalty parameter C and the kernel scale γ, tuned over exponentially growing sequences C = {2^-5, 2^-2, 2^1, 2^4, 2^7, 2^10, 2^13} and γ = {2^-15, 2^-12, 2^-9, 2^-6, 2^-3, 2^0, 2^3}. The number of neighbors for IG and ReliefF is tuned over the grid k = {1, 3, 5, 10, 15, 20, 30, 50, 100}. For the Laplacian score, the sample kernel scale is tuned to the i-th percentile of the pairwise Euclidean distances over the grid i = {1, 5, 10, 30, 50, 70, 90, 95, 99}.
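The SVM tuning procedure reported above (RBF kernel, exponentially growing grids for C and γ, following Hsu et al., 2003) can be sketched with scikit-learn's GridSearchCV. This is a simplified, non-nested version of the paper's protocol, and the toy dataset is an illustrative stand-in, not one of the paper's benchmarks.

```python
# Sketch of the reported SVM grid search: RBF kernel, exponentially
# spaced grids for C and gamma, 10-fold cross-validation on the train set.
# The synthetic dataset is an illustrative assumption, not the paper's data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "C": [2.0**p for p in (-5, -2, 1, 4, 7, 10, 13)],
    "gamma": [2.0**p for p in (-15, -12, -9, -6, -3, 0, 3)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)  # 10-fold CV
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))
```

In the paper's setup this search would run inside the outer cross-validation loop, so each outer fold may select different (C, γ) values.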
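The percentile-based tuning of the Laplacian-score kernel scale can likewise be sketched: the RBF scale is set to the i-th percentile of the pairwise Euclidean distances between samples, with i swept over the reported grid. The random data below is an illustrative assumption.

```python
# Sketch of percentile-based kernel-scale tuning for the Laplacian score:
# the candidate scales are percentiles of the pairwise Euclidean distances.
# Random data is used here for illustration only.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

distances = pdist(X)  # condensed vector of all pairwise Euclidean distances
percentile_grid = [1, 5, 10, 30, 50, 70, 90, 95, 99]
kernel_scales = {i: np.percentile(distances, i) for i in percentile_grid}
print(kernel_scales)
```

Each candidate scale would then be plugged into the Laplacian-score affinity kernel, and the best value selected on the validation folds.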