Few-Sample Feature Selection via Feature Manifold Learning
Authors: David Cohen, Tal Shnitzer, Yuval Kluger, Ronen Talmon
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the efficacy of our method on illustrative examples and several benchmarks, where our method demonstrates higher accuracy in selecting the informative features compared to competing methods. In addition, we show that our FS leads to improved classification and better generalization when applied to test data. |
| Researcher Affiliation | Academia | (1) Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel; (2) CSAIL, Massachusetts Institute of Technology, Cambridge, USA; (3) Department of Pathology, Yale School of Medicine, New Haven, CT 06511, USA; (4) Applied Mathematics Program, Yale University, New Haven, CT 06511, USA. |
| Pseudocode | Yes | Algorithm 1: ManiFeSt Score; Algorithm 2: ManiFeSt Score for SPSD Matrices |
| Open Source Code | Yes | The code implementing ManiFeSt, along with the script for reproducing the illustrative example, is available on GitHub: https://github.com/DavidCohen2/ManiFeSt |
| Open Datasets | Yes | We illustrate our approach using MNIST (Deng, 2012).; We test Mani Fe St on the Madelon synthetic dataset (Guyon et al., 2008) from the NIPS 2003 feature selection challenge.; We test a dataset of colon cancer gene expression samples (Alon et al., 1999) |
| Dataset Splits | Yes | In all experiments, the data is split into train and test sets with nested cross-validation. The nested train set is divided using 10-fold cross-validation for all datasets. The optimization of SVM hyperparameters is performed over validation sets to improve the generalization of the model. (A sketch of this protocol appears after the table.) |
| Hardware Specification | Yes | All the experiments were performed using Python on a standard PC with an Intel i7-12700 CPU and 64GB of RAM without GPUs. |
| Software Dependencies | No | While software components such as "Python", the "scikit-learn package", the "skfeature repository", and the "pandas package" are mentioned, specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | For the SVM hyperparameter tuning, we follow (Hsu et al., 2003). We use an RBF kernel and perform a grid search on the penalty parameter C and the kernel scale γ. C and γ are tuned over exponentially growing sequences, C = {2^-5, 2^-2, 2^1, 2^4, 2^7, 2^10, 2^13} and γ = {2^-15, 2^-12, 2^-9, 2^-6, 2^-3, 2^0, 2^3}. We tune the number of neighbors for IG and ReliefF over the grid k = {1, 3, 5, 10, 15, 20, 30, 50, 100}. For the Laplacian score, the samples' kernel scale is tuned to the i-th percentile of Euclidean distances over the grid i = {1, 5, 10, 30, 50, 70, 90, 95, 99}. (See the grid-search sketch after this table.) |
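
A minimal sketch of the nested cross-validation protocol described in the "Dataset Splits" row, assuming scikit-learn. The dataset, the abbreviated hyperparameter grid, and the 5-fold outer split are illustrative assumptions (the table specifies only the 10-fold inner split), not the authors' code.

```python
# Nested cross-validation sketch: hyperparameters are tuned on inner
# validation folds, and accuracy is reported on held-out outer test folds.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for MNIST / Madelon / colon data

inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold nested train split
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # outer split (fold count assumed)

# Abbreviated grid for brevity; the full grids appear in the next sketch.
search = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10, 100], "gamma": ["scale", 1e-3]}, cv=inner_cv)
scores = cross_val_score(search, X, y, cv=outer_cv)  # each outer test fold stays unseen during tuning
print(f"nested-CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```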
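
The exponential grids quoted in the "Experiment Setup" row translate directly into code. Below is a hedged sketch, assuming scikit-learn and NumPy; the synthetic dataset is a placeholder, and the pairwise-distance computation for the Laplacian-score kernel scale is one plausible reading of the quoted percentile rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)  # placeholder data

# Exponentially growing SVM grids from the paper, in the style of Hsu et al. (2003).
param_grid = {
    "C": [2.0 ** e for e in (-5, -2, 1, 4, 7, 10, 13)],
    "gamma": [2.0 ** e for e in (-15, -12, -9, -6, -3, 0, 3)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
search.fit(X, y)
print(search.best_params_)

# Neighbor grid for IG / ReliefF, and the percentile grid that sets the
# Laplacian-score kernel scale from pairwise Euclidean distances.
k_grid = [1, 3, 5, 10, 15, 20, 30, 50, 100]
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
kernel_scales = {i: np.percentile(dists[dists > 0], i) for i in (1, 5, 10, 30, 50, 70, 90, 95, 99)}
```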