Where to Pay Attention in Sparse Training for Feature Selection?

Authors: Ghada Sokar, Zahra Atashgahi, Mykola Pechenizkiy, Decebal Constantin Mocanu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed extensive experiments on 10 datasets of different types, including image, speech, text, artificial, and biological. They cover a wide range of characteristics, such as low- and high-dimensional feature spaces and small and large numbers of training samples.
Researcher Affiliation | Academia | Ghada Sokar (Eindhoven University of Technology, g.a.z.n.sokar@tue.nl); Zahra Atashgahi (University of Twente, z.atashgahi@utwente.nl); Mykola Pechenizkiy (Eindhoven University of Technology, m.pechenizkiy@tue.nl); Decebal Constantin Mocanu (University of Twente and Eindhoven University of Technology, d.c.mocanu@utwente.nl)
Pseudocode | Yes | Algorithm 1: WAST (a hedged sketch of the general recipe appears after this table).
Open Source Code | Yes | Code is available at https://github.com/GhadaSokar/WAST.
Open Datasets | Yes | We evaluate our method on 10 publicly available datasets, including image, speech, text, time series, biological, and artificial data. They have a variety of characteristics, such as low- and high-dimensional features and a small and large number of training samples. Details are in Table 1. (Table 1 lists datasets such as Madelon [24], USPS [32], and MNIST [36], with citations.)
Dataset Splits | No | The paper provides train and test splits in Table 1 (the 'Train' and 'Test' columns). However, it does not explicitly mention or quantify a separate validation split used for hyperparameter tuning or model selection, distinct from the training and test sets.
Hardware Specification | No | NN-based and classical methods are trained on Nvidia GPUs and CPUs, respectively.
Software Dependencies | No | We implemented WAST and QS [4] with PyTorch [58].
Experiment Setup | Yes | For all NN-based methods except CAE [5], we use a single hidden layer of 200 neurons. The architecture of CAE consists of two layers; the sizes of its hidden layers depend on the chosen K: [K, (3/2)K]. For WAST and QS, we use a sparsity level of 0.8. Following [4], we report the accuracy of NN-based baselines after 100 epochs unless stated otherwise. ... For WAST, we train the model for 10 epochs. Following [4], we add Gaussian noise with a factor of 0.2 to the input in WAST and QS [4]. Details of the hyperparameters are in Appendix A.1. (A usage sketch of this setup follows the table.)
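
As context for the Pseudocode entry, below is a minimal sketch of the general recipe that Algorithm 1 (WAST) belongs to: a sparse denoising autoencoder trained with a drop-and-grow topology update, with input features ranked by the strength of their surviving connections. The importance criterion, growth rule, and update schedule here are simplifying assumptions, not the paper's exact procedure; see Algorithm 1 and Appendix A.1 for the actual rules.

```python
# Hedged sketch of sparse-training-based feature selection in the spirit of WAST.
# The importance criterion and topology-update schedule are assumptions.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, n_features, n_hidden=200, sparsity=0.8):
        super().__init__()
        self.encoder = nn.Linear(n_features, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_features)
        # Random binary masks enforcing the given sparsity level on both layers.
        self.masks = {
            "encoder": (torch.rand_like(self.encoder.weight) > sparsity).float(),
            "decoder": (torch.rand_like(self.decoder.weight) > sparsity).float(),
        }
        self.apply_masks()

    def apply_masks(self):
        # Zero out pruned connections so only the sparse topology carries signal.
        with torch.no_grad():
            self.encoder.weight.mul_(self.masks["encoder"])
            self.decoder.weight.mul_(self.masks["decoder"])

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))


def feature_importance(model):
    # Assumed importance criterion: sum of |w| over each input feature's surviving
    # encoder connections. WAST's actual criterion (Algorithm 1) may differ.
    w = model.encoder.weight.detach() * model.masks["encoder"]
    return w.abs().sum(dim=0)


def drop_and_grow(model, drop_fraction=0.3):
    # SET-style topology update: remove the weakest active encoder connections,
    # then regrow the same number, biased toward currently important features.
    w, mask = model.encoder.weight.data, model.masks["encoder"]
    active = mask.nonzero(as_tuple=False)
    n_drop = int(drop_fraction * len(active))
    if n_drop == 0:
        return
    magnitudes = w[active[:, 0], active[:, 1]].abs()
    dropped = active[magnitudes.argsort()[:n_drop]]
    mask[dropped[:, 0], dropped[:, 1]] = 0.0
    # Grow connections to input features with probability proportional to importance
    # (collisions with existing connections are ignored in this simplified sketch).
    probs = feature_importance(model) + 1e-8
    cols = torch.multinomial(probs / probs.sum(), n_drop, replacement=True)
    rows = torch.randint(0, w.shape[0], (n_drop,))
    mask[rows, cols] = 1.0
    w[rows, cols] = torch.empty(n_drop).uniform_(-0.01, 0.01)  # re-initialise grown weights
    model.apply_masks()
```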
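
For the Experiment Setup entry, the following is a hedged usage sketch matching the reported configuration (one hidden layer of 200 units, sparsity 0.8, Gaussian input noise with factor 0.2, 10 training epochs, top-K features kept). It reuses the definitions from the sketch above; the optimizer, learning rate, batch size, and per-epoch drop-and-grow frequency are assumptions rather than the paper's reported hyperparameters (see Appendix A.1).

```python
def select_features(X, k, epochs=10, noise_factor=0.2, lr=1e-3, batch_size=128):
    # X: float tensor of shape [n_samples, n_features]; returns indices of K features.
    model = SparseAutoencoder(X.shape[1], n_hidden=200, sparsity=0.8)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(X), batch_size=batch_size, shuffle=True
    )
    for _ in range(epochs):
        for (batch,) in loader:
            noisy = batch + noise_factor * torch.randn_like(batch)  # denoising input
            loss = loss_fn(model(noisy), batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            model.apply_masks()  # keep pruned connections at zero after each step
        drop_and_grow(model)     # assumed once per epoch; the actual schedule may differ
    return feature_importance(model).topk(k).indices
```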