Sequential Attention for Feature Selection

Authors: Taisuke Yasuda, Mohammadhossein Bateni, Lin Chen, Matthew Fahrbach, Gang Fu, Vahab Mirrokni

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work introduces the Sequential Attention algorithm for supervised feature selection... Empirically, Sequential Attention achieves state-of-the-art feature selection results for neural networks on standard benchmarks. The code for our algorithm and experiments is publicly available.
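The response above summarizes the algorithm only at a high level. As a rough illustration of the idea — greedily selecting features by training a model with softmax attention weights over the not-yet-selected features, then keeping the highest-weight feature — here is a minimal sketch using a linear least-squares model trained by plain gradient descent. The paper applies the method to neural networks; the function name, hyperparameters, and the average-preserving rescaling are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def sequential_attention_select(X, y, k, steps=500, lr=0.1):
    """Illustrative sketch: greedy feature selection with learned softmax attention.

    Each round jointly trains linear weights and attention logits over the
    unselected features (scaled so the average attention factor is ~1),
    then adds the feature with the largest attention weight.
    """
    n, d = X.shape
    selected = []
    for _ in range(k):
        rest = [j for j in range(d) if j not in selected]
        logits = np.zeros(len(rest))
        w = np.zeros(d)
        for _ in range(steps):
            a = np.exp(logits - logits.max())
            a /= a.sum()                       # softmax attention over `rest`
            scale = np.ones(d)
            scale[rest] = a * len(rest)        # selected features stay unscaled
            Xs = X * scale
            resid = Xs @ w - y
            gw = Xs.T @ resid / n              # gradient of 0.5*MSE w.r.t. w
            ga = (X[:, rest] * w[rest]).T @ resid / n * len(rest)
            glog = a * (ga - (a * ga).sum())   # softmax chain rule
            w -= lr * gw
            logits -= lr * glog
        selected.append(rest[int(np.argmax(logits))])
    return selected
```

On synthetic data where the target depends on only a few columns, the attention weights concentrate on the predictive features round by round.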
Researcher Affiliation | Collaboration | Taisuke Yasuda* (Carnegie Mellon University, taisukey@cs.cmu.edu); Mohammad Hossein Bateni, Lin Chen, Matthew Fahrbach, Gang Fu*, and Vahab Mirrokni (Google Research, {bateni,linche,fahrbach,thomasfu,mirrokni}@google.com)
Pseudocode | Yes | Algorithm 1: Sequential Attention for feature selection. Algorithm 2: Orthogonal Matching Pursuit (Pati et al., 1993). Algorithm 3: Sequential LASSO (Luo & Chen, 2014).
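Algorithm 2 is classical Orthogonal Matching Pursuit. A compact NumPy version of that standard algorithm (not a transcription of the paper's pseudocode) looks like:

```python
import numpy as np

def omp(X, y, k):
    """Orthogonal Matching Pursuit (Pati et al., 1993): greedily add the
    column most correlated with the current residual, then refit the
    selected columns by least squares."""
    n, d = X.shape
    selected = []
    resid = y.copy()
    for _ in range(k):
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf            # never re-pick a feature
        selected.append(int(np.argmax(scores)))
        w, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        resid = y - X[:, selected] @ w        # residual after refit
    return selected
```

The full least-squares refit after each pick is what distinguishes OMP from plain matching pursuit.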
Open Source Code | Yes | The code for our algorithm and experiments is publicly available. The code is available at: github.com/google-research/google-research/tree/master/sequential_attention
Open Datasets | Yes | In these experiments, we consider six datasets used in experiments in Lemhadri et al. (2021); Balın et al. (2019), and select k = 50 features... Table 1: Statistics about benchmark datasets.

Dataset        # Examples  # Features  # Classes  Type
Mice Protein        1,080          77          8  Biology
MNIST              60,000         784         10  Image
MNIST-Fashion      60,000         784         10  Image
ISOLET              7,797         617         26  Speech
COIL-20             1,440         400         20  Image
Activity            5,744         561          6  Sensor
Dataset Splits | No | The paper mentions 'test data' and 'prediction accuracies' but does not explicitly specify the training/validation/test splits (e.g., percentages or counts) or cite predefined splits for reproducibility.
Hardware Specification | No | The paper does not provide specifics about the hardware used to run the experiments, such as GPU models, CPU specifications, or cloud computing instance types.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of other libraries).
Experiment Setup | Yes | In these experiments, we consider six datasets used in experiments in Lemhadri et al. (2021); Balın et al. (2019), and select k = 50 features using a one-layer neural network with hidden width 67 and ReLU activation... Table 4: Epochs and batch size used to compare the efficiency of feature selection algorithms... For this experiment, we use a dense neural network with 768, 256, and 128 neurons in each of the three hidden layers with ReLU activations.
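The benchmark model quoted above (a single hidden layer of width 67 with ReLU activation) can be sketched as follows; the initialization scheme and function names are illustrative, since the excerpt does not specify them:

```python
import numpy as np

def make_mlp(d_in, n_classes, width=67, seed=0):
    """One-hidden-layer ReLU network matching the quoted setup (hidden
    width 67). He-style initialization here is an assumption."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, np.sqrt(2.0 / d_in), (d_in, width))
    b1 = np.zeros(width)
    W2 = rng.normal(0.0, np.sqrt(2.0 / width), (width, n_classes))
    b2 = np.zeros(n_classes)
    return (W1, b1, W2, b2)

def forward(params, X):
    """Forward pass: ReLU hidden layer, then softmax class probabilities."""
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

For example, after selecting k = 50 features on MNIST (784 inputs, 10 classes), the downstream model would be `make_mlp(50, 10)`.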