Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

LassoNet: A Neural Network with Feature Sparsity

Authors: Ismael Lemhadri, Feng Ruan, Louis Abraham, Robert Tibshirani

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We apply LassoNet to a number of real-data problems and find that it significantly outperforms state-of-the-art methods for feature selection and regression. ... In this section, we show experimental results on real-world datasets.
Researcher Affiliation Collaboration Ismael Lemhadri EMAIL Department of Statistics, Stanford University, Stanford, U.S.A. ... Feng Ruan EMAIL Department of Statistics, University of California, Berkeley, USA ... Louis Abraham EMAIL Gematria Technologies, London, U.K. ... Robert Tibshirani EMAIL Departments of Biomedical Data Sciences, and Statistics, Stanford University, Stanford, U.S.A.
Pseudocode Yes The procedure is summarized in Alg. 1. ... The key novelty is a numerically efficient algorithm for the proximal inner loop. We call the proposed algorithm Hier-Prox and detail it in Alg. 2. ... Algorithm 3 Training LassoNet for Unsupervised Feature Selection ... Algorithm 4 Group Hierarchical Proximal Operator ... Algorithm 5 LassoNet for Matrix Completion
Open Source Code Yes We have made the code for our algorithm and experiments available on a public website: https://lassonet.ml ... Python code and documentation for LassoNet is available at https://lassonet.ml, and R code will soon be available on the same website.
Open Datasets Yes Mice Protein Dataset consists of protein expression levels measured in the cortex of normal and trisomic mice who had been exposed to different experimental conditions. ... (Higuera et al., 2015) ... MNIST and MNIST-Fashion consist of 28-by-28 grayscale images of hand-written digits and clothing items, respectively. ... ISOLET consists of preprocessed speech data ... COIL-20 consists of centered grayscale images of 20 objects. ... Smartphone Dataset for Human Activity Recognition consists of sensor data ... The remaining datasets were retrieved from the UCI Repository (Dua and Graff, 2017).
Dataset Splits Yes We divide each data set randomly into train, validation and test with a 70-10-20 split.
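The 70-10-20 random split quoted above can be reproduced with a short helper. This is a minimal sketch, not the authors' code; the function name and seed handling are assumptions.

```python
import random

def split_indices(n, seed=0):
    """Randomly partition n sample indices into train/validation/test
    sets with a 70-10-20 split, as described in the paper."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For example, `split_indices(1000)` yields disjoint index lists of sizes 700, 100, and 200 covering all samples.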
Hardware Specification Yes All experiments were run on a single computer with NVIDIA Tesla K80 and Intel Xeon E5-2640.
Software Dependencies No The implementation was conducted in the PyTorch framework.
Experiment Setup Yes For all of the experiments, we use the Adam optimizer with a learning rate of 10^-3 to train the initial dense model. Then, we use vanilla gradient descent with momentum equal to 0.9 on the regularization path. ... We used a learning rate of 0.001 and early stopping criterion of 10. Although the hierarchy parameter could in principle be selected on a validation set as well, we have found that the default value M = 10 works well for a variety of datasets. The number of neurons in the hidden layer was varied within [d/3, 2d/3, d, 4d/3].
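The hyperparameters quoted above can be collected into one place for reference. This is a hedged sketch of the reported configuration only; the constant and helper names are hypothetical and do not come from the LassoNet codebase.

```python
# Phase 1: the initial dense model is trained with Adam at lr 1e-3.
DENSE_PHASE = {"optimizer": "Adam", "lr": 1e-3}

# Phase 2: the regularization path uses vanilla gradient descent
# with momentum 0.9 (learning rate 0.001 per the quote).
PATH_PHASE = {"optimizer": "SGD", "lr": 1e-3, "momentum": 0.9}

EARLY_STOPPING_PATIENCE = 10  # "early stopping criterion of 10"
HIERARCHY_M = 10              # default hierarchy parameter M

def hidden_widths(d):
    """Candidate hidden-layer widths swept in the experiments:
    [d/3, 2d/3, d, 4d/3], where d is the input dimension."""
    return [d // 3, (2 * d) // 3, d, (4 * d) // 3]
```

For instance, with d = 300 input features the swept widths are [100, 200, 300, 400].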