The Contextual Lasso: Sparse Linear Models via Deep Neural Networks

Authors: Ryan Thompson, Amir Dezfouli, Robert Kohn

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network. The contextual lasso is evaluated through experiments on both synthetic and real data.
Researcher Affiliation | Collaboration | Ryan Thompson (University of New South Wales; CSIRO's Data61), Amir Dezfouli (BIMLOGIQ), Robert Kohn (University of New South Wales)
Pseudocode | Yes | Algorithm 1 (Projection onto the ℓ1-ball) and Algorithm 2 (Projection onto the group ℓ1-ball); a sketch of both projections follows this table.
Open Source Code | Yes | We implement the contextual lasso as described in this section in the Julia (Bezanson et al., 2017) package ContextualLasso. ... ContextualLasso is available at https://github.com/ryan-thompson/ContextualLasso.jl.
Open Datasets | Yes | The datasets used throughout this paper are publicly available at the following URLs. House pricing data in Section 1: https://www.kaggle.com/datasets/ruiqurm/lianjia. Energy consumption data in Section 4: https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction. Parkinson's telemonitoring data in Section 4: https://archive.ics.uci.edu/ml/datasets/parkinsons+telemonitoring. News popularity data in Appendix J: https://archive.ics.uci.edu/ml/datasets/online+news+popularity.
Dataset Splits | Yes | The dataset, containing n = 19,375 observations, is randomly split into training, validation, and testing sets in 0.6-0.2-0.2 proportions. (A small indexing sketch of this split follows the table.)
Hardware Specification | Yes | All experiments are run on a Linux platform with NVIDIA RTX 4090 GPUs.
Software Dependencies | No | The paper mentions software packages like Julia, Flux, GLMNet, grpreg, lassonet, LLSPIN, and Optuna, but does not provide specific version numbers for these dependencies, making full reproducibility difficult.
Experiment Setup | Yes | The network is configured with three hidden layers. The number of neurons... is set so that the dimensionality of the weights w is approximately 32 × p × m. ...These methods all use rectified linear activation functions in the hidden layers and are optimized using Adam (Kingma and Ba, 2015) with a learning rate of 0.001. Convergence is monitored on a validation set with the optimizer terminated after 30 iterations without improvement. (A Flux-style sketch of this configuration follows the table.)
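
The Pseudocode row above refers to projections onto the ℓ1-ball and the group ℓ1-ball. Below is a minimal Julia sketch of both operations, following the standard sort-and-threshold scheme of Duchi et al. (2008) rather than the authors' ContextualLasso.jl code; the function names and the radius argument λ are illustrative.

```julia
using LinearAlgebra

# Euclidean projection of v onto the ℓ1-ball {x : ‖x‖₁ ≤ λ}, via the
# standard sort-and-threshold scheme (Duchi et al., 2008).
function project_l1(v::AbstractVector{<:Real}, λ::Real)
    sum(abs, v) <= λ && return float.(v)         # already inside the ball
    u = sort(abs.(v); rev = true)                # magnitudes, largest first
    c = cumsum(u)
    ρ = findlast(i -> u[i] > (c[i] - λ) / i, eachindex(u))
    θ = (c[ρ] - λ) / ρ                           # soft-threshold level
    return sign.(v) .* max.(abs.(v) .- θ, 0)
end

# Euclidean projection onto the group ℓ1-ball {x : Σ_g ‖x_g‖₂ ≤ λ}:
# project the vector of group norms onto the ℓ1-ball, then rescale each group.
function project_group_l1(v::AbstractVector{<:Real}, groups, λ::Real)
    norms = [norm(view(v, g)) for g in groups]
    shrunk = project_l1(norms, λ)
    x = zeros(float(eltype(v)), length(v))
    for (g, n, s) in zip(groups, norms, shrunk)
        n > 0 && (x[g] = v[g] .* (s / n))        # zero groups stay zero
    end
    return x
end

project_l1([3.0, -1.0, 0.5], 2.0)            # ≈ [2.0, 0.0, 0.0]
project_group_l1(randn(6), [1:3, 4:6], 1.0)
```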
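
The 0.6-0.2-0.2 split quoted in the Dataset Splits row can be expressed at the index level as follows; only n = 19,375 comes from the paper, while the seed and variable names are placeholders.

```julia
using Random

Random.seed!(1)                              # illustrative seed
n = 19_375                                   # observations in the quoted dataset
idx = randperm(n)
n_train, n_val = round(Int, 0.6n), round(Int, 0.2n)
train_idx = idx[1:n_train]
val_idx   = idx[n_train+1:n_train+n_val]
test_idx  = idx[n_train+n_val+1:end]         # remaining ≈ 20% for testing
```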
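
Finally, a hedged Flux-style sketch of the training configuration quoted in the Experiment Setup row. Only the three ReLU hidden layers, Adam with a 0.001 learning rate, and early stopping after 30 iterations without validation improvement come from the quote; the layer width, synthetic data, mean-squared-error loss, and output head are assumptions, and the contextual lasso's projection step is omitted.

```julia
using Flux

# Placeholder data: p inputs, m outputs, n observations (all illustrative).
p, m, h, n = 10, 5, 64, 256
Z, Y = rand(Float32, p, n), rand(Float32, m, n)
Z_val, Y_val = rand(Float32, p, 64), rand(Float32, m, 64)

# Three ReLU hidden layers, as in the quoted setup; width h is a placeholder.
model = Chain(Dense(p => h, relu),
              Dense(h => h, relu),
              Dense(h => h, relu),
              Dense(h => m))

opt_state = Flux.setup(Adam(0.001), model)   # Adam with learning rate 0.001
loader = Flux.DataLoader((Z, Y), batchsize = 32)

# Train until the validation loss fails to improve for `patience` epochs.
function fit!(model, loader, opt_state, Z_val, Y_val; patience = 30, max_epochs = 1_000)
    best, wait = Inf, 0
    for epoch in 1:max_epochs
        Flux.train!((net, z, y) -> Flux.mse(net(z), y), model, loader, opt_state)
        val = Flux.mse(model(Z_val), Y_val)  # monitor on the validation set
        val < best ? (best = val; wait = 0) : (wait += 1)
        wait >= patience && break            # 30 epochs without improvement
    end
    return model
end

fit!(model, loader, opt_state, Z_val, Y_val)
```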