The Contextual Lasso: Sparse Linear Models via Deep Neural Networks
Authors: Ryan Thompson, Amir Dezfouli, Robert Kohn
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network. The contextual lasso is evaluated empirically on both synthetic and real data. |
| Researcher Affiliation | Collaboration | Ryan Thompson (University of New South Wales; CSIRO's Data61), Amir Dezfouli (BIMLOGIQ), Robert Kohn (University of New South Wales) |
| Pseudocode | Yes | Algorithm 1 (Projection onto the ℓ1-ball) and Algorithm 2 (Projection onto the group ℓ1-ball); a minimal projection sketch follows the table. |
| Open Source Code | Yes | We implement the contextual lasso as described in this section in the Julia (Bezanson et al., 2017) package ContextualLasso. ... ContextualLasso is available at https://github.com/ryan-thompson/ContextualLasso.jl. |
| Open Datasets | Yes | The datasets used throughout this paper are publicly available at the following URLs. House pricing data in Section 1: https://www.kaggle.com/datasets/ruiqurm/lianjia. Energy consumption data in Section 4: https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction. Parkinson's telemonitoring data in Section 4: https://archive.ics.uci.edu/ml/datasets/parkinsons+telemonitoring. News popularity data in Appendix J: https://archive.ics.uci.edu/ml/datasets/online+news+popularity. |
| Dataset Splits | Yes | The dataset, containing n = 19,375 observations, is randomly split into training, validation, and testing sets in 0.6-0.2-0.2 proportions (a split sketch follows the table). |
| Hardware Specification | Yes | All experiments are run on a Linux platform with NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions software packages like Julia, Flux, GLMNet, grpreg, lassonet, LLSPIN, and Optuna, but does not provide specific version numbers for these dependencies, making full reproducibility difficult. |
| Experiment Setup | Yes | The network is configured with three hidden layers. The number of neurons... is set so that the dimensionality of the weights w is approximately 32 × p × m. ...These methods all use rectified linear activation functions in the hidden layers and are optimized using Adam (Kingma and Ba, 2015) with a learning rate of 0.001. Convergence is monitored on a validation set with the optimizer terminated after 30 iterations without improvement. (A training-loop sketch follows the table.) |
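The pseudocode row refers to Euclidean projections onto ℓ1-balls. Below is a minimal sketch of such a projection in Julia, assuming the standard sorting-based routine of Duchi et al. (2008); the function name `project_l1_ball` and the radius argument `λ` are illustrative and not taken from the paper's Algorithm 1.

```julia
using LinearAlgebra

# Euclidean projection of v onto the ℓ1-ball {x : ‖x‖₁ ≤ λ}, assuming λ > 0.
# Sorting-based routine (Duchi et al., 2008); an illustrative sketch, not the
# paper's Algorithm 1 verbatim.
function project_l1_ball(v::AbstractVector{<:Real}, λ::Real)
    norm(v, 1) <= λ && return float.(v)       # already feasible, nothing to do
    u = sort(abs.(v); rev = true)             # magnitudes in decreasing order
    c = cumsum(u)
    ρ = findlast(i -> u[i] > (c[i] - λ) / i, eachindex(u))
    θ = (c[ρ] - λ) / ρ                        # soft-thresholding level
    return sign.(v) .* max.(abs.(v) .- θ, 0)  # shrink coordinates toward zero
end

project_l1_ball([3.0, -1.0, 0.5], 2.0)        # returns [2.0, 0.0, 0.0], ℓ1 norm 2
```

Algorithm 2's group variant applies the same kind of thresholding to group norms rather than to individual coordinate magnitudes.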
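The reported 0.6-0.2-0.2 split can be reproduced with a plain random partition of the observation indices. The sketch below assumes nothing beyond the stated proportions and sample size; the seed is an arbitrary choice for repeatability.

```julia
using Random

# Hypothetical 0.6/0.2/0.2 random split of the n = 19,375 observation indices.
n = 19_375
idx = randperm(MersenneTwister(2023), n)
n_train = round(Int, 0.6 * n)                 # 11,625
n_val   = round(Int, 0.2 * n)                 # 3,875
train_idx = idx[1:n_train]
val_idx   = idx[n_train+1:n_train+n_val]
test_idx  = idx[n_train+n_val+1:end]          # remaining 3,875 observations
```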
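The experiment-setup row reports three ReLU hidden layers, Adam with a learning rate of 0.001, and termination after 30 iterations without validation improvement. The following Flux.jl sketch wires those reported settings together; the layer width, dummy data, epoch cap, and the helper `train_until_stalled!` are hypothetical placeholders, not the ContextualLasso.jl training code.

```julia
using Flux

# Placeholder sizes and dummy data: m contextual inputs, p outputs, h hidden width.
m, p, h = 5, 10, 64
x_train, y_train = rand(Float32, m, 200), rand(Float32, p, 200)
x_val,   y_val   = rand(Float32, m, 50),  rand(Float32, p, 50)

# Three ReLU hidden layers as reported; the width h is an arbitrary stand-in.
model = Chain(Dense(m => h, relu), Dense(h => h, relu), Dense(h => h, relu), Dense(h => p))
opt_state = Flux.setup(Adam(0.001), model)    # Adam with learning rate 0.001

# Train until the validation loss fails to improve for `patience` epochs.
function train_until_stalled!(model, opt_state, data, x_val, y_val; patience = 30, max_epochs = 1_000)
    best, stall = Inf32, 0
    for _ in 1:max_epochs
        Flux.train!(model, data, opt_state) do net, x, y
            Flux.Losses.mse(net(x), y)
        end
        val = Flux.Losses.mse(model(x_val), y_val)
        if val < best
            best, stall = val, 0
        else
            stall += 1
        end
        stall >= patience && break            # 30 epochs without improvement
    end
    return model
end

train_until_stalled!(model, opt_state, [(x_train, y_train)], x_val, y_val)
```

The early-stopping criterion here monitors validation mean squared error, which matches the regression tasks described in the report; a different loss would slot into the same loop unchanged.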