spred: Solving L1 Penalty with SGD
Authors: Liu Ziyin, Zihao Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The foremost contribution of our work is to theoretically prove and empirically demonstrate that a reparametrization trick... Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involves finding relevant features in a very high dimensional space, and (2) the neural network compression task, to which previous attempts at applying the L1 penalty have been unsuccessful. (A minimal sketch of this reparametrization appears after the table.) |
| Researcher Affiliation | Academia | ¹The University of Tokyo, ²HKUST. |
| Pseudocode | Yes | For completeness, we give an explicit algorithm in Algorithms 1 and 2. |
| Open Source Code | Yes | Code: https://github.com/zihao-wang/spred |
| Open Datasets | Yes | We compare with relevant baselines on 6 public cancer classification datasets based on microarray gene expression features from the Gene Expression Omnibus... The datasets are taken from the public datasets of Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/ |
| Dataset Splits | Yes | Because the dataset size is small, for each run of each model, we randomly pick 20% of samples as the test set, 20% as the validation set for hyperparameter tuning, and 60% as the training set. (A split sketch follows the table.) |
| Hardware Specification | No | The paper discusses model architectures (ResNet50, MobileNet V1) and training batch sizes in relation to memory cost, but it does not specify the exact hardware (e.g., GPU models, CPU types, or memory amounts) used for the experiments. |
| Software Dependencies | No | The paper mentions that the method is easy to implement in 'any modern deep-learning framework' but does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The learning rate is chosen from {1, 0.1, 0.01, 0.001}. The final result is chosen from the best setting... The learning rate and κ are both selected from {7e-1, 5e-1, 3e-1, 1e-1, 5e-2, 3e-2, 1e-2}. (A grid-search sketch follows the table.) |
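For readers who want to see the paper's core trick in code, below is a minimal, self-contained sketch of the spred reparametrization on a toy least-squares problem. It is an illustration under our own assumptions (PyTorch, illustrative variable names, a toy objective), not the authors' released implementation; see the GitHub link in the table for that. The sketch rests on the identity λ‖w‖₁ = min over (u, v) with w = u ⊙ v of (λ/2)(‖u‖² + ‖v‖²), which turns the non-smooth L1 penalty into a smooth L2 penalty that plain SGD can minimize.

```python
import torch

# Toy L1-regularized least squares, solved via the spred-style
# reparametrization w = u * v with an L2 penalty on u and v.
# (A sketch under stated assumptions, not the paper's code.)
torch.manual_seed(0)
n, d, lam = 200, 50, 0.1
X, y = torch.randn(n, d), torch.randn(n)

u = torch.randn(d, requires_grad=True)  # redundant factors: w = u * v
v = torch.randn(d, requires_grad=True)
opt = torch.optim.SGD([u, v], lr=0.01)

for _ in range(5000):
    w = u * v                                  # reconstructed weight vector
    loss = ((X @ w - y) ** 2).mean()
    # Smooth surrogate for lam * ||w||_1:
    penalty = 0.5 * lam * (u.pow(2).sum() + v.pow(2).sum())
    opt.zero_grad()
    (loss + penalty).backward()
    opt.step()

w = (u * v).detach()
print(f"nonzero entries: {(w.abs() > 1e-4).sum().item()} / {d}")
```

At balanced minima, where |uᵢ| = |vᵢ| for every coordinate, the L2 penalty equals λ‖u ⊙ v‖₁, so sparse solutions of the original L1 problem are recovered by thresholding the small entries of u ⊙ v.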
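The 60/20/20 split quoted in the Dataset Splits row can be reproduced in a few lines. The following is a hedged sketch: the array names `features` and `labels` are placeholders, and the paper does not publish its exact splitting code.

```python
import numpy as np

def random_split(features, labels, seed=0):
    """60% train / 20% validation / 20% test random split,
    as described in the paper (a sketch, not the released code)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    n_test = n_val = int(0.2 * len(labels))
    test_idx = perm[:n_test]
    val_idx = perm[n_test:n_test + n_val]
    train_idx = perm[n_test + n_val:]
    return (features[train_idx], labels[train_idx],
            features[val_idx], labels[val_idx],
            features[test_idx], labels[test_idx])
```

Because the microarray datasets are small, the paper draws a fresh random split for each run of each model, so reported results average over split randomness (e.g., by varying `seed` across runs).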
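The hyperparameter selection quoted in the Experiment Setup row is a grid search scored on the validation split. A minimal sketch follows, assuming a `train_and_evaluate(lr, kappa)` helper that trains a model and returns a validation score; that helper's name and signature are our invention, not the paper's API.

```python
from itertools import product

# Candidate grids quoted from the paper's setup for lr and kappa.
LRS    = [7e-1, 5e-1, 3e-1, 1e-1, 5e-2, 3e-2, 1e-2]
KAPPAS = [7e-1, 5e-1, 3e-1, 1e-1, 5e-2, 3e-2, 1e-2]

def grid_search(train_and_evaluate):
    """Return the (lr, kappa) pair with the best validation score (a sketch)."""
    best_score, best_cfg = float("-inf"), None
    for lr, kappa in product(LRS, KAPPAS):
        score = train_and_evaluate(lr=lr, kappa=kappa)  # hypothetical helper
        if score > best_score:
            best_score, best_cfg = score, (lr, kappa)
    return best_cfg, best_score
```

The final result is then reported for the best-scoring setting, matching the "final result is chosen from the best setting" procedure quoted above.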