Training Your Sparse Neural Network Better with Any Mask
Authors: Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report extensive experiments using a variety of datasets, network architectures, and mask options. Incorporating our techniques in sparse retraining immediately boosts the performance of the sparse mask. |
| Researcher Affiliation | Academia | The University of Texas at Austin; University of California, Irvine. |
| Pseudocode | No | The paper does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is at https://github.com/VITA-Group/ToST. |
| Open Datasets | Yes | By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet), citing "Tiny-ImageNet (Deng et al., 2009)". |
| Dataset Splits | No | The paper mentions training on CIFAR-10, CIFAR-100, and Tiny-ImageNet, but does not explicitly provide the training/validation/test dataset splits (percentages or sample counts) within the text. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "official PyTorch implementation" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For training, we adopt an SGD optimizer with momentum 0.9 and weight decay 2e-4. The initial learning rate is set to 0.1, and the networks are trained for 180 epochs with a batch size of 128. The learning rate decays by a factor of 10 at the 90th and 135th epoch during the training. (A minimal PyTorch sketch of this setup follows the table.) |
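
The reported hyperparameters translate directly into a PyTorch optimizer and learning-rate schedule. The sketch below is a minimal reconstruction for orientation, not the authors' released code (see the ToST repository linked above); the ResNet-18 model, CIFAR-10 loader, and data augmentations are placeholder assumptions, while the optimizer settings, milestone schedule, epoch count, and batch size follow the reported setup.

```python
# Minimal sketch of the reported training setup, assuming standard PyTorch/torchvision
# components. Only the optimizer, LR schedule, epoch count, and batch size come from
# the paper; the model, dataset, and augmentations are illustrative assumptions.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms


def build_optimizer_and_scheduler(model: nn.Module):
    """SGD with momentum 0.9, weight decay 2e-4, LR 0.1 decayed 10x at epochs 90 and 135."""
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=2e-4)
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[90, 135], gamma=0.1)
    return optimizer, scheduler


if __name__ == "__main__":
    model = models.resnet18(num_classes=10)          # placeholder architecture (assumption)
    optimizer, scheduler = build_optimizer_and_scheduler(model)
    criterion = nn.CrossEntropyLoss()

    transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),        # assumed standard CIFAR augmentation
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    for epoch in range(180):                         # 180 epochs, as reported
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()                             # steps the milestone LR decay per epoch
```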