Training Your Sparse Neural Network Better with Any Mask

Authors: Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report extensive experiments using a variety of datasets, network architectures, and mask options. Incorporating our techniques in the sparse retraining immediately boosts the performance of the sparse mask.
Researcher Affiliation | Academia | 1 The University of Texas at Austin, 2 University of California, Irvine.
Pseudocode | No | The paper does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is at https://github.com/VITA-Group/ToST.
Open Datasets | Yes | By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, Tiny ImageNet), and the paper cites "Tiny ImageNet (Deng et al., 2009)".
Dataset Splits | No | The paper mentions training on CIFAR-10, CIFAR-100, and Tiny ImageNet, but does not explicitly provide the training/validation/test dataset splits (percentages or sample counts) within the text.
Hardware Specification | No | The paper does not explicitly provide specific hardware details such as the GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper mentions using the 'offical pytorch implementation' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | For training, we adopt an SGD optimizer with momentum 0.9 and weight decay 2e-4. The initial learning rate is set to 0.1, and the networks are trained for 180 epochs with a batch size of 128. The learning rate decays by a factor of 10 at the 90th and 135th epochs during training.
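
To make the reported setup concrete, here is a minimal PyTorch sketch of that training configuration. The `model`, `train_loader`, and `device` names are placeholders assumed for illustration; they are not taken from the paper or its repository, and the sparse-mask handling itself is not shown.

```python
# Minimal sketch of the reported retraining recipe: SGD (momentum 0.9,
# weight decay 2e-4), initial LR 0.1, batch size 128, 180 epochs, and a
# 10x LR decay at epochs 90 and 135. `model` and `train_loader` are
# assumed placeholders, not artifacts from the paper.
import torch
import torch.nn as nn
import torch.optim as optim


def train_sparse_network(model: nn.Module,
                         train_loader: torch.utils.data.DataLoader,
                         device: str = "cuda",
                         epochs: int = 180) -> nn.Module:
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=2e-4)
    # Decay the learning rate by a factor of 10 at the 90th and 135th epochs.
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                               milestones=[90, 135], gamma=0.1)

    for epoch in range(epochs):
        model.train()
        for inputs, targets in train_loader:  # batch size 128 in the paper
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

With a standard CIFAR-style DataLoader built with batch_size=128, this reproduces the optimizer and schedule described in the Experiment Setup row; keeping pruned weights at zero under the chosen mask would still need to be enforced on top of this loop.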