Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

Authors: Zhangheng Li, Shiwei Liu, Tianlong Chen, Ajay Kumar Jaiswal, Zhenyu Zhang, Dilin Wang, Raghuraman Krishnamoorthi, Shiyu Chang, Zhangyang Wang

Venue: ICML 2024

Reproducibility assessment: each variable below lists the result and the supporting excerpt from the paper (the "LLM Response").
Research Type: Experimental
"Experiment results on image classification, object detection, and instance segmentation illustrate the favorable effectiveness and flexibility of Sparse Cocktail, pointing to a promising direction for sparse co-training."

Researcher Affiliation: Collaboration
"1 University of Texas at Austin, 2 Eindhoven University of Technology, 3 University of Oxford, 4 University of North Carolina at Chapel Hill, 5 Meta, 6 University of California, Santa Barbara"

Pseudocode: Yes
"The nuances of our interpolation method are detailed in Algorithm 1 in the Appendix."

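The interpolation itself is only specified in the paper's Algorithm 1, so the following is purely an illustrative PyTorch sketch of linear interpolation between two compatible checkpoints. The coefficient alpha stands in for the paper's αk, and the helper name and the commented-out validation routine are hypothetical, not taken from the paper's code.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Return (1 - alpha) * sd_a + alpha * sd_b for floating-point entries."""
    out = {}
    for key, tensor in sd_a.items():
        if torch.is_floating_point(tensor):
            out[key] = (1.0 - alpha) * tensor + alpha * sd_b[key]
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are copied as-is.
            out[key] = tensor
    return out

# Hypothetical selection of alpha on the hold-out validation set ("Val Acc"):
# best_alpha = max(candidates,
#                  key=lambda a: validate(model, interpolate_state_dicts(sd_a, sd_b, a)))
```
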
Open Source Code: Yes
"Code is available at github.com/VITA-Group/SparseCocktail."

Open Datasets: Yes
"We conduct experiments on CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009)."

Dataset Splits: Yes
"The determination of αk is grounded in a hold-out validation set, which we term as Val Acc. ... Hold-out validation set: last 5% of the training set (Tables 5 and 6)."

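As a concrete reading of "last 5% of the training set", here is a minimal torchvision sketch that holds out the final 2,500 of CIFAR10's 50,000 training images for validation; the transform and batch size are placeholders, not values from the paper.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentations are not reproduced here
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)

# Hold out the last 5% of the training images as the validation set.
n_total = len(full_train)          # 50,000
n_val = int(0.05 * n_total)        # 2,500
train_set = Subset(full_train, range(n_total - n_val))
val_set = Subset(full_train, range(n_total - n_val, n_total))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
```
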
Hardware Specification: Yes
"We first showcase the inference FLOPs and GPU memory consumption on an NVIDIA A100 GPU ... The total training time on the CIFAR10 dataset on an NVIDIA A6000 GPU."

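The excerpt reports inference-time GPU memory; a generic way to measure peak inference memory in PyTorch (not the authors' exact script) is sketched below. FLOP counting would typically use a separate tool, e.g. fvcore's FlopCountAnalysis.

```python
import torch

@torch.no_grad()
def peak_inference_memory_mb(model, example_input, device="cuda"):
    """Measure peak allocated GPU memory (MB) for a single forward pass."""
    model = model.to(device).eval()
    example_input = example_input.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    model(example_input)
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / (1024 ** 2)
```
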
Software Dependencies: No
The paper mentions software components such as SGD and Cyclic Cosine Decay (linking to a GitHub repository for the latter), but it does not state version numbers for any software dependencies (e.g., Python, PyTorch, or specific library versions).

Experiment Setup: Yes
"Table 5. The hyperparameter setting of Sparse Cocktail on CIFAR10 dataset."
IMP setting: iterations = 10, rewind epoch = 7, training epochs = 150
Optimizer: SGD (lr = 0.1, momentum = 0.9, weight decay = 1e-4)
LR scheduler: Cyclic Cosine Decay

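The quoted optimizer settings translate directly to PyTorch. In the sketch below, the model is a placeholder, and torch.optim.lr_scheduler.CosineAnnealingWarmRestarts is used as a stand-in for the cyclic cosine decay implementation the paper links to; the restart period T_0 is a guess, not a value from the paper.

```python
import torch
import torchvision
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torchvision.models.resnet18(num_classes=10)  # placeholder model
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # from Table 5
    momentum=0.9,       # from Table 5
    weight_decay=1e-4,  # from Table 5
)
# Stand-in for the paper's Cyclic Cosine Decay schedule (restart period is hypothetical).
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=15)

for epoch in range(150):  # training epochs = 150 (Table 5)
    # ... one epoch of training over train_loader goes here ...
    scheduler.step()
```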