Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
Authors: Zhangheng Li, Shiwei Liu, Tianlong Chen, Ajay Kumar Jaiswal, Zhenyu Zhang, Dilin Wang, Raghuraman Krishnamoorthi, Shiyu Chang, Zhangyang Wang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on image classification, object detection, and instance segmentation illustrate the favorable effectiveness and flexibility of Sparse Cocktail, pointing to a promising direction for sparse co-training. |
| Researcher Affiliation | Collaboration | (1) University of Texas at Austin, (2) Eindhoven University of Technology, (3) University of Oxford, (4) University of North Carolina at Chapel Hill, (5) Meta, (6) University of California, Santa Barbara. |
| Pseudocode | Yes | The nuances of our interpolation method are detailed in Algorithm 1 in the Appendix. |
| Open Source Code | Yes | Code is available at github.com/VITA-Group/SparseCocktail. |
| Open Datasets | Yes | We conduct experiments on CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | The determination of αk is grounded in a hold-out validation set, which we term as Val Acc. ... Hold-out validation set: last 5% of the training set (Tables 5 and 6). A hedged split-and-interpolation sketch follows the table. |
| Hardware Specification | Yes | We first showcase the inference FLOPs and GPU memory consumption on an NVIDIA A100 GPU... The total training time on the CIFAR10 dataset is measured on an NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions software components like 'SGD' and 'Cyclic Cosine Decay' and links to a GitHub repository for the latter, but it does not state version numbers for any software dependency (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | Table 5 (CIFAR10 hyperparameters): IMP setting: iterations = 10, rewind epoch = 7, training epochs = 150; Optimizer: SGD (lr = 0.1, momentum = 0.9, weight decay = 1e-4); LR scheduler: Cyclic Cosine Decay. A hedged training-loop sketch follows the table. |
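For concreteness, here is a minimal sketch of the Table 5 training configuration in PyTorch. The backbone choice (ResNet-18) is an assumption, and since the paper links to a third-party "Cyclic Cosine Decay" implementation without quoting its version, `CosineAnnealingWarmRestarts` is used below only as a stand-in scheduler.

```python
import torch
import torchvision

# Model is an assumption: the paper evaluates on CIFAR10, but the exact
# backbone behind Table 5 is not quoted above.
model = torchvision.models.resnet18(num_classes=10)

# SGD exactly as in Table 5: lr = 0.1, momentum = 0.9, weight decay = 1e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)

# Stand-in for the paper's "Cyclic Cosine Decay" scheduler. T_0 = 150
# restarts the cosine cycle once per 150-epoch IMP iteration; this
# periodicity is an assumption, not the paper's stated configuration.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=150
)

for epoch in range(150):  # one of the 10 IMP iterations (Table 5)
    # ... train one epoch over CIFAR10 here ...
    scheduler.step()
```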
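And a sketch of the hold-out split that grounds the αk selection. The last-5% split follows Tables 5 and 6; the linear weight interpolation and the grid search over α are generic stand-ins for the paper's Algorithm 1 (whose details are in the appendix), and `evaluate`, `model_a`, and `model_b` are hypothetical names.

```python
import copy
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Last 5% of the CIFAR10 training set as the hold-out validation set
# (Tables 5 and 6). Taking the tail indices is an assumption about how
# "last 5%" is realized.
train_full = datasets.CIFAR10(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)
split = int(0.95 * len(train_full))          # 47,500 of 50,000 images
train_set = Subset(train_full, range(split))
val_set = Subset(train_full, range(split, len(train_full)))

def interpolate(model_a, model_b, alpha):
    """Linearly interpolate two checkpoints' weights (a generic sketch,
    not the paper's Algorithm 1)."""
    merged = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict(
        {k: alpha * sa[k] + (1 - alpha) * sb[k] for k in sa}
    )
    return merged

# alpha_k would then be chosen by "Val Acc" on the hold-out set, e.g.:
# best_alpha = max(
#     (a / 10 for a in range(11)),
#     key=lambda a: evaluate(interpolate(model_a, model_b, a), val_set),
# )
# where `evaluate`, `model_a`, and `model_b` are hypothetical.
```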