Exploring Token Pruning in Vision State Space Models
Authors: Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach can achieve significant computation reduction with minimal impact on performance across different tasks. We conduct comprehensive experiments on ImageNet-1K [4], COCO 2017 [20], and ADE20K [45] datasets. |
| Researcher Affiliation | Academia | 1Northeastern University, 2Harvard University, 3University of Georgia |
| Pseudocode | Yes | A pseudo-code for our pruning-aware hidden state alignment is demonstrated in Appendix A. Algorithm 1: PRUNING-AWARE HIDDEN STATE ALIGNMENT. (A generic, non-authoritative alignment sketch follows this table.) |
| Open Source Code | Yes | Code available at https://github.com/ZLKong/ToP-ViM |
| Open Datasets | Yes | We conduct comprehensive experiments on ImageNet-1K [4], COCO 2017 [20], and ADE20K [45] datasets. |
| Dataset Splits | Yes | The COCO 2017 dataset contains 118K images for training, 5K images for validation, and 20K images for testing. |
| Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA V100s. |
| Software Dependencies | No | The paper mentions the use of specific datasets and models (Vim, PlainMamba) and detection/segmentation frameworks (Mask R-CNN, RetinaNet, UperNet), but it does not specify version numbers for any software components (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | For Vim, we set a patch extraction stride of 8 while keeping the patch size unchanged, a constant learning rate of 1e-5, and a weight decay of 1e-8. For PlainMamba, we use a warm-up period of 5 epochs. The weight decay is set to 1e-8, the base learning rate to 2e-5, the warm-up learning rate to 2e-8, and the minimum learning rate to 2e-7. (A hedged configuration sketch using these values follows the table.) |
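The pruning-aware hidden state alignment itself (Algorithm 1) lives in the paper's Appendix A and is not reproduced on this page. As a rough orientation only, the sketch below shows what a generic hidden-state alignment loss between a dense model and a token-pruned copy could look like; the function name, tensor shapes, and the MSE objective are illustrative assumptions, not the authors' method.

```python
# Hypothetical illustration of a hidden-state alignment objective between a dense
# model and a token-pruned copy. This is NOT the paper's Algorithm 1 (which is in
# its Appendix A); it only shows the general idea of matching the pruned model's
# hidden states to the dense model's states at the retained token positions.
import torch
import torch.nn.functional as F

def alignment_loss(dense_hidden: torch.Tensor,
                   pruned_hidden: torch.Tensor,
                   keep_idx: torch.Tensor) -> torch.Tensor:
    """MSE between pruned-model states and dense-model states of the kept tokens.

    dense_hidden:  (B, N, D) hidden states from the unpruned model
    pruned_hidden: (B, K, D) hidden states from the token-pruned model
    keep_idx:      (B, K) indices of the tokens that survived pruning
    """
    # Gather the dense states at the surviving token positions.
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, dense_hidden.size(-1))
    dense_kept = dense_hidden.gather(dim=1, index=idx)  # (B, K, D)
    return F.mse_loss(pruned_hidden, dense_kept)

# Example with random tensors standing in for real model outputs.
B, N, K, D = 2, 196, 98, 192
dense = torch.randn(B, N, D)
pruned = torch.randn(B, K, D)
keep = torch.stack([torch.randperm(N)[:K] for _ in range(B)])
loss = alignment_loss(dense, pruned, keep)
```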
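The Experiment Setup row quotes the PlainMamba fine-tuning hyperparameters. A minimal sketch of how those numbers could be wired into a training loop is given below, assuming a PyTorch AdamW optimizer and a linear warm-up followed by cosine decay; the optimizer choice, the schedule shape, the total epoch count, and the placeholder model are assumptions, while the numeric values come from the quoted setup.

```python
# Hypothetical sketch of the quoted PlainMamba fine-tuning hyperparameters.
# The optimizer (AdamW), the placeholder model, the cosine schedule, and the
# total epoch count are assumptions; only the numeric values (weight decay 1e-8,
# base LR 2e-5, warm-up LR 2e-8, minimum LR 2e-7, 5 warm-up epochs) are quoted.
import math
import torch
import torch.nn as nn

model = nn.Linear(192, 1000)  # stand-in for the PlainMamba backbone

base_lr, warmup_lr, min_lr = 2e-5, 2e-8, 2e-7
warmup_epochs, total_epochs = 5, 30  # total epoch count is an assumption

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=1e-8)

def lr_at_epoch(epoch: int) -> float:
    """Linear warm-up to the base LR, then cosine decay down to the minimum LR."""
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... one training epoch over the dataset would run here ...
```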