Exploring Token Pruning in Vision State Space Models

Authors: Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our approach can achieve significant computation reduction with minimal impact on performance across different tasks. We conduct comprehensive experiments on ImageNet-1K [4], COCO 2017 [20], and ADE20K [45] datasets.
Researcher Affiliation | Academia | ¹Northeastern University, ²Harvard University, ³University of Georgia
Pseudocode | Yes | Pseudo-code for our pruning-aware hidden state alignment is provided in Appendix A (Algorithm 1: Pruning-Aware Hidden State Alignment); a hedged re-creation is sketched after this table.
Open Source Code | Yes | Code available at https://github.com/ZLKong/ToP-ViM
Open Datasets | Yes | We conduct comprehensive experiments on ImageNet-1K [4], COCO 2017 [20], and ADE20K [45] datasets.
Dataset Splits | Yes | The COCO 2017 dataset contains 118K images for training, 5K images for validation, and 20K images for testing.
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA V100s.
Software Dependencies | No | The paper mentions specific models (ViM, PlainMamba) and detection/segmentation frameworks (Mask R-CNN, RetinaNet, UperNet), but it does not specify version numbers for any software components (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For ViM, we set a patch extraction stride of 8 while keeping the patch size unchanged, a constant learning rate of 1e-5, and a weight decay of 1e-8. For PlainMamba, we use a warm-up period of 5 epochs. The weight decay is set to 1e-8, the base learning rate to 2e-5, the warm-up learning rate to 2e-8, and the minimum learning rate to 2e-7. (A hedged schedule sketch follows this table.)
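
The paper's Algorithm 1 is in its Appendix A, which is not quoted here. Below is a minimal sketch of what a pruning-aware hidden state alignment loss could look like, assuming it is a distillation-style MSE between the dense model's hidden states and the pruned model's hidden states at the kept-token positions; the function and tensor names (`hidden_state_alignment_loss`, `dense_hidden`, `pruned_hidden`, `keep_idx`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def hidden_state_alignment_loss(dense_hidden: torch.Tensor,
                                pruned_hidden: torch.Tensor,
                                keep_idx: torch.Tensor) -> torch.Tensor:
    """MSE between the pruned model's hidden states and the dense model's
    hidden states gathered at the kept-token positions. (Hypothetical
    formulation, not the paper's exact Algorithm 1.)

    dense_hidden:  (B, L, D) hidden states from the unpruned forward pass
    pruned_hidden: (B, K, D) hidden states from the pruned forward pass
    keep_idx:      (B, K)    long-dtype indices of the K kept tokens
    """
    # Gather the dense (teacher) hidden states at the kept-token positions.
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, dense_hidden.size(-1))
    teacher = dense_hidden.gather(dim=1, index=idx)  # (B, K, D)
    # Detach the teacher so gradients only flow through the pruned branch.
    return F.mse_loss(pruned_hidden, teacher.detach())

# Example: combined with the task loss during pruning-aware fine-tuning,
# where align_weight is a hypothetical balancing coefficient:
# total_loss = task_loss + align_weight * hidden_state_alignment_loss(...)
```

The detach on the teacher branch reflects a common distillation design choice; whether the paper freezes the dense model or trains both branches jointly is not stated in the quoted material.

For the PlainMamba setup quoted above, a minimal schedule sketch follows. Only the weight decay (1e-8), warm-up length (5 epochs), base LR (2e-5), warm-up LR (2e-8), and minimum LR (2e-7) are quoted from the paper; the use of AdamW, a cosine decay shape, and the 30-epoch total are assumptions for illustration.

```python
import math
import torch

base_lr, warmup_lr, min_lr = 2e-5, 2e-8, 2e-7  # values quoted from the paper
warmup_epochs, total_epochs = 5, 30             # total_epochs is an assumed value

model = torch.nn.Linear(10, 10)  # stand-in for the pruned PlainMamba model
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=1e-8)

def lr_factor(epoch: int) -> float:
    """Multiplicative LR factor relative to base_lr, stepped once per epoch."""
    if epoch < warmup_epochs:
        # Linear warm-up from warmup_lr to base_lr over the first 5 epochs.
        t = epoch / warmup_epochs
        return (warmup_lr + t * (base_lr - warmup_lr)) / base_lr
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return (min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))) / base_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
# Usage: call scheduler.step() once per epoch after the training loop body.
```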
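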