Vision Mamba Mender
Authors: Jiacong Hu, Anda Cao, Zunlei Feng, Shengxuming Zhang, Yi Wang, Lingxiang Jia, Mingli Song
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the efficacy of the proposed methods on prevalent Mamba architectures, significantly enhancing Mamba's performance. For more information, please visit https://vision-mamba-mender.github.io/. |
| Researcher Affiliation | Academia | Jiacong Hu1,3, Anda Cao1, Zunlei Feng2,3,4, Shengxuming Zhang2, Yi Wang1, Lingxiang Jia1, Mingli Song1,3,4 1College of Computer Science and Technology, Zhejiang University, 2School of Software Technology, Zhejiang University, 3State Key Laboratory of Blockchain and Data Security, Zhejiang University, 4Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security {jiaconghu,caoanda,zunleifeng}@zju.edu.cn, {zsxm1998,y_w,lingxiangjia,brooksong}@zju.edu.cn |
| Pseudocode | No | The paper describes mathematical equations for model components and computations (e.g., Eqn. 1-6) but does not present them within an explicitly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | For more information, please visit https://vision-mamba-mender.github.io/. ... Additionally, in the appendix, we have supplemented more information about the Mamba architecture, related work on Mamba interpretability, detailed experimental settings, additional results on state flaw identification, additional results on state flaw repair, and detailed ablation experiments, as follows. A Origin of the Mamba Model ... To facilitate a better understanding of the value and significance of this work, as well as to thoroughly demonstrate the effectiveness and applicability of the proposed method, we have provided the algorithm code in supplementary material. This code will be made publicly available. |
| Open Datasets | Yes | Additionally, we conducted experiments on three scales of the ImageNet [36] dataset: ImageNet-50, ImageNet-300, and ImageNet-1K. To obtain the image foreground annotations m as defined in Definition 1 from the ImageNet dataset, we utilized annotations from the ImageNet-S dataset [37]. |
| Dataset Splits | Yes | When training on 224x224 input images, we optimized the model using AdamW [65] with a momentum of 0.9, a total batch size of 128, and a weight decay of 0.1. We utilized a cosine learning rate schedule with an initial learning rate of 5e-4, training the Mamba model for 300 epochs. In particular, during the baseline model training, these training strategies resulted in an exceedingly smooth training curve in the later stages, effectively optimizing the fit. During testing, we performed center cropping on the validation set to extract 224x224 images. (A configuration sketch of these training settings follows the table.) |
| Hardware Specification | Yes | Throughout the entire experiment, we utilized 8 NVIDIA A40 GPU cards and a CPU with 24 cores and 500GB of memory. |
| Software Dependencies | No | The paper mentions AdamW [65] as its optimizer but does not specify any software dependencies with version numbers (e.g., Python or PyTorch versions). Stating that AdamW is used implies a deep-learning library, but no version is given, and [65] is the paper on decoupled weight decay regularization, not a software library. |
| Experiment Setup | Yes | Model Parameter Settings. To enable the Mamba model to train efficiently within limited computational resources, we adjusted certain parameters across the Mamba models. For instance, in the case of VMamba-T, we set the patch size to 16x16. For SiMBA-S, the model depth was adjusted to [2, 3, 3, 2], and we introduced a class token for classification in the last block. For EfficientVMamba-T, we similarly set the patch size to 16x16. In the case of LocalViM-T, we reduced the model depth from 20 to 9 and set the state dimensionality to 128. ... Model Training Settings. To ensure a fair comparison with limited resources, our training settings primarily followed the experimental setup of DeiT [43]. Specifically, we employed data augmentation techniques such as random cropping and random horizontal flipping. When training on 224x224 input images, we optimized the model using AdamW [65] with a momentum of 0.9, a total batch size of 128, and a weight decay of 0.1. We utilized a cosine learning rate schedule with an initial learning rate of 5e-4, training the Mamba model for 300 epochs. ... State Flaw Identification. For external state correlation analysis, the threshold α is set to 0.5 by default. For internal state correlation, the threshold β is set to 0.3 by default. ... State Flaw Repair. In our experiments, the balance weight λ for the loss function of external state flaw repair is set to 1e+7 by default, and the balance weight γ for the loss function of internal state flaw repair is also set to 1e+7 by default. Furthermore, based on the conclusions about flaw identification in the main text, external state flaw repair is applied by default to the first Mamba block, while internal state flaw repair is applied to the last Mamba block. (A sketch of how these balance weights might enter the training loss follows the table.) |
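As a reading aid, below is a minimal sketch of the training configuration quoted in the Dataset Splits and Experiment Setup rows. The paper does not name its framework, so PyTorch is assumed; the placeholder model and the specific torchvision transforms are likewise assumptions. Only the hyperparameters (AdamW with a first-moment coefficient of 0.9, weight decay 0.1, initial learning rate 5e-4 with a cosine schedule, total batch size 128, 300 epochs, 224x224 crops) come from the quoted settings.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import transforms

EPOCHS = 300
BATCH_SIZE = 128  # total batch size (the paper trains on 8 NVIDIA A40 GPUs)

# Augmentation reported in the paper: random cropping and random horizontal
# flipping for training; center cropping to 224x224 for validation.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Placeholder model: the paper trains VMamba-T, SiMBA-S, EfficientVMamba-T,
# and LocalViM-T variants, not this stand-in.
model = torch.nn.Conv2d(3, 1000, kernel_size=16, stride=16)

# "Momentum of 0.9" is read here as AdamW's first-moment coefficient (beta1).
optimizer = AdamW(model.parameters(), lr=5e-4, betas=(0.9, 0.999), weight_decay=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
```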
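The State Flaw Repair settings quote only the balance weights (λ = γ = 1e+7), not the repair losses themselves. Below is a hedged sketch of how such weights might enter the overall objective; the additive combination and the placeholder loss arguments are assumptions, and the actual repair-loss definitions live in the paper, not in this table.

```python
import torch

LAMBDA_EXTERNAL = 1e7  # balance weight for external state flaw repair (paper default)
GAMMA_INTERNAL = 1e7   # balance weight for internal state flaw repair (paper default)

def total_loss(cls_loss: torch.Tensor,
               external_repair_loss: torch.Tensor,
               internal_repair_loss: torch.Tensor) -> torch.Tensor:
    """Combine the classification loss with the two repair terms.

    The additive weighting shown here is an assumption; only the weight
    values come from the paper's stated defaults.
    """
    return (cls_loss
            + LAMBDA_EXTERNAL * external_repair_loss
            + GAMMA_INTERNAL * internal_repair_loss)
```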