Demystify Mamba in Vision: A Linear Attention Perspective

Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "For each design, we meticulously analyze its pros and cons, and empirically evaluate its impact on model performance in vision tasks."
Researcher Affiliation | Collaboration | Dongchen Han (1), Ziyi Wang (1), Zhuofan Xia (1), Yizeng Han (1), Yifan Pu (1), Chunjiang Ge (1), Jun Song (2), Shiji Song (1), Bo Zheng (2), Gao Huang (1); (1) Tsinghua University, (2) Alibaba Group
Pseudocode | No | No sections labeled "Pseudocode" or "Algorithm" are found. The paper describes its methods with equations and diagrams (Figs. 3 and 7).
Open Source Code | Yes | Code is available at https://github.com/LeapLabTHU/MLLA.
Open Datasets | Yes | ImageNet-1K classification [8], COCO object detection [30], and ADE20K semantic segmentation [55].
Dataset Splits | Yes | "The ImageNet-1K dataset comprises 1.28 million training images and 50,000 validation images, encompassing 1,000 classes."
Hardware Specification | Yes | Speed tests were run on an RTX 3090 GPU.
Software Dependencies | No | The paper details its training recipe (AdamW [34] optimizer, cosine learning rate schedule, RandAugment [6], Mixup [53], CutMix [52], random erasing [54], and MESA [11]), but no version numbers for software or libraries are mentioned.
Experiment Setup | Yes | "Specifically, we utilize the AdamW [34] optimizer to train all our models from scratch for 300 epochs. We apply a cosine learning rate decay schedule with a linear warm-up of 20 epochs and a weight decay of 0.05. The total batch size is 4096 and the initial learning rate is set to 4 × 10⁻³. Augmentation and regularization strategies include RandAugment [6], Mixup [53], CutMix [52], and random erasing [54]. In the training of MILA models, MESA [11] is employed to prevent overfitting."
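
The quoted training recipe maps onto a standard PyTorch setup. The following is a minimal sketch, not the authors' released code: the model is a hypothetical placeholder for the MILA/MLLA backbone, learning-rate updates are assumed to happen per epoch, and the augmentation pipeline (RandAugment, Mixup, CutMix, random erasing) and MESA are omitted.

    import math
    import torch

    EPOCHS, WARMUP_EPOCHS = 300, 20     # from the quoted setup
    BASE_LR, WEIGHT_DECAY = 4e-3, 0.05  # initial LR 4 × 10⁻³, weight decay 0.05
    TOTAL_BATCH_SIZE = 4096             # accumulated across devices in practice

    model = torch.nn.Linear(768, 1000)  # placeholder; not the actual backbone

    # AdamW with the stated base learning rate and weight decay.
    optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR,
                                  weight_decay=WEIGHT_DECAY)

    def lr_lambda(epoch: int) -> float:
        # Linear warm-up over the first 20 epochs, then cosine decay to zero.
        if epoch < WARMUP_EPOCHS:
            return (epoch + 1) / WARMUP_EPOCHS
        progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

    for epoch in range(EPOCHS):
        # ... one epoch over augmented ImageNet-1K batches, with
        # optimizer.zero_grad() / loss.backward() / optimizer.step() ...
        scheduler.step()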