Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Demystify Mamba in Vision: A Linear Attention Perspective
Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For each design, we meticulously analyze its pros and cons, and empirically evaluate its impact on model performance in vision tasks. |
| Researcher Affiliation | Collaboration | Dongchen Han¹, Ziyi Wang¹, Zhuofan Xia¹, Yizeng Han¹, Yifan Pu¹, Chunjiang Ge¹, Jun Song², Shiji Song¹, Bo Zheng², Gao Huang¹. ¹Tsinghua University, ²Alibaba Group |
| Pseudocode | No | No sections labeled "Pseudocode" or "Algorithm" are found. The paper uses equations and diagrams (Fig 3, 7) to describe methods. |
| Open Source Code | Yes | Code is available at https://github.com/LeapLabTHU/MLLA. |
| Open Datasets | Yes | ImageNet-1K classification [8], COCO object detection [30], and ADE20K semantic segmentation [55]. |
| Dataset Splits | Yes | ImageNet-1K dataset comprises 1.28 million training images and 50,000 validation images, encompassing 1,000 classes. |
| Hardware Specification | Yes | Speed tests on a RTX3090 GPU. |
| Software Dependencies | No | Specifically, we utilize AdamW [34] optimizer to train all our models from scratch for 300 epochs. We apply a cosine learning rate decay schedule... Augmentation and regularization strategies includes RandAugment [6], Mixup [53], CutMix [52], and random erasing [54]. In the training of MILA models, MESA [11] is employed to prevent overfitting. No version numbers for software or libraries are mentioned. |
| Experiment Setup | Yes | Specifically, we utilize AdamW [34] optimizer to train all our models from scratch for 300 epochs. We apply a cosine learning rate decay schedule with a linear warm-up of 20 epochs and a weight decay of 0.05. The total batch size is 4096 and initial learning rate is set to 4 × 10⁻³. Augmentation and regularization strategies includes RandAugment [6], Mixup [53], CutMix [52], and random erasing [54]. In the training of MILA models, MESA [11] is employed to prevent overfitting. |
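
The learning rate schedule quoted in the Experiment Setup row (20-epoch linear warm-up followed by cosine decay over 300 epochs) can be sketched as below. This is an illustrative sketch, not the authors' code. It assumes an initial learning rate of 4 × 10⁻³. The warm-up starting value, the final minimum learning rate, and the per-epoch (rather than per-iteration) granularity are not stated in the quoted text; the values used here are assumptions, and `lr_at_epoch` is a hypothetical helper name.

```python
import math

# Values quoted in the Experiment Setup row.
TOTAL_EPOCHS = 300
WARMUP_EPOCHS = 20
BASE_LR = 4e-3

# Assumptions (not given in the quoted text): where warm-up starts
# and where the cosine decay ends.
WARMUP_START_LR = 0.0
MIN_LR = 0.0


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate at the start of a 0-indexed epoch."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    if epoch < WARMUP_EPOCHS:
        # Linear warm-up from WARMUP_START_LR towards BASE_LR.
        frac = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + frac * (BASE_LR - WARMUP_START_LR)
    # Cosine decay from BASE_LR towards MIN_LR over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 10, 20, 160, 299):
        print(e, lr_at_epoch(e))
```

Under these assumptions the rate peaks at epoch 20 and falls to half the peak at epoch 160, the midpoint of the decay phase. Training frameworks commonly step the schedule per iteration instead, which changes the curve only in granularity.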