Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training

Authors: Zizheng Huang, Haoxing Chen, Jiaqi Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Limin Wang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments to evaluate Vim training, exploring non-hierarchical models trained via supervised classification and pre-training paradigms, assessing their downstream task performance, and performing detailed algorithm analysis through ablation studies.
Researcher Affiliation | Collaboration | ¹State Key Lab of Novel Software Technology, Nanjing University; ²Shanghai Innovation Institute; ³Independent Researcher; ⁴China Mobile Research Institute; ⁵Shanghai AI Lab. Correspondence to: Limin Wang <EMAIL>.
Pseudocode | Yes | Algorithm 1: Layer-Wise Shuffle forward.
Open Source Code | Yes | Code and models are available at the open-source URL.
Open Datasets | Yes | For supervised training, we train from scratch on ImageNet-1K (Deng et al., 2009), which contains 1.28 million samples for the classification task. We conduct semantic segmentation experiments on ADE20K, and detection and instance segmentation on the COCO 2017 benchmark.
Dataset Splits | Yes | For supervised training, we train from scratch on ImageNet-1K (Deng et al., 2009), which contains 1.28 million samples for the classification task. For the segmentation experiment, we adopt the UPerNet (Xiao et al., 2018) head on ImageNet-1K-trained models. For downstream object detection and instance segmentation tasks, we follow previous work to evaluate our method. The Mask R-CNN (He et al., 2017) structure is adopted with a 1× schedule for 12-epoch fine-tuning.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models) are mentioned for running the experiments. The paper only discusses computational overhead, with throughput measurements at various resolutions.
Software Dependencies | No | The paper mentions the AdamW optimizer and PyTorch pseudo-code, but does not provide specific version numbers for these or any other software components.
Experiment Setup | Yes | Table A.1: supervised training implementation settings. Table A.2: pre-training implementation settings.
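The paper's Algorithm 1 is not reproduced here. Purely as a hedged illustration of the general idea named in the title, the following self-contained Python sketch shows one way a stochastic layer-wise shuffle forward pass could work: layers are selected for shuffling with a probability that grows with depth, and the selected layers are permuted among themselves. The function name, the `max_prob` parameter, and the linear depth schedule are assumptions for illustration, not details taken from the paper.

```python
import random

def layerwise_shuffle_forward(x, layers, max_prob=0.5, seed=None):
    """Hypothetical sketch of a stochastic layer-wise shuffle forward pass.

    Layer i is picked for shuffling with probability max_prob * (i + 1) / n,
    so deeper layers are more likely to move (this linear schedule is an
    assumption, not the paper's exact rule). Picked layers are permuted
    among themselves; all other layers keep their original positions.
    """
    rng = random.Random(seed)
    n = len(layers)
    # Depth-dependent selection of layers to shuffle.
    picked = [i for i in range(n) if rng.random() < max_prob * (i + 1) / n]
    permuted = picked[:]
    rng.shuffle(permuted)
    # Build the execution order: picked positions receive permuted layers.
    order = list(range(n))
    for pos, layer_idx in zip(picked, permuted):
        order[pos] = layer_idx
    # Run the layers in the (possibly shuffled) order.
    for i in order:
        x = layers[i](x)
    return x
```

At inference time the shuffle would be disabled (e.g. `max_prob=0`), recovering the original layer order, so the regularization only perturbs training.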