Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
Authors: Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that QuadMamba achieves state-of-the-art performance in various vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation. and We conduct experiments on commonly used benchmarks, including ImageNet-1k [29] for image classification, MS COCO2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation. |
| Researcher Affiliation | Collaboration | Fei Xie¹, Weijia Zhang¹, Zhongdao Wang², Chao Ma¹; ¹MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; ²Huawei Noah's Ark Lab |
| Pseudocode | Yes | In Sec. A.4, we also provide the pseudo-code to help understand the key operations within the QuadVSS block. and Algorithm 1: PyTorch code of QuadVSS block; Algorithm 2: PyTorch code of Quadtree window partition at two levels; Algorithm 3: PyTorch code of Quadtree window restoration at two levels; Algorithm 4: PyTorch code of differentiable sequence masking. |
| Open Source Code | Yes | The code is in https://github.com/VISION-SJTU/QuadMamba. |
| Open Datasets | Yes | We conduct experiments on commonly used benchmarks, including ImageNet-1k [29] for image classification, MS COCO2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation. |
| Dataset Splits | Yes | ImageNet [29] is widely recognized as the standard for image classification benchmarks, consisting of around 1.3 million training images and 50,000 validation images spread across 1,000 classes. |
| Hardware Specification | Yes | Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs. and Measurements are taken with an A800 GPU. |
| Software Dependencies | No | Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs. (Does not specify version numbers for PyTorch or Timm.) |
| Experiment Setup | Yes | The data augmentation techniques used include random resized crop (input image size of 224×224), horizontal flip, RandAugment [77], Mixup [70], CutMix [69], Random Erasing [77], and color jitter. Additionally, regularization techniques such as weight decay, stochastic depth [24], and label smoothing [56] are applied. All models are trained using AdamW [45]. The learning rate scaling rule is calculated as Batch Size / 1024 * 10^-3. and The learning rate is set as 6 x 10^-5. The fine-tuning process consists of a total of 160,000 iterations with a batch size of 16. |
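
The learning rate scaling rule quoted in the Experiment Setup row (Batch Size / 1024 * 10^-3) can be read as a linear scaling of the rate with batch size. The snippet below is a minimal sketch of that arithmetic only. The function name `scaled_lr`, its parameter names, and the example batch sizes are hypothetical and do not come from the paper or its repository.

```python
def scaled_lr(batch_size: int, base_lr: float = 1e-3, reference_batch: int = 1024) -> float:
    """Linear scaling rule as quoted above: lr = batch_size / 1024 * 1e-3.

    The defaults mirror the quoted rule; the names are illustrative assumptions.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return batch_size / reference_batch * base_lr


if __name__ == "__main__":
    # Hypothetical batch sizes, chosen only to show how the rate scales.
    for bs in (256, 1024, 2048):
        print(f"batch size {bs}: lr = {scaled_lr(bs):.2e}")
```

Under this reading, a batch size of 1024 yields the base rate of 10^-3, and halving or doubling the batch size halves or doubles the rate. The separately quoted rate of 6 x 10^-5 with a batch size of 16 belongs to the fine-tuning setup and is not derived from this rule.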