QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model

Authors: Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that QuadMamba achieves state-of-the-art performance in various vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation." and "We conduct experiments on commonly used benchmarks, including ImageNet-1K [29] for image classification, MS COCO 2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation."
Researcher Affiliation | Collaboration | Fei Xie (1), Weijia Zhang (1), Zhongdao Wang (2), Chao Ma (1); (1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; (2) Huawei Noah's Ark Lab
Pseudocode | Yes | "In Sec. A.4, we also provide the pseudo-code to help understand the key operations within the Quad VSS block." and "Algorithm 1: PyTorch code of Quad VSS block; Algorithm 2: PyTorch code of Quadtree window partition at two levels; Algorithm 3: PyTorch code of Quadtree window restoration at two levels; Algorithm 4: PyTorch code of differentiable sequence masking."
Open Source Code | Yes | "The code is in https://github.com/VISION-SJTU/QuadMamba."
Open Datasets | Yes | "We conduct experiments on commonly used benchmarks, including ImageNet-1K [29] for image classification, MS COCO 2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation."
Dataset Splits | Yes | "ImageNet [29] is widely recognized as the standard for image classification benchmarks, consisting of around 1.3 million training images and 50,000 validation images spread across 1,000 classes."
Hardware Specification | Yes | "Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs." and "Measurements are taken with an A800 GPU."
Software Dependencies | No | "Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs." (No version numbers are specified for PyTorch or Timm.)
Experiment Setup | Yes | "The data augmentation techniques used include random resized crop (input image size of 224x224), horizontal flip, RandAugment [77], Mixup [70], CutMix [69], Random Erasing [77], and color jitter. Additionally, regularization techniques such as weight decay, stochastic depth [24], and label smoothing [56] are applied. All models are trained using AdamW [45]. The learning rate scaling rule is calculated as Batch Size / 1024 x 10^-3." and "The learning rate is set as 6 x 10^-5. The fine-tuning process consists of a total of 160,000 iterations with a batch size of 16."
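The Experiment Setup entry quotes a linear learning-rate scaling rule, Batch Size / 1024 x 10^-3. A minimal sketch of that arithmetic follows; the function name and signature are illustrative, not taken from the paper's code:

```python
def scaled_lr(batch_size, base_lr=1e-3, base_batch=1024):
    """Linear LR scaling rule as quoted: lr = batch_size / 1024 * 1e-3.

    Hypothetical helper for illustration only.
    """
    return batch_size / base_batch * base_lr


# With the reference batch size of 1024, the rule yields the base rate 1e-3;
# halving the batch size to 512 halves the learning rate to 5e-4.
print(scaled_lr(1024))
print(scaled_lr(512))
```

Under this rule, the batch size of 16 quoted for fine-tuning would correspond to a much smaller rate, which is consistent with the separately stated fine-tuning learning rate of 6 x 10^-5 being set independently.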
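The Pseudocode entry lists Algorithm 2, "Quadtree window partition at two levels". As a rough illustration of what such a partition does (not the authors' implementation, which is given in PyTorch in their appendix), here is a NumPy sketch under the assumption that partitioning means splitting a feature map into 2x2 quadrants and then splitting each quadrant again:

```python
import numpy as np

def quadtree_partition_two_levels(x):
    """Partition a square feature map into quadrants at two levels.

    x: array of shape (B, H, W, C) with H and W divisible by 4.
    Returns:
      coarse: (B, 4, H//2, W//2, C)  -- the four level-1 quadrants
      fine:   (B, 16, H//4, W//4, C) -- the sixteen level-2 sub-quadrants
    Quadrants are ordered row-major: top-left, top-right,
    bottom-left, bottom-right.
    """
    B, H, W, C = x.shape
    # Level 1: view as (B, 2, H/2, 2, W/2, C), bring the two grid axes
    # together, and flatten them into a single quadrant axis.
    coarse = (x.reshape(B, 2, H // 2, 2, W // 2, C)
               .transpose(0, 1, 3, 2, 4, 5)
               .reshape(B, 4, H // 2, W // 2, C))
    # Level 2: apply the same 2x2 split inside every level-1 quadrant.
    fine = (coarse.reshape(B, 4, 2, H // 4, 2, W // 4, C)
                  .transpose(0, 1, 2, 4, 3, 5, 6)
                  .reshape(B, 16, H // 4, W // 4, C))
    return coarse, fine
```

For a 4x4 single-channel map numbered 0..15 row by row, the first coarse quadrant is the top-left 2x2 block [[0, 1], [4, 5]], and the sixteen fine windows are the individual pixels in quadrant-then-sub-quadrant order. The paper's Algorithms 3 and 4 (window restoration and differentiable sequence masking) would invert and gate this layout, respectively.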