QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
Authors: Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that QuadMamba achieves state-of-the-art performance in various vision tasks, including image classification, object detection, instance segmentation, and semantic segmentation. and We conduct experiments on commonly used benchmarks, including ImageNet-1k [29] for image classification, MS COCO 2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation. |
| Researcher Affiliation | Collaboration | Fei Xie¹, Weijia Zhang¹, Zhongdao Wang², Chao Ma¹ (¹MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; ²Huawei Noah's Ark Lab) |
| Pseudocode | Yes | In Sec. A.4, we also provide the pseudo-code to help understand the key operations within the QuadVSS block. and Algorithm 1 PyTorch code of QuadVSS block, Algorithm 2 PyTorch code of Quadtree window partition at two levels, Algorithm 3 PyTorch code of Quadtree window restoration at two levels, Algorithm 4 PyTorch code of differentiable sequence masking. (A hedged sketch of a two-level quadtree window partition is given after this table.) |
| Open Source Code | Yes | The code is in https://github.com/VISION-SJTU/QuadMamba. |
| Open Datasets | Yes | We conduct experiments on commonly used benchmarks, including ImageNet-1k [29] for image classification, MS COCO 2017 [37] for object detection and instance segmentation, and ADE20K [78] for semantic segmentation. |
| Dataset Splits | Yes | ImageNet [29] is widely recognized as the standard for image classification benchmarks, consisting of around 1.3 million training images and 50,000 validation images spread across 1,000 classes. |
| Hardware Specification | Yes | Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs. and Measurements are taken with an A800 GPU. |
| Software Dependencies | No | Our models are implemented with PyTorch and Timm libraries and trained on A800 GPUs. (Does not specify version numbers for PyTorch or Timm.) |
| Experiment Setup | Yes | The data augmentation techniques used include random resized crop (input image size of 224x224), horizontal flip, RandAugment [77], Mixup [70], CutMix [69], Random Erasing [77], and color jitter. Additionally, regularization techniques such as weight decay, stochastic depth [24], and label smoothing [56] are applied. All models are trained using AdamW [45]. The learning rate scaling rule is calculated as batch size / 1024 × 10^-3 (see the scaling-rule sketch after this table). and The learning rate is set as 6 × 10^-5. The fine-tuning process consists of a total of 160,000 iterations with a batch size of 16. |
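
The appendix pseudocode itself is not reproduced in this report. As a rough orientation, below is a minimal PyTorch sketch of what a two-level quadtree window partition could look like for a (B, C, H, W) feature map, with a 2x2 coarse grid refined into a 4x4 fine grid. The function name, tensor layout, and window ordering are illustrative assumptions and may differ from the released Algorithm 2.

```python
import torch

def quadtree_window_partition(x: torch.Tensor, base_windows: int = 2):
    """Split a (B, C, H, W) feature map into quadtree windows at two levels.

    Level 1 cuts the map into a base_windows x base_windows grid of coarse
    windows; level 2 cuts it into a grid twice as fine, i.e. each coarse
    window is divided into four sub-windows. Both outputs are flattened to
    (B, num_windows, C, tokens_per_window) sequences.
    """
    B, C, H, W = x.shape

    # Level 1: coarse windows of size (H / base_windows, W / base_windows).
    h1, w1 = H // base_windows, W // base_windows
    lvl1 = (x.reshape(B, C, base_windows, h1, base_windows, w1)
             .permute(0, 2, 4, 1, 3, 5)      # B, nH, nW, C, h1, w1
             .reshape(B, base_windows * base_windows, C, h1 * w1))

    # Level 2: each coarse window is split again into a 2x2 grid.
    n2 = base_windows * 2
    h2, w2 = H // n2, W // n2
    lvl2 = (x.reshape(B, C, n2, h2, n2, w2)
             .permute(0, 2, 4, 1, 3, 5)      # B, nH, nW, C, h2, w2
             .reshape(B, n2 * n2, C, h2 * w2))
    return lvl1, lvl2


if __name__ == "__main__":
    feat = torch.randn(1, 96, 56, 56)
    coarse, fine = quadtree_window_partition(feat)
    print(coarse.shape, fine.shape)  # (1, 4, 96, 784) and (1, 16, 96, 196)
```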
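The learning-rate scaling rule quoted in the Experiment Setup row (batch size / 1024 × 10^-3) translates directly into code. In the sketch below, only the scaling rule itself comes from the paper; the model stand-in and the remaining AdamW defaults are placeholders.

```python
import torch

def scaled_lr(batch_size: int, base_lr: float = 1e-3, base_batch: int = 1024) -> float:
    """Linear learning-rate scaling: lr = batch_size / 1024 * 1e-3."""
    return batch_size / base_batch * base_lr

model = torch.nn.Linear(96, 1000)  # placeholder module, not the QuadMamba backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=scaled_lr(batch_size=1024))
print(optimizer.param_groups[0]["lr"])  # 0.001 for a global batch size of 1024
```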