MambaTree: Tree Topology is All You Need in State Space Model
Authors: Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method significantly outperforms existing structured state space models on image classification, object detection and segmentation. Besides, by fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost. |
| Researcher Affiliation | Collaboration | ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²ARC Lab, Tencent PCG; ³Tencent AI Lab; ⁴South China Normal University |
| Pseudocode | Yes | Algorithm 1 Vision Tree Scanning |
| Open Source Code | Yes | Code is available at https://github.com/EasonXiao-888/GrootVL. |
| Open Datasets | Yes | We assess the classification performance of MambaTreeV on the ImageNet-1k dataset [12]. Following previous practices [43, 44, 62, 41], all MambaTreeV models are trained for 300 epochs from scratch using AdamW optimizer with a warm-up strategy of 20 epochs. We verify the detection performance of MambaTreeV on the MSCOCO 2017 dataset [39]. To evaluate the semantic segmentation performance of our MambaTreeV series, we train our models with UperNet [65] initialized by pre-trained classification weights on ADE20K [75]. We regard Mamba [19] with 130M parameters as the base model. ... we first fine-tune pre-trained Mamba via LoRA [33] and MambaTreeL under the same setting with the Alpaca data [58], which contains 52000 instruction tuning data for supervised fine-tuning. |
| Dataset Splits | Yes | The comparison results summarized in Table 1 show MambaTreeV leading all SSM-based models and competitive with advanced CNNs and Transformers across tiny, small, and base scales. Specifically, MambaTreeV-T achieves 83.4% Top-1 Acc., boosting ViM-S by 2.9%, LocalVim-S by 2.2%, PlainMamba-L2 by 1.8%, and VMamba-T by 0.9% with similar FLOPs. Additionally, it surpasses ConvNeXt-T by 1.3% and Swin-T by 2.2%, demonstrating the effectiveness of our method. We assess the classification performance of MambaTreeV on the ImageNet-1k dataset [12]. |
| Hardware Specification | Yes | As shown in Table 7, we report the inference throughputs of our method on an Nvidia V100 GPU. The models are trained with thirty-two 32GB V100 GPUs by default. The models are trained with eight 32GB V100 GPUs by default. |
| Software Dependencies | No | The paper mentions 'AdamW optimizer', 'MMDetection library', 'UperNet', and 'lm-evaluation-harness project', but does not specify their version numbers. |
| Experiment Setup | Yes | All MambaTreeV models are trained for 300 epochs from scratch using AdamW optimizer with a warm-up strategy of 20 epochs. During training, we utilize a Cosine Scheduler with an initial learning rate of 1×10⁻³ and weight decay of 0.05. In addition, the exponential moving average (EMA) is also applied. We adopt the AdamW optimizer with a learning rate of 1×10⁻⁴ and batch size of 16. The training schedules include 1× (12 epochs) and 3× (36 epochs) with multi-scale data augmentation. |
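The classification schedule quoted above (300 epochs, 20-epoch warm-up, cosine decay from 1×10⁻³) can be sketched as a learning-rate function. This is a minimal sketch, assuming a linear warm-up and a standard cosine curve decaying to zero; the paper states the hyperparameter values but not the exact warm-up or decay shape, so `lr_at_epoch` and its formula are illustrative, not the authors' implementation.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300):
    """Learning rate for a given epoch under linear warm-up + cosine decay.

    Values (base_lr, warmup_epochs, total_epochs) come from the reported
    MambaTreeV classification setup; the curve shape is an assumption.
    """
    if epoch < warmup_epochs:
        # Linear warm-up: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs, ending near zero.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, the rate peaks at 1×10⁻³ at the end of warm-up (epoch 19) and decays toward zero by epoch 299.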