Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boosting Vision State Space Model with Fractal Scanning
Authors: Haoke Xiao, Lv Tang, Peng-tao Jiang, Hao Zhang, Jinwei Chen, Bo Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments to evaluate and compare Fractal Mamba with different established benchmark models, including architectures based on CNNs, ViTs, and SSMs. Our assessment covers a range of visual tasks at various resolutions, such as image classification, object detection, remote sensing binary change object detection, and semantic segmentation. |
| Researcher Affiliation | Industry | vivo Mobile Communication Co., Ltd, Shanghai, China EMAIL |
| Pseudocode | No | The paper includes mathematical equations and descriptions of methods but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide any repository links or mention of code in supplementary materials. |
| Open Datasets | Yes | We evaluate the performance of Fractal Mamba using the ImageNet-1K dataset (Deng et al. 2009)... We evaluate the performance of Fractal Mamba on object detection using MSCOCO 2017 (Lin et al. 2014)... We conduct experiments on the building CD dataset LEVIR-CD+, which is an advanced version of LEVIR-CD... The results of semantic segmentation on the ADE20K dataset are summarized in Table 4. |
| Dataset Splits | No | The paper mentions using well-known datasets and referring to established evaluation protocols or prior works for training. However, it does not explicitly provide specific dataset split information (e.g., percentages or sample counts for training, validation, and testing) within the text for any of the datasets used. |
| Hardware Specification | Yes | All experiments are conducted using 8 NVIDIA H800 GPUs. |
| Software Dependencies | No | The paper mentions the use of 'AdamW optimizer' and 'MMDetection library', but it does not specify any version numbers for these software components or any other key libraries/frameworks. |
| Experiment Setup | Yes | The Fractal Mamba-T model is trained from scratch over 300 epochs, with an initial 20-epoch warm-up period, using a batch size of 1024. The training process utilizes the AdamW optimizer (Loshchilov and Hutter 2017), with betas set to (0.9, 0.999), momentum of 0.9, a cosine decay learning rate scheduler, an initial learning rate of 1×10⁻³, and a weight decay of 0.05. Additional strategies such as label smoothing (0.1) and exponential moving average (EMA) were incorporated into the training regimen. |
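The quoted setup combines a 20-epoch linear warm-up with cosine decay over 300 epochs at a base learning rate of 1×10⁻³. A minimal sketch of that schedule is below; the function name `lr_at_epoch` and the choice of a linear ramp and a zero floor are assumptions for illustration, not details confirmed by the paper.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=300, min_lr=0.0):
    """Hypothetical cosine-decay schedule with linear warm-up,
    using the hyperparameters quoted in the experiment setup."""
    if epoch < warmup_epochs:
        # Linear ramp from ~0 up to base_lr over the warm-up period.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In practice this corresponds to what a library scheduler (e.g. a cosine annealing scheduler with warm-up) would compute per epoch; the sketch just makes the shape of the curve explicit.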