Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LBMamba: Locally Bi-directional Mamba
Authors: Jingwei Zhang, Xi Han, Hong Qin, Mahdi S. Hosseini, Dimitris Samaras
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the versatility of our approach on both natural images and whole slide images (WSIs). We show that our LBVim consistently offers a superior performance-throughput trade-off. That is, under the same throughput, LBVim achieves 0.8% to 1.6% higher top-1 accuracy on the ImageNet-1K classification dataset, 0.6% to 2.7% higher mIoU on the ADE20K semantic segmentation dataset, and 0.9% higher AP^b and 1.1% higher AP^m on the COCO detection dataset. |
| Researcher Affiliation | Academia | Jingwei Zhang (Department of Computer Science, Stony Brook University, Stony Brook, NY, USA); Xi Han (Department of Computer Science, Stony Brook University, Stony Brook, NY, USA); Hong Qin (Department of Computer Science, Stony Brook University, Stony Brook, NY, USA); Mahdi S. Hosseini (Concordia University, Montreal, Canada; Mila Quebec AI Institute, Montreal, Canada); Dimitris Samaras (Department of Computer Science, Stony Brook University, Stony Brook, NY, USA) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes methodologies using mathematical equations and textual explanations. |
| Open Source Code | Yes | Our code is available at https://github.com/cvlab-stonybrook/LBMamba. |
| Open Datasets | Yes | We first evaluate LBVim on the ImageNet-1K dataset (Deng et al., 2009)... We evaluate the performance of LBVim on downstream tasks, including semantic segmentation on the ADE20K dataset (Zhou et al., 2019), and object detection and instance segmentation on the COCO 2017 dataset (Lin et al., 2014)... We evaluate them on 3 publicly available Whole Slide Image datasets: PANDA (prostate grade assessment) (Bulten et al., 2022), TCGA-NSCLC (adenocarcinoma vs. squamous lung cancer) and TCGA-BRCA (breast invasive carcinoma sub-typing) (tcg). |
| Dataset Splits | Yes | All natural image models are trained on the training set, and top-1 accuracy on the validation set is reported. For fair comparisons, we follow the training setting in (Zhu et al., 2024). To be specific, we train our models for 300 epochs using a batch size of 1,024 for the tiny model and a batch size of 512 otherwise... ImageNet-1K dataset (Deng et al., 2009), which contains 1.28M training images and 50K validation images from 1,000 categories. |
| Hardware Specification | Yes | All natural image models are trained on 4/2 Nvidia A100/H100 GPUs. All throughput, GPU memory consumption and pathology models are run on a Nvidia Quadro RTX 8000 GPU. |
| Software Dependencies | No | The paper mentions using the MMSegmentation (Contributors, 2020) and MMDetection (Chen et al., 2019) libraries but does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | To be specific, we train our models for 300 epochs using a batch size of 1,024 for the tiny model and a batch size of 512 otherwise. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a momentum of 0.9, a cosine annealing learning rate schedule with an initial value of 1×10⁻³, a 5-epoch warmup period, and a weight decay of 0.05. For data augmentation, we apply standard techniques: random cropping, horizontal flipping, label-smoothing regularization, mixup, and random erasing. |
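The quoted training setup (300 epochs, 5-epoch warmup, cosine annealing from 1×10⁻³) can be sketched as a standalone learning-rate schedule. This is a minimal illustration, not the authors' code: the linear warmup shape and the decay-to-zero floor are assumptions, since the paper excerpt only states the initial value, warmup length, and total epoch count.

```python
import math

def lr_at_epoch(epoch, total_epochs=300, warmup_epochs=5, base_lr=1e-3):
    """Cosine-annealed learning rate with linear warmup.

    Values (300 epochs, 5-epoch warmup, base LR 1e-3) come from the
    paper's quoted setup; the warmup shape and zero floor are assumed.
    """
    if epoch < warmup_epochs:
        # Linear warmup from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing from base_lr down toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the same shape is typically obtained by chaining a warmup scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR` on top of `torch.optim.AdamW` configured with `weight_decay=0.05`, matching the quoted hyperparameters.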