Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

GaRA-SAM: Robustifying Segment Anything Model with Gated-Rank Adaptation

Authors: Sohyun Lee, Yeho Gwon, Lukas Hoyer, Suha Kwak

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our model, GaRA-SAM, significantly outperforms prior work on all robust segmentation benchmarks. In particular, it surpasses the previous best IoU score by up to 21.3%p on ACDC, a challenging real corrupted image dataset.
Researcher Affiliation	Collaboration	Sohyun Lee¹ Yeho Gwon¹ Lukas Hoyer² Suha Kwak¹; ¹POSTECH, ²Google
Pseudocode	No	The paper describes the proposed method and gating process using figures (Figure 3) and mathematical equations (e.g., Equation 2, 3, 4, 5, 6, 7), but it does not present a clearly labeled pseudocode or algorithm block.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We do not disclose the code and data at the moment.
Open Datasets	Yes	For training and validation, we utilize the Robust-Seg dataset [8], which is constructed by applying 15 types of synthetic corruptions to three semantic segmentation benchmarks: LVIS [15], MSRA-10K [9], and Thin Objects-5K [37], comprising a total of 26,000 masks. For evaluation, we use five clear-condition image segmentation benchmarks: LVIS, MSRA-10K, STREETS [51], NDD20 [55], COCO [40]. Also, we test on a real-world corrupted benchmark including BDD100K [61] and LIS [5]. For training GaRA-SAM on real-world data, we use BDD-100K and LIS for training, and evaluate on BDD-100K and LIS, and ACDC [47]. Note that all these datasets are free from licensing issues.
Dataset Splits	No	For training and validation, we utilize the Robust-Seg dataset [8], which is constructed by applying 15 types of synthetic corruptions to three semantic segmentation benchmarks: LVIS [15], MSRA-10K [9], and Thin Objects-5K [37], comprising a total of 26,000 masks. For evaluation, we use five clear-condition image segmentation benchmarks: LVIS, MSRA-10K, STREETS [51], NDD20 [55], COCO [40]. Also, we test on a real-world corrupted benchmark including BDD100K [61] and LIS [5]. For training GaRA-SAM on real-world data, we use BDD-100K and LIS for training, and evaluate on BDD-100K and LIS, and ACDC [47]. While datasets are assigned to training/validation or evaluation, specific split percentages or sample counts for these partitions are not provided.
Hardware Specification	Yes	All training and evaluation experiments were conducted at POSTECH. Resource. We use 8 A6000 GPUs for training each method and 1 A6000 GPU for evaluation.
Software Dependencies	No	The models are optimized by Adam [27] with a learning rate of 1 × 10⁻⁴ for ViT-B and 1 × 10⁻⁵ for ViT-L, a weight decay of 1 × 10⁻⁵, and input batches of size 8, using both point and box prompts. The gating modules are trained with the same learning rate. We set the lower- and higher-rank dimensions as r_L = 16 and r_H = 256, respectively, and use a Gumbel-Sigmoid temperature of 0.5. Following RobustSAM [8], we use a combined segmentation loss, L_seg = L_dice [53] + L_focal [41]. While specific optimizers and loss functions are mentioned, no specific version numbers for software libraries (e.g., Python, PyTorch, CUDA) are provided.
Experiment Setup	Yes	The models are optimized by Adam [27] with a learning rate of 1 × 10⁻⁴ for ViT-B and 1 × 10⁻⁵ for ViT-L, a weight decay of 1 × 10⁻⁵, and input batches of size 8, using both point and box prompts. The gating modules are trained with the same learning rate. We set the lower- and higher-rank dimensions as r_L = 16 and r_H = 256, respectively, and use a Gumbel-Sigmoid temperature of 0.5.
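
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. Since the paper's code is not released, everything below is illustrative only: the names `TrainConfig`, `CONFIGS`, `sample_gumbel`, and `gumbel_sigmoid` are hypothetical, and the gate uses the commonly cited Gumbel-Sigmoid (binary Concrete) relaxation, which is an assumption and may differ from the authors' exact formulation.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    backbone: str
    learning_rate: float
    weight_decay: float = 1e-5
    batch_size: int = 8
    rank_low: int = 16              # r_L
    rank_high: int = 256            # r_H
    gumbel_temperature: float = 0.5


# The reported learning rate depends on the backbone.
CONFIGS = {
    "ViT-B": TrainConfig(backbone="ViT-B", learning_rate=1e-4),
    "ViT-L": TrainConfig(backbone="ViT-L", learning_rate=1e-5),
}


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse-transform sampling."""
    # Clamp away from 0 and 1 so both logarithms are defined.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_sigmoid(logit, temperature, rng):
    """Relaxed binary gate in [0, 1]: sigmoid((logit + g1 - g2) / temperature).

    Assumption: this is the standard relaxation, not necessarily the paper's.
    """
    noise = sample_gumbel(rng) - sample_gumbel(rng)
    z = (logit + noise) / temperature
    # Numerically stable sigmoid for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    cfg = CONFIGS["ViT-B"]
    rng = random.Random(0)
    gate = gumbel_sigmoid(logit=2.0, temperature=cfg.gumbel_temperature, rng=rng)
    print(cfg.backbone, cfg.learning_rate, gate)
```

In this relaxation, a lower temperature pushes sampled gate values closer to 0 or 1, while a higher temperature yields softer values near 0.5.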