Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unlocking the Power of SAM 2 for Few-Shot Segmentation

Authors: Qianxiong Xu, Lanyun Zhu, Xuanyi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao

ICML 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments have been conducted on PASCAL-5i and COCO-20i to validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2% better than the best baseline. |
| Researcher Affiliation | Collaboration | 1S-Lab, Nanyang Technological University; 2Singapore University of Technology and Design; 3Peking University; 4University of Cologne; 5SenseTime Research. Correspondence to: Guosheng Lin <EMAIL>, Cheng Long <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using textual explanations, mathematical equations (Equations 1-14), and block diagrams (Figures 2, 3, 4), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at our GitHub repository https://github.com/Sam1224/FSSAM. |
| Open Datasets | Yes | We evaluate FSSAM on two benchmarks, including PASCAL-5i (Shaban et al., 2017) and COCO-20i (Nguyen & Todorovic, 2019). |
| Dataset Splits | Yes | Both datasets are divided into 4 disjoint folds for cross-validation, with each fold containing 5 classes for PASCAL-5i and 20 classes for COCO-20i. In each iteration, 3 folds are used for training, and the remaining fold is used for testing. Following existing works, we randomly sample 1,000 episodes for testing by default. |
| Hardware Specification | Yes | All models are trained with 4 NVIDIA V100 (32G) GPUs, and tested with 1 V100 GPU. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and Dice loss for training, and refers to SAM 2 and DINOv2. However, it does not specify version numbers for any libraries, frameworks (like PyTorch), or other key software components. |
| Experiment Setup | Yes | We deploy AdamW to optimize SAM 2's memory encoder, memory attention, and mask decoder, with other components frozen. The learning rate is initialized as 0.001 and decayed with a polynomial scheduler (Tian et al., 2020). We follow SCCAN (Xu et al., 2023) to perform data augmentation, and adopt Dice loss (Milletari et al., 2016) for training (fine-tuning). During training, the images are randomly cropped to 512×512 patches, and the testing predictions are resized back to the original shape for metric calculation. The batch size is set to 8, with 2 samples distributed to each GPU. Following existing works (Wang et al., 2023a), the training epochs are set to 300 and 75 for PASCAL-5i and COCO-20i, respectively. |
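The cross-validation protocol quoted in the Dataset Splits row can be sketched as a small helper. This is a hypothetical illustration, assuming folds hold out contiguous class-ID ranges as in the standard PASCAL-5i / COCO-20i protocol; `fold_split` is not a function from the paper's codebase.

```python
# Hypothetical sketch of the 4-fold class split used for PASCAL-5i / COCO-20i.
# The fold-to-class mapping is an assumption based on the standard protocol:
# PASCAL-5i has 4 folds x 5 classes, COCO-20i has 4 folds x 20 classes.
def fold_split(fold, n_folds=4, classes_per_fold=5):
    """Return (test_classes, train_classes) for cross-validation fold `fold`."""
    all_classes = list(range(n_folds * classes_per_fold))
    test_classes = all_classes[fold * classes_per_fold:(fold + 1) * classes_per_fold]
    train_classes = [c for c in all_classes if c not in test_classes]
    return test_classes, train_classes

# Fold 0 on PASCAL-5i: classes 0-4 are held out for testing,
# classes 5-19 are used for training.
test_cls, train_cls = fold_split(0)
```

Per the quoted setup, testing on the held-out fold then samples 1,000 support/query episodes by default.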
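Two ingredients of the Experiment Setup row can be made concrete: the polynomial learning-rate schedule (Tian et al., 2020) and the Dice loss (Milletari et al., 2016). The sketch below is framework-agnostic pure Python; the decay power of 0.9 is a common default and an assumption here, since the report only names the scheduler and the 0.001 initial rate.

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Polynomial LR decay: lr = base_lr * (1 - t/T)^power.
    power=0.9 is a common default (an assumption); base_lr = 0.001 per the report."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened predicted probabilities `pred` and
    binary labels `target` (plain Python lists, for illustration only)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)
```

In the fine-tuning described above, this schedule would drive an AdamW optimizer over SAM 2's memory encoder, memory attention, and mask decoder, with all other parameters frozen.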