Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Uncertainty-aware Fine-tuning of Segmentation Foundation Models
Authors: Kangning Liu, Brian Price, Jason Kuen, Yifei Fan, Zijun Wei, Luis Figueroa, Krzysztof Geras, Carlos Fernandez-Granda
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed Segmentation with Uncertainty Model (SUM) on a diverse test set consisting of 14 public benchmarks, where it achieves state-of-the-art results. |
| Researcher Affiliation | Collaboration | Kangning Liu (1, 2), Brian Price (2), Jason Kuen (2), Yifei Fan (2), Zijun Wei (2), Luis Figueroa (2), Krzysztof J. Geras (1), Carlos Fernandez-Granda (1). Affiliations: 1 New York University, 2 Adobe. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Kangningthu/SUM |
| Open Datasets | Yes | Training sets: We utilize multiple training sets built using the following datasets: SA-250K: An unlabeled subset extracted from the SA-1B dataset [1] containing 250,000 images. HQSeg-44K [4]: A human-annotated set of 44,320 images with high-quality salient-object masks. EntitySeg training set [63]: A human-annotated set of 31,913 images, each with an average of 20 entity masks, detailing both foreground and background. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts for its own experiments. |
| Hardware Specification | Yes | The original SAM [1] model was trained using 256 A100 GPUs. For fine-tuning, we default to 40 NVIDIA A100 80GB GPUs, contingent on the availability of these GPUs. Regarding the SUM model deployed in the HQ-SAM framework [4], we adhere to the HQ-SAM configuration, utilizing 8 GPUs to ensure a fair comparison. |
| Software Dependencies | No | The paper mentions software components and methods like AdamW [74], Focal loss [61], and Dice loss [62], but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The learning rate is set at 1e-5. Our fine-tuning process takes 6,000 iterations, which is equivalent to a single SA-250K epoch, with a batch size of 40 images. To account for GPU memory constraints, we train with a single image and up to 30 randomly sampled masks per GPU. Consistent with the SAM framework [1], we apply a layer-wise learning rate decay of 0.8. |
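The layer-wise learning rate decay of 0.8 in the experiment setup can be illustrated with a short sketch. It assumes the convention found in common ViT fine-tuning code (for example MAE and BEiT): parameters after the encoder keep the base learning rate of 1e-5, and each step toward the input multiplies it by the decay factor, so the patch embedding gets the smallest rate. The functions `layer_scales` and `layer_learning_rates` and the 12-block depth are hypothetical, and the mapping from SAM parameter names to layer indices is not shown.

```python
from typing import List


def layer_scales(num_blocks: int, decay: float = 0.8) -> List[float]:
    """Per-layer LR multipliers for a ViT-style encoder (assumed convention).

    layer_id 0       -> patch embedding (smallest multiplier)
    layer_id 1..N    -> transformer blocks, input side first
    layer_id N + 1   -> everything after the encoder (multiplier 1.0)
    """
    return [decay ** (num_blocks + 1 - layer_id) for layer_id in range(num_blocks + 2)]


def layer_learning_rates(base_lr: float, num_blocks: int, decay: float = 0.8) -> List[float]:
    """Scale the base learning rate by each layer's multiplier."""
    return [base_lr * s for s in layer_scales(num_blocks, decay)]


if __name__ == "__main__":
    # 12 blocks is a hypothetical encoder depth; base_lr and decay follow the setup above.
    for layer_id, lr in enumerate(layer_learning_rates(1e-5, num_blocks=12)):
        print(f"layer {layer_id:2d}: lr = {lr:.3e}")
```

In a real optimizer such as AdamW, each layer's parameters would typically go into their own parameter group with the corresponding learning rate, so layers near the input move less during fine-tuning.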
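The focal loss and Dice loss named in the software-dependencies row are standard losses for binary mask prediction. The sketch below computes both on a flattened mask in plain Python, assuming the RetinaNet defaults α = 0.25 and γ = 2 for the focal term and a smoothing constant of 1 for the Dice term. The 20:1 focal-to-Dice weighting mirrors the ratio reported for the original SAM and is an assumption here, since this report does not state the weighting SUM uses; all function names are hypothetical.

```python
import math
from typing import Sequence


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); avoids overflow for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid_focal_loss(logits: Sequence[float], targets: Sequence[int],
                       alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean sigmoid focal loss over pixels; targets are 0 or 1."""
    total = 0.0
    for x, y in zip(logits, targets):
        log_p = -_softplus(-x)    # log(sigmoid(x))
        log_1mp = -_softplus(x)   # log(1 - sigmoid(x))
        p = math.exp(log_p)
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * log_p
        else:
            total += -(1.0 - alpha) * p ** gamma * log_1mp
    return total / len(logits)


def dice_loss(logits: Sequence[float], targets: Sequence[int], eps: float = 1.0) -> float:
    """1 - smoothed Dice coefficient between predicted probabilities and the mask."""
    probs = [math.exp(-_softplus(-x)) for x in logits]
    inter = sum(p * y for p, y in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)


def mask_loss(logits: Sequence[float], targets: Sequence[int],
              w_focal: float = 20.0, w_dice: float = 1.0) -> float:
    """Weighted sum of focal and Dice losses (weights are assumed, not from the paper)."""
    return w_focal * sigmoid_focal_loss(logits, targets) + w_dice * dice_loss(logits, targets)


if __name__ == "__main__":
    target = [1, 0, 1]
    print(mask_loss([10.0, -10.0, 10.0], target))   # confident and correct: small loss
    print(mask_loss([-10.0, 10.0, -10.0], target))  # confident and wrong: large loss
```

The focal term down-weights pixels the model already classifies confidently, while the Dice term scores overlap for the mask as a whole, which helps when the foreground covers only a small fraction of the image.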
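The hardware and experiment-setup rows together determine the batch composition. Assuming one image per GPU on the default 40 GPUs, a global batch holds 40 images and at most 40 × 30 = 1,200 masks, and 6,000 iterations cover 240,000 images, about 0.96 passes over the 250,000 images of SA-250K, close to the single epoch stated above. The helper `sample_masks` is a hypothetical illustration of drawing up to 30 masks per image without replacement, not the released code.

```python
import random
from typing import List, Optional, Sequence

NUM_GPUS = 40             # default fine-tuning setup reported above
IMAGES_PER_GPU = 1        # a single image per GPU
MAX_MASKS_PER_IMAGE = 30  # up to 30 randomly sampled masks per GPU
ITERATIONS = 6_000
SA_250K_IMAGES = 250_000


def sample_masks(mask_ids: Sequence[int], max_masks: int = MAX_MASKS_PER_IMAGE,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Keep every mask if there are at most max_masks, else draw max_masks without replacement."""
    rng = rng or random.Random()
    if len(mask_ids) <= max_masks:
        return list(mask_ids)
    return rng.sample(list(mask_ids), max_masks)


global_batch = NUM_GPUS * IMAGES_PER_GPU                 # 40 images per iteration
max_masks_per_iter = global_batch * MAX_MASKS_PER_IMAGE  # at most 1,200 masks per iteration
images_seen = ITERATIONS * global_batch                  # 240,000 images in total
epochs = images_seen / SA_250K_IMAGES                    # 0.96 passes over SA-250K

if __name__ == "__main__":
    print(global_batch, max_masks_per_iter, images_seen, epochs)
```

Running the script prints `40 1200 240000 0.96`. Capping masks per image keeps per-GPU memory bounded even for images with many annotated entities.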