Uncertainty-aware Fine-tuning of Segmentation Foundation Models

Authors: Kangning Liu, Brian Price, Jason Kuen, Yifei Fan, Zijun Wei, Luis Figueroa, Krzysztof Geras, Carlos Fernandez-Granda

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the proposed Segmentation with Uncertainty Model (SUM) on a diverse test set consisting of 14 public benchmarks, where it achieves state-of-the-art results.
Researcher Affiliation | Collaboration | Kangning Liu (1,2), Brian Price (2), Jason Kuen (2), Yifei Fan (2), Zijun Wei (2), Luis Figueroa (2), Krzysztof J. Geras (1), Carlos Fernandez-Granda (1); 1: New York University, 2: Adobe
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/Kangningthu/SUM
Open Datasets | Yes | Training sets are built from the following public datasets: SA-250K, an unlabeled subset of the SA-1B dataset [1] containing 250,000 images; HQSeg-44K [4], a human-annotated set of 44,320 images with high-quality salient-object masks; and the Entity Seg training set [63], a human-annotated set of 31,913 images with an average of 20 entity masks each, detailing both foreground and background.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts for its own experiments.
Hardware Specification | Yes | The original SAM model [1] was trained on 256 A100 GPUs. For fine-tuning, we default to 40 Nvidia A100 80GB GPUs, contingent on their availability. For the SUM model deployed in the HQ-SAM framework [4], we adhere to the HQ-SAM configuration and use 8 GPUs to ensure a fair comparison.
Software Dependencies | No | The paper mentions software components and methods such as AdamW [74], focal loss [61], and dice loss [62], but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA); see the loss-function sketch after the table.
Experiment Setup | Yes | The learning rate is set to 1e-5. Our fine-tuning process takes 6,000 iterations, equivalent to a single SA-250K epoch, with a batch size of 40 images. To account for GPU memory constraints, we train with a single image and up to 30 randomly sampled masks per GPU. Consistent with the SAM framework [1], we apply a layer-wise learning rate decay of 0.8 (see the optimizer sketch after the table).
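
The focal [61] and dice [62] losses cited in the Software Dependencies row are standard for mask prediction; below is a minimal PyTorch sketch for reference. The 20:1 focal-to-dice weighting follows the original SAM recipe [1]; whether SUM keeps that exact ratio is an assumption here, as are the function names and default hyperparameters.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Sigmoid focal loss on per-pixel mask logits (Lin et al. [61]).
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def dice_loss(logits, targets, eps=1.0):
    # Soft dice loss on per-pixel mask probabilities (Milletari et al. [62]).
    p = torch.sigmoid(logits).flatten(1)
    t = targets.flatten(1)
    inter = 2 * (p * t).sum(-1) + eps
    union = p.sum(-1) + t.sum(-1) + eps
    return (1 - inter / union).mean()

def mask_loss(logits, targets, w_focal=20.0, w_dice=1.0):
    # 20:1 weighting as in SAM [1]; SUM's exact ratio is not quoted above.
    return w_focal * focal_loss(logits, targets) + w_dice * dice_loss(logits, targets)

# Example with random data: (batch, 1, H, W) logits against binary GT masks.
logits = torch.randn(4, 1, 256, 256)
targets = torch.randint(0, 2, (4, 1, 256, 256)).float()
loss = mask_loss(logits, targets)
```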
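
The quoted setup (AdamW [74], learning rate 1e-5, layer-wise decay 0.8) can be reproduced with per-parameter-group learning rates. The sketch below assumes a SAM-style ViT image encoder whose transformer blocks are registered under `blocks.<i>`; the layer-indexing convention, the depth (32, as in ViT-H), and the `sam.image_encoder` handle in the usage comment are assumptions, not quoted from the paper.

```python
import torch

def layerwise_lr_groups(encoder, base_lr=1e-5, decay=0.8, depth=32):
    # Deeper blocks get larger lr: lr_i = base_lr * decay ** (depth - i),
    # with pre-block parameters (patch/positional embeddings) as layer 0 (assumed).
    groups = []
    for name, p in encoder.named_parameters():
        if not p.requires_grad:
            continue
        if name.startswith("blocks."):
            layer = int(name.split(".")[1]) + 1
        else:
            layer = 0
        groups.append({"params": [p], "lr": base_lr * decay ** (depth - layer)})
    return groups

# Hypothetical usage with a SAM-style model:
# optimizer = torch.optim.AdamW(layerwise_lr_groups(sam.image_encoder))
```

The quoted schedule, 6,000 iterations at batch size 40 (one image with up to 30 sampled masks per GPU across 40 GPUs), would then drive the training loop around this optimizer.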