Uncertainty-aware Fine-tuning of Segmentation Foundation Models
Authors: Kangning Liu, Brian Price, Jason Kuen, Yifei Fan, Zijun Wei, Luis Figueroa, Krzysztof Geras, Carlos Fernandez-Granda
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed Segmentation with Uncertainty Model (SUM) on a diverse test set consisting of 14 public benchmarks, where it achieves state-of-the-art results. |
| Researcher Affiliation | Collaboration | Kangning Liu (1,2), Brian Price (2), Jason Kuen (2), Yifei Fan (2), Zijun Wei (2), Luis Figueroa (2), Krzysztof J. Geras (1), Carlos Fernandez-Granda (1); (1) New York University, (2) Adobe |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Kangningthu/SUM |
| Open Datasets | Yes | Training sets: We utilize multiple training sets built using the following datasets. SA-250K: an unlabeled subset extracted from the SA-1B dataset [1] containing 250,000 images; HQSeg-44K [4]: a human-annotated set of 44,320 images with high-quality salient-object masks; EntitySeg training set [63]: a human-annotated set of 31,913 images, each with an average of 20 entity masks, detailing both foreground and background. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts for its own experiments. |
| Hardware Specification | Yes | The original SAM [1] model was trained using 256 A100 GPUs. For fine-tuning, we default to 40 Nvidia A100 80GB GPUs, contingent on their availability. For the SUM model deployed in the HQ-SAM framework [4], we adhere to the HQ-SAM configuration, utilizing 8 GPUs to ensure a fair comparison. |
| Software Dependencies | No | The paper mentions software components and methods such as AdamW [74], Focal loss [61], and Dice loss [62] (a hedged loss sketch appears below this table), but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The learning rate is set at 1e-5. Our fine-tuning process takes 6,000 iterations, equivalent to a single SA-250K epoch, with a batch size of 40 images. To account for GPU memory constraints, we train with a single image and up to 30 randomly sampled masks per GPU. Consistent with the SAM framework [1], we apply a layer-wise learning rate decay of 0.8 (see the optimizer sketch below the table). |
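
The paper cites AdamW [74], Focal loss [61], and Dice loss [62] but does not reproduce the loss code here. The sketch below shows one common way to assemble a focal-plus-dice segmentation loss in PyTorch; the `alpha`, `gamma`, and 20:1 focal-to-dice weighting follow SAM's published recipe and are assumptions about SUM, not the authors' confirmed configuration.

```python
# Hedged sketch of a focal + dice segmentation loss, assuming SAM-style
# hyperparameters; not the authors' verified implementation.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss on raw mask logits (alpha/gamma are assumed defaults)."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

def dice_loss(logits, targets, eps=1.0):
    """Soft Dice loss; eps smooths the ratio for empty masks."""
    p = torch.sigmoid(logits).flatten(1)
    t = targets.flatten(1)
    inter = (p * t).sum(-1)
    return (1 - (2 * inter + eps) / (p.sum(-1) + t.sum(-1) + eps)).mean()

def segmentation_loss(logits, targets, focal_weight=20.0):
    # SAM's original training used a 20:1 focal-to-dice ratio; whether SUM
    # keeps that exact ratio is an assumption here.
    return focal_weight * focal_loss(logits, targets) + dice_loss(logits, targets)
```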
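
The setup row quotes a base learning rate of 1e-5 with a layer-wise decay of 0.8, consistent with the SAM framework [1]. A minimal sketch of wiring such layer-wise decay into AdamW follows; `model.image_encoder.blocks` and the `weight_decay` value are assumptions about module naming and regularization, not details stated in the paper.

```python
# Hedged sketch: layer-wise learning-rate decay (base lr 1e-5, decay 0.8)
# applied across a ViT-style encoder before fine-tuning with AdamW.
import torch

def build_optimizer(model, base_lr=1e-5, decay=0.8, weight_decay=0.1):
    blocks = list(model.image_encoder.blocks)   # transformer blocks, shallow -> deep (assumed layout)
    num_layers = len(blocks)
    param_groups = []
    # Deepest block keeps the base lr; each shallower block is scaled by `decay`.
    for i, block in enumerate(blocks):
        scale = decay ** (num_layers - 1 - i)
        param_groups.append({"params": block.parameters(), "lr": base_lr * scale})
    # Everything outside the encoder blocks (e.g. decoder heads) at the base lr.
    block_ids = {id(p) for b in blocks for p in b.parameters()}
    rest = [p for p in model.parameters() if id(p) not in block_ids]
    param_groups.append({"params": rest, "lr": base_lr})
    return torch.optim.AdamW(param_groups, weight_decay=weight_decay)
```

Shallower blocks receive geometrically smaller learning rates, so the pretrained low-level features move least during fine-tuning while the task-specific heads adapt at the full rate.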