Segment Anything in High Quality

Authors: Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

NeurIPS 2023

Reproducibility assessment: each entry lists the variable, its result, and the LLM response.
Research Type: Experimental. To validate the effectiveness of HQ-SAM, we perform extensive quantitative and qualitative experimental analysis. We compare HQ-SAM with SAM on a suite of 10 diverse segmentation datasets across different downstream tasks, 8 of which follow a zero-shot transfer protocol, including COCO [31], UVO [42], SGinW [58], LVIS [14], HQ-YTVIS [20], BIG [6], COIFT [29] and HR-SOD [51]. (A minimal evaluation sketch follows the table.)
Researcher Affiliation: Academia. ETH Zürich, HKUST, Dartmouth College.
Pseudocode: No. The paper describes the model architecture and process with text and diagrams, but does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes. Our code and pretrained models are at https://github.com/SysCV/SAM-HQ. (A hedged loading example follows the table.)
Open Datasets: Yes. To train HQ-SAM in a data-efficient manner, instead of further training on SA-1B [21], we compose a new training dataset HQSeg-44K which contains 44,320 extremely accurate image mask annotations. ... HQSeg-44K leverages a collection of six existing image datasets including DIS [35] (train set), ThinObject-5K [29] (train set), FSS-1000 [26], ECSSD [38], MSRA10K [8], DUT-OMRON [46] with extremely fine-grained mask labeling... (A composition sketch follows the table.)
Dataset Splits: Yes. For ablation experiments, we use the four aforementioned extremely accurate segmentation datasets, namely DIS (val) [35], ThinObject-5K (test) [29], COIFT [29] and HR-SOD [51], as well as the COCO validation set.
Hardware Specification: Yes. Thanks to the smaller-scale dataset and our minimal integrated architecture, HQ-SAM can be trained in only 4 hours on 8 RTX 3090 GPUs.
Software Dependencies: No. The paper does not specify version numbers for any software dependencies such as programming languages, libraries, or frameworks used in the experiments.
Experiment Setup: Yes. We use a learning rate of 0.001 and train our HQ-SAM for 12 epochs, with a learning rate drop after 10 epochs. We train on 8 Nvidia GeForce RTX 3090 GPUs with a total batch size of 32, which takes 4 hours to train for 16.6K iterations. (The schedule is sketched below.)
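On the evaluation protocol: zero-shot transfer here means HQ-SAM is evaluated on those eight benchmarks without training on them. A minimal mask-quality check in that spirit might look like the sketch below; `mask_iou` and `mean_iou` are illustrative helpers, not the paper's evaluation code, and the paper itself reports benchmark-specific metrics.

```python
# Minimal sketch of mean mask IoU over one dataset; the helpers here are
# illustrative assumptions, not the paper's evaluation code or metrics.
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                       # both masks empty: count as perfect
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def mean_iou(pairs):
    """Average IoU over an iterable of (pred_mask, gt_mask) pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in pairs]))

# Example on toy masks:
pred = np.zeros((4, 4), bool); pred[:2, :2] = True
gt = np.zeros((4, 4), bool); gt[:2, :] = True
print(mean_iou([(pred, gt)]))  # 0.5
```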
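On the released code: SAM-HQ is distributed as a fork of the original `segment_anything` package, so loading a model plausibly follows SAM's predictor API. The sketch below assumes that API carries over unchanged; the checkpoint filename, image path, and prompt coordinates are placeholders.

```python
# Hedged sketch, assuming SAM-HQ keeps the original SAM predictor API
# (sam_model_registry / SamPredictor); filenames are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")
predictor = SamPredictor(sam.to("cuda"))

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Single positive point prompt; prompt format follows the SAM API.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=False,
)
```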
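On HQSeg-44K: the training set is a union of six existing finely annotated datasets. A straightforward way to express such a composition in PyTorch is `ConcatDataset`; the `MaskFolder` loader and the directory layout below are hypothetical stand-ins, not the authors' data pipeline.

```python
# Sketch of composing a training set from several mask datasets, in the
# spirit of HQSeg-44K; MaskFolder and all paths are hypothetical.
from pathlib import Path
from torch.utils.data import Dataset, ConcatDataset

class MaskFolder(Dataset):
    """Hypothetical loader: paired image/mask files under one root."""
    def __init__(self, root):
        self.images = sorted(Path(root, "images").glob("*.jpg"))
        self.masks = sorted(Path(root, "masks").glob("*.png"))
    def __len__(self):
        return len(self.images)
    def __getitem__(self, i):
        return str(self.images[i]), str(self.masks[i])

train_set = ConcatDataset([
    MaskFolder("data/DIS5K_train"),
    MaskFolder("data/ThinObject5K_train"),
    MaskFolder("data/FSS-1000"),
    MaskFolder("data/ECSSD"),
    MaskFolder("data/MSRA10K"),
    MaskFolder("data/DUT-OMRON"),
])  # the paper reports 44,320 image-mask pairs in total
```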
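On the reported setup: the numbers are internally consistent, since 44,320 images at a total batch size of 32 give about 1,385 iterations per epoch, and 12 epochs give roughly 16.6K iterations, matching the paper. The sketch below mirrors that schedule in standard PyTorch; the optimizer type, the drop factor, and the toy model and data are assumptions, since the quoted text does not specify them.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; the real run trains HQ-SAM's lightweight added parameters.
model = nn.Conv2d(3, 1, kernel_size=1)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randn(64, 1, 32, 32))
loader = DataLoader(data, batch_size=32)  # total batch size 32 in the paper

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer type assumed
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10], gamma=0.1  # one lr drop after epoch 10; factor assumed
)

for epoch in range(12):                    # 12 epochs, as reported
    for x, y in loader:
        loss = nn.functional.mse_loss(model(x), y)  # stand-in loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                       # drop takes effect after epoch 10
```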