Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping

Authors: Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the effectiveness of our method on various WSCOS tasks, and experiments demonstrate that our method achieves state-of-the-art performance on these tasks.
Researcher Affiliation | Collaboration | Chunming He1, Kai Li2, Yachao Zhang1, Guoxia Xu3, Longxiang Tang1, Yulun Zhang4, Zhenhua Guo5, and Xiu Li1. 1Shenzhen International Graduate School, Tsinghua University, 2NEC Laboratories America, 3Nanjing University of Posts and Telecommunications, 4ETH Zürich, 5Tianyi Traffic Technology
Pseudocode | No | Insufficient information. The paper describes procedural steps and provides a model architecture diagram (Figure 3), but it does not present structured pseudocode or an algorithm block.
Open Source Code | Yes | The code will be available at https://github.com/ChunmingHe/WS-SAM.
Open Datasets | Yes | Four datasets are used for experiments, i.e., CHAMELEON [44], CAMO [45], COD10K [1], and NC4K [16]... Three widely-used Polyp datasets are selected, namely CVC-ColonDB [46], ETIS [47], and Kvasir [48]... Two datasets, GDD [6] and GSD [29], are used for evaluation.
Dataset Splits | No | Insufficient information. The paper describes training and testing phases but does not explicitly provide details about a validation dataset split (e.g., percentages or counts).
Hardware Specification | Yes | We implement our method with PyTorch and run experiments on two RTX3090 GPUs.
Software Dependencies | No | Insufficient information. The paper mentions 'PyTorch' but does not specify its version or other software dependencies with their version numbers.
Experiment Setup | Yes | Implementation details. The image encoder uses ResNet50 as the backbone and is pre-trained on ImageNet [39]. The batch size is 36 and the learning rate is initialized as 0.0001, decreased by 0.1 every 80 epochs. For scribble supervision, we propose a nine-box strategy, namely constructing the minimum outer wrapping rectangle of the foreground/background scribble and dividing it into a nine-box grid, to sample one point in each box and send them to SAM for segmentation mask generation. Following [2], all images are resized as 352 × 352 in both the training and testing phases. For SAM [19], we adopt the ViT-H SAM model to generate segmentation masks.
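
The nine-box strategy and the learning-rate schedule quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: the function names `nine_box_points` and `step_lr` are hypothetical, the point inside each box is drawn uniformly at random (the quoted text does not say how the point is chosen), and "decreased by 0.1 every 80 epochs" is read here as multiplying the rate by 0.1 at every 80th epoch.

```python
import random


def nine_box_points(scribble, rng=None):
    """Sample one point in each cell of a 3x3 grid over a scribble's bounding box.

    scribble: iterable of (x, y) pixel coordinates of one scribble.
    Returns 9 (x, y) points in row-major order (top-left cell first).
    Assumption: uniform sampling inside each cell; the source does not specify this.
    """
    rng = rng or random.Random()
    pixels = list(scribble)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    # Minimum axis-aligned rectangle that wraps the scribble.
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    cell_w = (x1 - x0) / 3
    cell_h = (y1 - y0) / 3
    points = []
    for row in range(3):
        for col in range(3):
            px = x0 + (col + rng.random()) * cell_w
            py = y0 + (row + rng.random()) * cell_h
            points.append((px, py))
    return points


def step_lr(epoch, base_lr=1e-4, gamma=0.1, step=80):
    """Step decay: multiply base_lr by gamma once per `step` completed epochs."""
    return base_lr * gamma ** (epoch // step)


if __name__ == "__main__":
    fg_scribble = [(10, 20), (25, 50), (40, 80)]
    for point in nine_box_points(fg_scribble, rng=random.Random(0)):
        print(point)
    print(step_lr(0), step_lr(80), step_lr(160))
```

Under this reading, the nine points from a foreground scribble and the nine from a background scribble would be sent to SAM as point prompts, and the learning rate would be 0.0001 for epochs 0 to 79, then roughly 0.00001 from epoch 80.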