Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping
Authors: Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness of our method on various WSCOS tasks, and experiments demonstrate that our method achieves state-of-the-art performance on these tasks. |
| Researcher Affiliation | Collaboration | Chunming He¹, Kai Li², Yachao Zhang¹, Guoxia Xu³, Longxiang Tang¹, Yulun Zhang⁴, Zhenhua Guo⁵, and Xiu Li¹. ¹Shenzhen International Graduate School, Tsinghua University; ²NEC Laboratories America; ³Nanjing University of Posts and Telecommunications; ⁴ETH Zürich; ⁵Tianyi Traffic Technology |
| Pseudocode | No | Insufficient information. The paper describes procedural steps and provides a model architecture diagram (Figure 3), but it does not present structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The code will be available at https://github.com/ChunmingHe/WS-SAM. |
| Open Datasets | Yes | Four datasets are used for experiments, i.e., CHAMELEON [44], CAMO [45], COD10K [1], and NC4K [16]... Three widely-used Polyp datasets are selected, namely CVC-ColonDB [46], ETIS [47], and Kvasir [48]... Two datasets, GDD [6] and GSD [29], are used for evaluation. |
| Dataset Splits | No | Insufficient information. The paper describes training and testing phases but does not explicitly provide details about a validation dataset split (e.g., percentages or counts). |
| Hardware Specification | Yes | We implement our method with PyTorch and run experiments on two RTX3090 GPUs. |
| Software Dependencies | No | Insufficient information. The paper mentions 'PyTorch' but does not specify its version or other software dependencies with their version numbers. |
| Experiment Setup | Yes | Implementation details. The image encoder uses ResNet50 as the backbone and is pre-trained on ImageNet [39]. The batch size is 36 and the learning rate is initialized as 0.0001, decreased by 0.1 every 80 epochs. For scribble supervision, we propose a nine-box strategy, namely constructing the minimum outer wrapping rectangle of the foreground/background scribble and dividing it into a nine-box grid, to sample one point in each box and send them to SAM for segmentation mask generation. Following [2], all images are resized as 352 × 352 in both the training and testing phases. For SAM [19], we adopt the ViT-H SAM model to generate segmentation masks. |
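
The nine-box strategy quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nine_box_points` is hypothetical, and the quoted text does not say how a point is chosen inside each box. The sketch assumes a point is drawn uniformly at random from the scribble pixels that fall in a box, and that boxes containing no scribble pixels contribute no point.

```python
import numpy as np


def nine_box_points(scribble, rng=None):
    """Sample up to nine (x, y) prompt points from a binary scribble mask.

    Assumption (not stated in the source): each box yields one randomly
    chosen scribble pixel, and boxes without scribble pixels are skipped.
    """
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(scribble)
    if ys.size == 0:
        return np.empty((0, 2), dtype=int)

    # Minimum bounding rectangle of the scribble (inclusive pixel bounds).
    y0, y1 = ys.min(), ys.max()
    x0, x1 = xs.min(), xs.max()

    # Edges of the 3x3 grid; the +1 makes the upper bound exclusive.
    y_edges = np.linspace(y0, y1 + 1, 4)
    x_edges = np.linspace(x0, x1 + 1, 4)

    # Grid row/column index (0..2) of every scribble pixel.
    row = np.clip(np.searchsorted(y_edges, ys, side="right") - 1, 0, 2)
    col = np.clip(np.searchsorted(x_edges, xs, side="right") - 1, 0, 2)

    points = []
    for r in range(3):
        for c in range(3):
            idx = np.flatnonzero((row == r) & (col == c))
            if idx.size:
                k = rng.choice(idx)  # one scribble pixel from this box
                points.append((xs[k], ys[k]))
    return np.array(points, dtype=int).reshape(-1, 2)
```

The returned array holds (x, y) coordinates, one row per sampled point. Running the function once on the foreground scribble and once on the background scribble would give two point sets that could then be labeled as positive and negative prompts for SAM; that pairing is also an assumption about how the sampled points are consumed.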