Task-Disruptive Background Suppression for Few-Shot Segmentation

Authors: Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, Jae-Pil Heo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Our proposed method achieves state-of-the-art performance on standard few-shot segmentation benchmarks. Our official code is available at github.com/SuhoPark0706/TBSNet. We utilize PASCAL-5i (Shaban et al. 2017b) and COCO-20i (Nguyen and Todorovic 2019) following the prior works (Zhang et al. 2021; Shi et al. 2022; Peng et al. 2023; Wang, Sun, and Zhang 2023). To evaluate the model's adaptability to novel classes, we adopt a cross-validation scheme where each fold is selected as Dtest and the others are used as Dtrain. Then, we evaluate the model with mean intersection over union (mIoU) and foreground-background intersection over union (FB-IoU) for 1000 episodes randomly sampled from Dtest. Quantitative Results. We evaluate our proposed method by comparing it with previous techniques designed for few-shot segmentation. As illustrated in Table 1, recent affinity learning models, specifically CyCTR and DCAMA, already exhibit comparable performance. Upon incorporating TBS into these approaches, a consistent improvement over the baseline models is observed, resulting in state-of-the-art scores. This improvement remains consistent across evaluation metrics and different quantities of labeled images on the PASCAL-5i dataset. Similar trends are observed in the 1-shot scenario of COCO-20i. As demonstrated in Table 2, TBS consistently enhances DCAMA's performance across all folds, providing the best performance. While its impact is less pronounced in the 5-shot scenario than in the 1-shot scenario, where it shows substantial effectiveness, TBS still improves the average mIoU of DCAMA. As a result, TBS surpasses the existing state-of-the-art performance in three out of four quantitative benchmark scenarios for few-shot segmentation. This verifies the effectiveness of suppressing disruptive support, particularly in situations of extreme data scarcity. Qualitative Results.
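The evaluation protocol quoted above (mIoU and FB-IoU over 1000 randomly sampled test episodes) can be sketched as follows. This is a minimal illustration, not the paper's code; the episode tuples and function names are assumptions.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def mean_iou(episodes):
    """mIoU: average foreground IoU per class, then average over classes.

    `episodes` is a list of (class_id, pred_mask, gt_mask) tuples,
    one tuple per sampled test episode.
    """
    per_class = {}
    for cls, pred, gt in episodes:
        per_class.setdefault(cls, []).append(iou(pred, gt))
    return float(np.mean([np.mean(v) for v in per_class.values()]))

def fb_iou(episodes):
    """FB-IoU: mean of foreground and background IoU, ignoring class."""
    fg = np.mean([iou(p, g) for _, p, g in episodes])
    bg = np.mean([iou(~p, ~g) for _, p, g in episodes])
    return float((fg + bg) / 2.0)
```

Note that mIoU averages within each novel class first, so rare classes in the 1000 sampled episodes weigh the same as frequent ones, while FB-IoU pools all episodes into a single foreground/background split.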
In addition to the quantitative results, we report qualitative results to intuitively show the effectiveness of TBS. Compared with DCAMA, our results include fewer mispredicted pixels regardless of the number of support images, as shown in Fig. 4. Especially when objects in the support background are not present in the query background, our model outperforms DCAMA. This validates that our method appropriately suppresses the unnecessary support background. Additional in-depth analysis is provided in Section 5.4. [Table 3: Results of ablation studies. (a) Effect of the two spatial-wise scores, where QS and TS denote the query- and target-relevant scores; mIoU for the four on/off combinations: 73.8, 73.4, 74.1, 74.4. (b) Averaged Attention score (AA) between support/query foreground and background tokens for the two compared methods (SF&QF: 0.213 vs. 0.325; SB&QB: 0.886 vs. 0.820; Avg.: 0.550 vs. 0.573); see Section 5.3.] 5 Further Analysis. In this section, we conduct ablation studies and provide an in-depth analysis of our method. For most ablation studies, we use the PASCAL-5i dataset in the 1-shot scenario with Swin-B as the backbone network, except for Section 5.2. Additionally, mIoU is adopted as the metric, as it is one of the most standard metrics in few-shot segmentation.
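The Averaged Attention (AA) numbers in Table 3(b) compare how strongly support tokens and query tokens attend to each other (support-foreground vs. query-foreground, support-background vs. query-background). A minimal sketch of one plausible reading, averaging cosine affinities between two token sets; the function name and normalization are assumptions, not the paper's definition:

```python
import numpy as np

def averaged_attention(support_tokens, query_tokens):
    """Mean cosine affinity between two sets of feature tokens.

    support_tokens: (Ns, C) array, e.g. support-foreground (SF) features.
    query_tokens:   (Nq, C) array, e.g. query-foreground (QF) features.
    Returns the average pairwise similarity, a scalar in [-1, 1].
    """
    s = support_tokens / np.linalg.norm(support_tokens, axis=1, keepdims=True)
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    # Average every support-query token pair's affinity into one score.
    return float((s @ q.T).mean())
```

Under this reading, a higher SF&QF score alongside a lower SB&QB score would indicate that the model aligns foregrounds while attending less to the support background.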
Researcher Affiliation: Academia. Suho Park, SuBeen Lee, Sangeek Hyun, Hyun Seok Seong, Jae-Pil Heo* Sungkyunkwan University {shms0706, leesb7426, hsi1032, gustjrdl95, jaepilheo}@skku.edu
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks. Figure 3 illustrates the module, but it is a diagram, not pseudocode.
Open Source Code: Yes. Our official code is available at github.com/SuhoPark0706/TBSNet.
Open Datasets: Yes. We utilize PASCAL-5i (Shaban et al. 2017b) and COCO-20i (Nguyen and Todorovic 2019) following the prior works (Zhang et al. 2021; Shi et al. 2022; Peng et al. 2023; Wang, Sun, and Zhang 2023). PASCAL-5i combines data from PASCAL VOC 2012 (Williams 2010) and SDS (Hariharan et al. 2014), comprising 20 categories. In contrast, COCO-20i is a subset of COCO (Lin et al. 2014) and is comprised of 80 categories.
Dataset Splits: No. The paper states: "Each dataset consists of distinct object classes Ctrain and Ctest without any overlap (Ctrain ∩ Ctest = ∅). Generally, the training and testing of few-shot segmentation are composed of several episodes. Each episode consists of K labeled images and an unlabeled image, i.e., a K-shot episode... To evaluate the model's adaptability to novel classes, we adopt a cross-validation scheme where each fold is selected as Dtest and the others are used as Dtrain." While it describes train/test splits within a cross-validation scheme, it does not explicitly mention or detail a separate 'validation' dataset split with specific percentages or counts.
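The cross-validation scheme quoted above can be made concrete. A minimal sketch of the standard PASCAL-5i fold protocol (20 classes split into 4 folds of 5; the held-out fold supplies the novel Dtest classes and the rest form Dtrain). The function name is an assumption:

```python
def pascal5i_folds(fold, num_classes=20, num_folds=4):
    """Split class IDs into disjoint train/test sets for one fold.

    fold: index in [0, num_folds) of the fold held out as Dtest.
    Returns (train_classes, test_classes) with no overlap, matching
    the paper's constraint Ctrain ∩ Ctest = ∅.
    """
    per_fold = num_classes // num_folds
    test_classes = list(range(fold * per_fold, (fold + 1) * per_fold))
    train_classes = [c for c in range(num_classes) if c not in test_classes]
    return train_classes, test_classes
```

For example, fold 1 holds out classes 5-9 as novel test classes and trains on the remaining 15; averaging mIoU over all four folds gives the cross-validated score reported in the tables.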
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions using ResNet-101 and Swin-Transformer as feature extractors.
Software Dependencies: No. The paper does not provide specific ancillary software details with version numbers (e.g., library names like PyTorch or TensorFlow, or specific CUDA versions).
Experiment Setup: Yes. In the case of DCAMA with Swin-Transformer, we apply TBS at scales of 1/8, 1/16, and 1/32 to align with the scales used in DCAMA's cross-attention mechanism. However, for DCAMA with ResNet-101, we utilize TBS only at the 1/16 and 1/32 scales due to memory limitations. On the other hand, since CyCTR was verified only on ResNet, we conducted experiments on ResNet-101, not Swin-Transformer. Unlike DCAMA, which adopts multi-level features, CyCTR utilizes single-level features generated by combining features from the 3rd and 4th blocks. Therefore, we suppress only those combined features by using TBS.
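As a rough illustration of attaching a suppression step at several feature strides (1/8, 1/16, 1/32), the sketch below gates support features with a downsampled background mask. The hard binary gating is a placeholder for the paper's learned query- and target-relevant scores, and all names here are assumptions:

```python
import numpy as np

def downsample_mask(mask, factor):
    """Nearest-neighbour downsampling of a binary mask by an integer factor."""
    return mask[::factor, ::factor]

def apply_multiscale_suppression(features_by_scale, support_bg_mask):
    """Zero out support-background positions at each feature stride.

    features_by_scale: dict mapping a stride (8, 16, 32) to an
        (H // stride, W // stride, C) support feature map.
    support_bg_mask: (H, W) bool mask marking the support background.
    """
    suppressed = {}
    for stride, feats in features_by_scale.items():
        m = downsample_mask(support_bg_mask, stride)   # (H/stride, W/stride)
        suppressed[stride] = feats * (~m)[..., None]   # keep foreground only
    return suppressed
```

Skipping the 1/8 stride for ResNet-101, as the paper does, would simply drop the largest (and most memory-hungry) entry from `features_by_scale`.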