CATrans: Context and Affinity Transformer for Few-Shot Segmentation
Authors: Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to demonstrate the effectiveness of the proposed model, outperforming the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | (1) Australian National University, Canberra, Australia; (2) Institute of Deep Learning, Baidu Research, Beijing, China |
| Pseudocode | No | The paper includes mathematical equations for attention layers and affinities but does not provide pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | PASCAL-5^i [Shaban et al., 2017] is composed of PASCAL VOC 2012 with additional SBD [Hariharan et al., 2011] annotations, which contains 20 categories split into 4 folds (15/5 categories as base/novel classes). COCO-20^i [Lin et al., 2014] is created from MS COCO, where the 80 object categories are divided into four splits (60/20 categories as base/novel classes). |
| Dataset Splits | Yes | Specifically, all classes are divided into two disjoint class sets C_train and C_test. To mitigate the overfitting caused by insufficient training data, we follow the common protocol called episodic training. Under the K-shot setting, each episode is composed of a support set S = {(I_s, M_s)}^K, where I_s, M_s are a support image and its corresponding mask, and a query sample Q = (I_q, M_q), where I_q, M_q are the query image and mask, respectively. In particular, the datasets are D_train = {S, Q}^{N_train} and D_test = {S, Q}^{N_test}, with category sets C_train and C_test, respectively, where N_train and N_test are the numbers of episodes in the training and test sets. During evaluation, the results are averaged over the randomly sampled 5k and 20k episodes for each fold and 5 runs with different seeds. (See the episodic-sampling sketch after the table.) |
| Hardware Specification | Yes | We conduct all experiments on 1 NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions "AdamW as the optimizer" and uses ResNet and Swin-Transformer as backbones, but does not specify version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | The models are trained for 20k and 40k iterations on PASCAL-5^i and COCO-20^i, respectively, with AdamW as the optimizer. The initial learning rate is set to 5e-5 and decays by a factor of 0.1 at iteration 10k. During training, we first resize the input images to 384×384 and 512×512 for PASCAL-5^i and COCO-20^i, respectively, and then randomly apply a horizontal flip. We use cross-entropy loss with weights of 1 and 4 for background and foreground pixels, respectively. The BN layers of the image encoder are frozen. For a fair comparison, we employ the widely-used ResNet-50, ResNet-101 and Swin-Transformer as the image encoder. (See the training-setup sketch after the table.) |
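To make the fold splits and episodic protocol quoted in the Dataset Splits row concrete, here is a minimal Python sketch. It is illustrative only: `fold_split`, `sample_episode`, and `images_by_class` are hypothetical names, not the authors' code (no code is released), and dataset access is stubbed out with toy data.

```python
import random

ALL_CLASSES = list(range(20))  # the 20 PASCAL VOC categories

def fold_split(fold):
    """PASCAL-5^i: fold i holds 5 novel (test) classes; the other 15 are base."""
    novel = ALL_CLASSES[fold * 5:(fold + 1) * 5]
    base = [c for c in ALL_CLASSES if c not in novel]
    return base, novel

def sample_episode(images_by_class, classes, k=1):
    """One episode: a K-shot support set S = {(I_s, M_s)}^K and a query
    pair Q = (I_q, M_q), all drawn from the same randomly chosen class."""
    c = random.choice(classes)
    picks = random.sample(images_by_class[c], k + 1)
    return picks[:k], picks[k]  # support set S, query sample Q

# Training episodes are drawn from the 15 base classes; evaluation episodes
# come from the 5 held-out novel classes of the same fold.
base_classes, novel_classes = fold_split(fold=0)
toy_data = {c: [(f"img_{c}_{i}", f"mask_{c}_{i}") for i in range(6)]
            for c in ALL_CLASSES}
support, query = sample_episode(toy_data, base_classes, k=1)
```

As quoted above, evaluation averages over 5k and 20k randomly sampled episodes per fold and 5 runs with different seeds.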
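The training configuration in the Experiment Setup row maps onto standard PyTorch components. The sketch below is an assumption-laden illustration of that recipe, not the authors' implementation: the tiny stand-in model, the joint image/mask preprocessing, and all variable names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in two-class segmentation model; the real network is CATrans with a
# ResNet-50/101 or Swin-Transformer encoder, which is not reproduced here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 2, 1),
)

# AdamW with initial LR 5e-5, decayed by 0.1 at iteration 10k
# (20k total iterations on PASCAL-5^i; 40k on COCO-20^i).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10_000], gamma=0.1)

# Cross-entropy with pixel weights 1 (background) and 4 (foreground).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 4.0]))

# Freeze the encoder's BatchNorm layers (here, the whole toy model).
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)

# One illustrative step: resize to 384x384 (512x512 for COCO-20^i) and apply
# a random horizontal flip jointly to image and mask, then update.
image = torch.randn(1, 3, 400, 400)
mask = torch.randint(0, 2, (1, 1, 400, 400)).float()
image = F.interpolate(image, size=(384, 384), mode="bilinear",
                      align_corners=False)
mask = F.interpolate(mask, size=(384, 384), mode="nearest").squeeze(1).long()
if torch.rand(()) < 0.5:
    image, mask = image.flip(-1), mask.flip(-1)

loss = criterion(model(image), mask)
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```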