CATrans: Context and Affinity Transformer for Few-Shot Segmentation

Authors: Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to demonstrate the effectiveness of the proposed model, outperforming the state-of-the-art methods.
Researcher Affiliation | Collaboration | (1) Australian National University, Canberra, Australia; (2) Institute of Deep Learning, Baidu Research, Beijing, China
Pseudocode | No | The paper includes mathematical equations for attention layers and affinities but does not provide pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | PASCAL-5i [Shaban et al., 2017] is composed of PASCAL VOC 2012 with additional SBD [Hariharan et al., 2011] annotations, which contains 20 categories split into 4 folds (15/5 categories as base/novel classes). COCO-20i [Lin et al., 2014] is created from MS COCO, where the 80 object categories are divided into four splits (60/20 categories as base/novel classes).
Dataset Splits | Yes | Specifically, all classes are divided into two disjoint class sets, Ctrain and Ctest. To mitigate the overfitting caused by insufficient training data, we follow the common protocol called episodic training. Under the K-shot setting, each episode is composed of a support set S = {(Is, Ms)}^K, where Is, Ms are a support image and its corresponding mask, and a query sample Q = (Iq, Mq), where Iq, Mq are the query image and mask, respectively. In particular, the datasets Dtrain = {S, Q}^Ntrain and Dtest = {S, Q}^Ntest have category sets Ctrain and Ctest, respectively, where Ntrain and Ntest are the numbers of episodes for the training and test sets. During evaluation, the results are averaged over 5k and 20k randomly sampled episodes for each fold and over 5 runs with different seeds.
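The episodic protocol quoted above (a K-shot support set plus one query pair drawn from the same class) can be sketched as follows. This is an illustrative placeholder, not the authors' code; `dataset_by_class` and `sample_episode` are hypothetical names.

```python
import random

def sample_episode(dataset_by_class, classes, k_shot=1):
    """Sample one few-shot episode: K support pairs plus one query pair.

    dataset_by_class: dict mapping class id -> list of (image, mask) pairs.
    classes: the class set to draw from (Ctrain during training,
    Ctest during evaluation, per the episodic protocol).
    """
    cls = random.choice(classes)
    # Draw k_shot + 1 distinct samples from the chosen class.
    pairs = random.sample(dataset_by_class[cls], k_shot + 1)
    support = pairs[:k_shot]   # S = {(Is, Ms)}^K
    query = pairs[k_shot]      # Q = (Iq, Mq)
    return support, query

# Toy usage with placeholder (image, mask) pairs.
data = {0: [("img0a", "m0a"), ("img0b", "m0b")],
        1: [("img1a", "m1a"), ("img1b", "m1b"), ("img1c", "m1c")]}
support, query = sample_episode(data, classes=[0, 1], k_shot=1)
```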
Hardware Specification | Yes | We conduct all experiments on 1 NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions "AdamW as the optimizer" and uses ResNet and Swin Transformer as backbones, but does not specify version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The models are trained for 20k and 40k iterations on PASCAL-5i and COCO-20i, respectively, with AdamW as the optimizer. The initial learning rate is set to 5e-5 and decays by a factor of 0.1 at the 10k-th iteration. During training, we first resize the input images to 384×384 and 512×512 for PASCAL-5i and COCO-20i, respectively, and then randomly perform horizontal flipping. We simply use cross-entropy loss with weights of 1 and 4 for background and foreground pixels, respectively. The BN layers of the image encoder are frozen. For a fair comparison, we employ the widely used ResNet-50, ResNet-101, and Swin Transformer as the image encoder.
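The class-weighted loss and step learning-rate schedule described above can be sketched in a few lines. This is a minimal NumPy sketch assuming a binary foreground/background formulation; the function names are illustrative and not from the paper.

```python
import numpy as np

def weighted_bce(probs, mask, w_bg=1.0, w_fg=4.0, eps=1e-7):
    """Pixel-wise binary cross-entropy with per-class weights.

    probs: predicted foreground probability per pixel, in (0, 1).
    mask:  ground-truth labels, 1 = foreground, 0 = background.
    Weights follow the reported setting (background 1, foreground 4).
    """
    probs = np.clip(probs, eps, 1 - eps)
    per_pixel = -(w_fg * mask * np.log(probs)
                  + w_bg * (1 - mask) * np.log(1 - probs))
    return per_pixel.mean()

def lr_at(iteration, base_lr=5e-5, decay_iter=10_000, factor=0.1):
    """Step schedule: initial lr 5e-5, decayed once by 0.1 at 10k iterations."""
    return base_lr * (factor if iteration >= decay_iter else 1.0)
```

In practice the weighting would typically be passed to a framework loss (e.g. a per-class weight tensor in PyTorch's cross-entropy), but the arithmetic is the same as above.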