Feature-Proxy Transformer for Few-Shot Segmentation

Authors: Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments and show that FPTrans achieves accuracy on par with the decoder-based FSS methods. For example, on PASCAL-5i [5] with one support sample, FPTrans achieves 68.81% mIoU, setting a new state of the art.
Researcher Affiliation | Collaboration | Jian-Wei Zhang (1), Yifan Sun (2), Yi Yang (3), Wei Chen (1); (1) State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China; (2) Baidu Research; (3) CCAI, College of Computer Science and Technology, Zhejiang University
Pseudocode | No | The paper describes the methods using prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at https://github.com/Jarvis73/FPTrans.
Open Datasets | Yes | We use two popular FSS benchmarks PASCAL-5i [41] and COCO-20i [46] for evaluation. PASCAL-5i combines PASCAL VOC 2012 [10] and SBD [15], and includes 20 classes. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes.
Dataset Splits | Yes | Following prior works [46, 26], we split the dataset into four splits, with each split using 15 classes for training and 5 classes for testing. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes. It is divided into 4 splits, with each split using 60 classes for training and 20 classes for testing. See the fold-construction sketch after the table.
Hardware Specification | Yes | Using 4 A100 GPUs, we train 60 epochs with the ViT backbone and 30 epochs with the DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in the 5-shot setting due to the memory limitation).
Software Dependencies | No | The paper mentions using the SGD optimizer and cross-entropy losses but does not specify software dependencies with version numbers (e.g., specific deep learning frameworks such as PyTorch or TensorFlow, or their versions).
Experiment Setup | Yes | All images are resized and cropped to 480×480 and augmented following [46]. We use the SGD optimizer with a momentum of 0.9, a weight decay of 5e-5, and a constant learning rate of 1e-3. Using 4 A100 GPUs, we train 60 epochs with the ViT backbone and 30 epochs with the DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in the 5-shot setting due to the memory limitation). When generating the local background prompts (and the feature-based proxies), the background of each support image is partitioned into 5 local parts, i.e., S = 5. Each prompt consists of 12 tokens, i.e., G = 12. The weight factor λ balancing the classification loss and the pairwise loss (Eqn. (9)) is set to 2e-2 for PASCAL-5i and 1e-4 for COCO-20i. An optimizer/loss configuration sketch follows the table.
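
To make the quoted split protocol concrete, below is a minimal sketch of how the four folds can be constructed. The consecutive grouping for PASCAL-5i and the interleaved grouping for COCO-20i follow common conventions in prior FSS work; the exact class-id assignment is an assumption here, not taken from the paper, and should be checked against the released code.

```python
def pascal5i_classes(fold: int):
    """PASCAL-5^i: 20 classes; fold in {0,1,2,3} holds out 5 consecutive classes."""
    assert fold in range(4)
    test = list(range(fold * 5 + 1, fold * 5 + 6))      # e.g. fold 0 -> classes 1..5
    train = [c for c in range(1, 21) if c not in test]  # remaining 15 training classes
    return train, test

def coco20i_classes(fold: int):
    """COCO-20^i: 80 classes; fold holds out every 4th class starting at fold + 1."""
    assert fold in range(4)
    test = list(range(fold + 1, 81, 4))                 # 20 interleaved test classes
    train = [c for c in range(1, 81) if c not in test]  # remaining 60 training classes
    return train, test
```

For example, `pascal5i_classes(0)` yields classes 1-5 for testing and the other 15 for training, matching the 15/5 split quoted above.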
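The quoted optimizer and loss settings translate directly into a few lines of configuration. The sketch below assumes PyTorch (the released code at https://github.com/Jarvis73/FPTrans is the authoritative reference); the placeholder module stands in for the FPTrans model, and `ce_loss`/`pairwise_loss` are placeholders for the two terms of Eqn. (9).

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the FPTrans model (ViT/DeiT backbone).
model = nn.Linear(8, 2)

# Optimizer settings as quoted: SGD, momentum 0.9, weight decay 5e-5,
# constant learning rate 1e-3 (no schedule).
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,
    momentum=0.9,
    weight_decay=5e-5,
)

# Lambda in Eqn. (9): 2e-2 for PASCAL-5^i, 1e-4 for COCO-20^i.
lam = 2e-2

def total_loss(ce_loss: torch.Tensor, pairwise_loss: torch.Tensor) -> torch.Tensor:
    """Cross-entropy (classification) loss plus the lambda-weighted pairwise loss."""
    return ce_loss + lam * pairwise_loss
```

Only the hyperparameter values come from the paper; the surrounding code structure is illustrative.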