Feature-Proxy Transformer for Few-Shot Segmentation

Authors: Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments and show that FPTrans achieves accuracy on par with the decoder-based FSS methods. For example, on PASCAL-5i [5] with one support sample, FPTrans achieves 68.81% mIoU, setting a new state of the art.
Researcher Affiliation | Collaboration | Jian-Wei Zhang (1), Yifan Sun (2), Yi Yang (3), Wei Chen (1); (1) State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China; (2) Baidu Research; (3) CCAI, College of Computer Science and Technology, Zhejiang University
Pseudocode | No | The paper describes the methods using prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at https://github.com/Jarvis73/FPTrans.
Open Datasets | Yes | We use two popular FSS benchmarks PASCAL-5i [41] and COCO-20i [46] for evaluation. PASCAL-5i combines PASCAL VOC 2012 [10] and SBD [15], and includes 20 classes. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes.
Dataset Splits | Yes | Following prior works [46, 26], we split the dataset into four splits, with each split using 15 classes for training and 5 classes for testing. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes. It is divided into 4 splits, with each split using 60 classes for training and 20 classes for testing. See the fold-construction sketch after the table.
Hardware Specification | Yes | Using 4 A100 GPUs, we train 60 epochs with the ViT backbone and 30 epochs with the DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in the 5-shot setting due to the memory limitation).
Software Dependencies | No | The paper mentions using the SGD optimizer and cross-entropy losses but does not specify software dependencies with version numbers (e.g., specific deep learning frameworks such as PyTorch or TensorFlow, or their versions).
Experiment Setup | Yes | All images are resized and cropped to 480×480 and augmented following [46]. We use the SGD optimizer with a momentum of 0.9, a weight decay of 5e-5, and a constant learning rate of 1e-3. Using 4 A100 GPUs, we train 60 epochs with the ViT backbone and 30 epochs with the DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in the 5-shot setting due to the memory limitation). When generating the local background prompts (and the feature-based proxies), the background of each support image is partitioned into 5 local parts, i.e., S = 5. Each prompt consists of 12 tokens, i.e., G = 12. The weight factor λ balancing the classification loss and the pairwise loss (Eqn. (9)) is set to 2e-2 for PASCAL-5i and 1e-4 for COCO-20i. An optimizer/loss configuration sketch follows the table.
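
To make the quoted split protocol concrete, below is a minimal sketch of how the four folds can be constructed. The consecutive grouping for PASCAL-5i and the interleaved grouping for COCO-20i follow common conventions in prior FSS work; the exact class-id assignment is an assumption here, not taken from the paper, and should be checked against the released code.

```python
def pascal5i_classes(fold: int):
    """PASCAL-5^i: 20 classes; fold in {0,1,2,3} holds out 5 consecutive classes."""
    assert fold in range(4)
    test = list(range(fold * 5 + 1, fold * 5 + 6))      # e.g. fold 0 -> classes 1..5
    train = [c for c in range(1, 21) if c not in test]  # remaining 15 training classes
    return train, test

def coco20i_classes(fold: int):
    """COCO-20^i: 80 classes; fold holds out every 4th class starting at fold + 1."""
    assert fold in range(4)
    test = list(range(fold + 1, 81, 4))                 # 20 interleaved test classes
    train = [c for c in range(1, 81) if c not in test]  # remaining 60 training classes
    return train, test
```

For example, `pascal5i_classes(0)` yields classes 1-5 for testing and the other 15 for training, matching the 15/5 split quoted above.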
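The quoted optimizer and loss settings translate directly into a few lines of configuration. The sketch below assumes PyTorch (the released code at https://github.com/Jarvis73/FPTrans is the authoritative reference); the placeholder module stands in for the FPTrans model, and `ce_loss`/`pairwise_loss` are placeholders for the two terms of Eqn. (9).

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the FPTrans model (ViT/DeiT backbone).
model = nn.Linear(8, 2)

# Optimizer settings as quoted: SGD, momentum 0.9, weight decay 5e-5,
# constant learning rate 1e-3 (no schedule).
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,
    momentum=0.9,
    weight_decay=5e-5,
)

# Lambda in Eqn. (9): 2e-2 for PASCAL-5^i, 1e-4 for COCO-20^i.
lam = 2e-2

def total_loss(ce_loss: torch.Tensor, pairwise_loss: torch.Tensor) -> torch.Tensor:
    """Cross-entropy (classification) loss plus the lambda-weighted pairwise loss."""
    return ce_loss + lam * pairwise_loss
```

Only the hyperparameter values come from the paper; the surrounding code structure is illustrative.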