Feature-Proxy Transformer for Few-Shot Segmentation
Authors: Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments and show that FPTrans achieves accuracy on par with the decoder-based FSS methods. For example, on PASCAL-5i [5] with one support sample, FPTrans achieves 68.81% mIoU, setting a new state of the art. |
| Researcher Affiliation | Collaboration | Jian-Wei Zhang1, Yifan Sun2, Yi Yang3, Wei Chen1; 1 State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China; 2 Baidu Research; 3 CCAI, College of Computer Science and Technology, Zhejiang University |
| Pseudocode | No | The paper describes the methods using prose and mathematical equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Jarvis73/FPTrans. |
| Open Datasets | Yes | We use two popular FSS benchmarks PASCAL-5i [41] and COCO-20i [46] for evaluation. PASCAL-5i combines PASCAL VOC 2012 [10] and SBD [15], and includes 20 classes. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes. |
| Dataset Splits | Yes | Following prior works [46, 26], we split the dataset into four splits, each with 15 classes for training and 5 classes for testing. COCO-20i is constructed with COCO 2014 [30] and includes 80 classes. It is divided into 4 splits, each with 60 classes for training and 20 classes for testing. (A hedged sketch of this split construction is given after the table.) |
| Hardware Specification | Yes | Using 4 A100 GPUs, we train 60 epochs with ViT and 30 epochs with DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in 5-shot due to the memory limitation). |
| Software Dependencies | No | The paper mentions using the SGD optimizer and cross-entropy losses but does not specify software dependencies with version numbers (e.g., specific deep learning frameworks such as PyTorch or TensorFlow, or their versions). |
| Experiment Setup | Yes | All the images are resized and cropped to 480×480 and augmented following [46]. We use the SGD optimizer with a momentum of 0.9, a weight decay of 5e-5, and a constant learning rate of 1e-3. Using 4 A100 GPUs, we train 60 epochs with ViT and 30 epochs with DeiT backbone, using a batch size of 4 for PASCAL-5i and 16 for COCO-20i (batch size 8 in 5-shot due to the memory limitation). When generating the local background prompts (and the feature-based proxies), the background of each support image is partitioned into 5 local parts, i.e., S = 5. Each prompt consists of 12 tokens, i.e., G = 12. The weight factor λ for balancing the classification loss and pairwise loss (Eqn. (9)) is set to 2e-2 for PASCAL-5i and 1e-4 for COCO-20i. (A hedged configuration sketch collecting these hyperparameters is given after the table.) |
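
The Dataset Splits row describes a fixed cross-validation convention: PASCAL-5i uses 4 folds with 5 held-out test classes each (15 for training), and COCO-20i uses 4 folds with 20 held-out test classes each (60 for training). The sketch below is one common way to enumerate these folds; the contiguous grouping for PASCAL-5i and the interleaved grouping for COCO-20i are assumptions about the convention of [46, 26], not code taken from the FPTrans repository.

```python
# Minimal sketch of the cross-validation class splits described above.
# Assumptions: classes are indexed 1..20 for PASCAL-5i and 1..80 for COCO-20i;
# PASCAL folds hold out a contiguous block of 5 classes, COCO folds hold out
# every 4th class. This is an illustration of the split sizes, not the
# authors' exact data loader.

def pascal_5i_split(fold: int):
    """Return (train_classes, test_classes) for PASCAL-5i fold in {0, 1, 2, 3}."""
    assert fold in range(4)
    all_classes = list(range(1, 21))                          # 20 classes
    test = [c for c in all_classes if (c - 1) // 5 == fold]   # 5 held-out classes
    train = [c for c in all_classes if c not in test]         # 15 training classes
    return train, test

def coco_20i_split(fold: int):
    """Return (train_classes, test_classes) for COCO-20i fold in {0, 1, 2, 3}."""
    assert fold in range(4)
    all_classes = list(range(1, 81))                          # 80 classes
    test = [c for c in all_classes if (c - 1) % 4 == fold]    # 20 held-out classes
    train = [c for c in all_classes if c not in test]         # 60 training classes
    return train, test

if __name__ == "__main__":
    for fold in range(4):
        tr, te = pascal_5i_split(fold)
        print(f"PASCAL-5^{fold}: {len(tr)} train / {len(te)} test classes")
```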
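
The Experiment Setup row pins down most training hyperparameters. The sketch below collects them into a PyTorch-style configuration; the `CONFIG` dictionary and `build_optimizer` helper are illustrative placeholders, and only the numeric values are taken from the paper.

```python
# Hedged sketch of the reported training configuration. Only the hyperparameter
# values come from the paper; the structure and names are assumptions.
import torch

CONFIG = {
    "image_size": 480,            # images resized/cropped to 480x480
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 5e-5,
    "lr": 1e-3,                   # constant learning rate
    "epochs": {"ViT": 60, "DeiT": 30},
    "batch_size": {"PASCAL-5i": 4, "COCO-20i": 16},   # 8 for COCO 5-shot (memory)
    "num_bg_parts": 5,            # S = 5 local background parts per support image
    "tokens_per_prompt": 12,      # G = 12
    "lambda": {"PASCAL-5i": 2e-2, "COCO-20i": 1e-4},  # pairwise-loss weight (Eqn. (9))
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Construct the SGD optimizer with the reported momentum/weight decay/lr."""
    return torch.optim.SGD(
        model.parameters(),
        lr=CONFIG["lr"],
        momentum=CONFIG["momentum"],
        weight_decay=CONFIG["weight_decay"],
    )
```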