Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation
Authors: Yuan Wang, Naisong Luo, Tianzhu Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the commonly used Pascal-5i and COCO-20i benchmarks and achieve state-of-the-art results across all settings. We evaluate our AMFormer on two widely used benchmarks, COCO-20i and Pascal-5i, with different backbones, and the AMFormer consistently sets new state-of-the-art results across different settings. We present the comparison of our method with previous FSS methods on the Pascal-5i and COCO-20i datasets in Table 2 and Table 3. As shown in Table 5, a series of ablation studies is conducted on the first split of Pascal-5i with the ResNet-101 backbone to analyze each component of the proposed AMFormer. |
| Researcher Affiliation | Academia | Yuan Wang¹, Naisong Luo¹, Tianzhu Zhang¹. ¹Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China. {wy2016,lns6}@mail.ustc.edu.cn, tzzhang@ustc.edu.cn |
| Pseudocode | No | The paper describes the methods and processes using textual descriptions and mathematical equations, but it does not provide any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code will be available at https://github.com/Wyxdm/AMNet |
| Open Datasets | Yes | We evaluate the proposed AMFormer on two commonly used few-shot segmentation benchmarks, Pascal-5i [2] and COCO-20i [56]. Pascal-5i is constructed based on the PASCAL VOC 2012 dataset [57] and additional annotations from SBD [58]. COCO-20i is a larger benchmark based on the MSCOCO dataset [59]. |
| Dataset Splits | Yes | Episodic meta-training [51] is widely used to enhance the generalization of FSS models. Specifically, the dataset is divided into the training set $D_{train}$ and the testing set $D_{test}$. The category sets of $D_{train}$ and $D_{test}$ are disjoint, i.e., $C_{train} \cap C_{test} = \emptyset$. A series of episodes is sampled from $D_{train}$ to train the model, each of which is composed of a support set $S = \{I_s^k, M_s^k\}_{k=1}^K$ and a query set $Q = \{I_q, M_q\}$ in the K-shot setting... Following previous works [14, 55], we equally divide the 20 categories into four splits, three of which are used for training and the remaining one for testing. (COCO-20i... the 80 categories of which are partitioned into four splits for cross-validation, as done in [14].) A sketch of this episodic sampling protocol is given below the table. |
| Hardware Specification | Yes | Our approach is implemented using PyTorch and all experiments are conducted on 4 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper states that 'Our approach is implemented using PyTorch' but does not specify a version for PyTorch or list any other software dependencies with version numbers. |
| Experiment Setup | Yes | The number of attention layers in G and D is set to 1 and 2, respectively. We increase the number of training epochs to 300 for Pascal-5i and 75 for COCO-20i, and set the batch sizes to 8 and 4, respectively. The AdamW [61] optimizer with poly learning rate decay is used to train both G and D. The initial learning rate is set to 1e-4 and the weight decay to 1e-2. In our experiments, τ is set to 0.7. λdiv denotes the weight of the diversity loss and we set it to 0.1 in experiments. L is set to 3 in our experiments. The embedding dimension is set to 64, and the number of heads is set to 4 in all the attention layers. These hyperparameters are collected into a configuration sketch below the table. |
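
To make the episodic protocol quoted under *Dataset Splits* concrete, here is a minimal Python sketch of K-shot episode sampling on Pascal-5i. All names here (`get_splits`, `sample_episode`, the `images_by_class` layout) are hypothetical illustrations of the standard protocol, not code from the AMFormer repository; only the fold structure (20 classes, four splits of five, disjoint train/test category sets) comes from the quoted text.

```python
import random

# Pascal-5i: 20 category ids, cross-validated over 4 folds of 5 classes each.
ALL_CLASSES = list(range(20))
FOLDS = [ALL_CLASSES[i * 5:(i + 1) * 5] for i in range(4)]

def get_splits(test_fold: int):
    """Return (C_train, C_test): three folds train, the held-out fold tests.
    The two category sets are disjoint by construction."""
    c_test = FOLDS[test_fold]
    c_train = [c for c in ALL_CLASSES if c not in c_test]
    return c_train, c_test

def sample_episode(images_by_class, classes, k_shot: int = 1):
    """Sample one K-shot episode: a support set S = {(I_s^k, M_s^k)}_{k=1..K}
    and a query pair Q = (I_q, M_q), all drawn from the same class.
    `images_by_class` maps class id -> list of (image, mask) pairs."""
    cls = random.choice(classes)
    pool = random.sample(images_by_class[cls], k_shot + 1)
    support = pool[:k_shot]   # K (image, mask) support pairs
    query = pool[k_shot]      # one held-out (image, mask) query pair
    return support, query

# Usage: train on folds {1, 2, 3}, test on fold 0.
c_train, c_test = get_splits(test_fold=0)
```

The same cross-validation scheme applies to COCO-20i, with 80 categories partitioned into four folds of 20.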
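
The hyperparameters quoted under *Experiment Setup* can be gathered into a single configuration, shown below as a hedged sketch. The dictionary keys and the poly-decay power of 0.9 are assumptions (the excerpt does not state the power, and 0.9 is merely a common convention); the values themselves are the ones quoted above.

```python
import torch

# Hyperparameters from the paper's experiment setup, collected into one place.
# Key names are hypothetical; only the values come from the quoted text.
CFG = {
    "attn_layers_G": 1,              # attention layers in the generator G
    "attn_layers_D": 2,              # attention layers in the discriminator D
    "epochs":     {"pascal-5i": 300, "coco-20i": 75},
    "batch_size": {"pascal-5i": 8,   "coco-20i": 4},
    "lr": 1e-4,                      # initial learning rate
    "weight_decay": 1e-2,
    "tau": 0.7,                      # τ in the paper (role not given in this excerpt)
    "lambda_div": 0.1,               # weight of the diversity loss
    "L": 3,                          # the paper's L (role not given in this excerpt)
    "embed_dim": 64,                 # embedding dimension of every attention layer
    "num_heads": 4,                  # heads in every attention layer
}

def build_optimizer(model: torch.nn.Module, max_iters: int):
    """AdamW with poly learning-rate decay, as stated in the paper.
    The decay power of 0.9 is an assumption, not confirmed by the text."""
    opt = torch.optim.AdamW(model.parameters(),
                            lr=CFG["lr"], weight_decay=CFG["weight_decay"])
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda it: (1.0 - it / max_iters) ** 0.9)
    return opt, sched
```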