Generalized and Discriminative Few-Shot Object Detection via SVD-Dictionary Enhancement

Authors: Aming Wu, Suqi Zhao, Cheng Deng, Wei Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, we separately verify the effectiveness of our method on PASCAL VOC and COCO benchmarks. Particularly, for the 2-shot case in VOC split1, our method significantly outperforms the baseline by 6.2%. Moreover, visualization analysis shows that our method is instrumental in doing FSOD.
Researcher Affiliation | Collaboration | Aming Wu (1), Suqi Zhao (1), Cheng Deng (1), Wei Liu (2). (1) School of Electronic Engineering, Xidian University, Xi'an, China; (2) Tencent Data Platform. amwu@xidian.edu.cn, sqzhao@stu.xidian.edu.cn, chdeng@mail.xidian.edu.cn, wl2223@columbia.edu
Pseudocode | No | The paper describes the method using textual explanations and mathematical equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Code will be available in https://github.com/AmingWu/SVD-Dictionary-Enhancement.
Open Datasets | Yes | In the experiments, the proposed method is evaluated on PASCAL VOC [6, 7] and COCO [20] benchmarks. We strictly follow the consistent few-shot detection data construction and evaluation protocol [17, 35, 37, 34] to ensure fair and direct comparison.
Dataset Splits | Yes | For PASCAL VOC, the overall 20 categories are divided into 15 base object categories and 5 new object categories. All base object category data from PASCAL VOC 07+12 trainval sets is available. For each new object category, there exist K instances available and K is set to 1, 2, 3, 5, and 10. Following existing methods [17, 34, 35], we utilize the same three random partitions of base and new object categories, referred to as New Split 1, 2, and 3. And for the predictions on PASCAL VOC 2007 test set, we separately report the results of nAP50 and nAP75. For the 80 categories in COCO, 20 categories overlapped with PASCAL VOC are taken as new object categories. The remaining 60 categories are used as base object categories. The K = 10 and 30 shots detection performance is evaluated on 5,000 images from COCO 2014 validation set.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments. The authors' self-reflection also indicates this information is not provided (point 3.d under 'If you ran experiments...').
Software Dependencies | No | The paper mentions using a 'standard SGD optimizer' but does not specify version numbers for any software components, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | Implementation Details. For the detection model, we use Faster R-CNN [27] with the RoIAlign [16] layer. The backbone is ResNet-101 [15]. The parameters are pre-trained on ImageNet [28] for initialization. In Eq. (1), we select the first k largest singular values to compute the generalization map. Here, k is set to half of the total number of singular values. For dictionary learning, the number of codewords is set to 24. All newly introduced parameters are initialized randomly. All the experiments are trained using the standard SGD optimizer with a momentum of 0.9 and a weight decay of 0.0001. During inference, we take the output y of Eq. (3) as the classification result.
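
The Experiment Setup entry states that the first k largest singular values are kept, with k set to half of the total number of singular values. Eq. (1) itself is not reproduced in this summary, so the following is only a minimal NumPy sketch of that truncation step, assuming a 2-D feature matrix. The function name `split_by_singular_values`, the residual term, and the toy input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def split_by_singular_values(feature, keep_ratio=0.5):
    """Split a 2-D matrix into its top-k SVD reconstruction and the residual.

    Hypothetical helper: keep_ratio=0.5 mirrors "k is set to half of the
    total number of singular values"; everything else is an assumption.
    """
    # Reduced SVD: feature = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(feature, full_matrices=False)
    k = max(1, int(len(s) * keep_ratio))
    # Reconstruction that uses only the k largest singular values.
    top = (u[:, :k] * s[:k]) @ vt[:k, :]
    # The part carried by the remaining, smaller singular values.
    rest = feature - top
    return top, rest, k


rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 6))  # toy stand-in for a feature map
top, rest, k = split_by_singular_values(feature)
print(k, top.shape, rest.shape)
```

With the toy 8x6 input there are six singular values, so k is 3, and `top + rest` recovers the input up to floating-point error. How the paper turns these components into its generalization map is defined by Eq. (1) in the paper, which this sketch does not attempt to replicate.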