Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
Authors: Hang Xu, Linpu Fang, Xiaodan Liang, Wenxiong Kang, Zhenguo Li
AAAI 2020, pp. 12492-12499
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO). |
| Researcher Affiliation | Collaboration | Hang Xu¹, Linpu Fang², Xiaodan Liang³, Wenxiong Kang², Zhenguo Li¹; ¹Huawei Noah's Ark Lab, ²South China University of Technology, ³Sun Yat-Sen University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for open-source code, such as a repository link or an explicit statement of code release. |
| Open Datasets | Yes | We evaluate the performance of our Universal-RCNN on three object detection domains with different annotations of categories: MSCOCO 2017 (Lin et al. 2014), Visual Genome (VG) (Krishna et al. 2016), and ADE (Zhou et al. 2017). |
| Dataset Splits | Yes | MSCOCO is a common object detection dataset with 80 object classes, which contains 118K training images, 5K validation images (denoted as minival) and 20K unannotated testing images (denoted as test-dev) as common practice. For VG, we use ... 88K images for training and 5K images for testing... For ADE, we consider 445 classes and use 20K images for training and 1K images for testing... |
| Hardware Specification | Yes | All experiments are conducted on a single server with 8 Tesla V100 GPUs by using the Pytorch framework. |
| Software Dependencies | No | The paper mentions using the 'Pytorch framework' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | The hyper-parameters in training mostly follow Lin et al. During both training and testing, we resize the input image such that the shorter side has 800 pixels. ... The total number of proposed regions after NMS is Nr = 512. ... In the graph learner module, we use a linear transformation layer of size 256 ... For the spatial-aware GCN, we use two weighted graph convolutional layers with dimensions of 256 and 128 respectively... Each GCN consists of K = 8 spatial weight terms... For training, SGD with weight decay of 0.0001 and momentum of 0.9 is adopted to optimize all models. The batch size is set to 16 with 2 images on each GPU. The initial learning rate is 0.02, reduced twice (×0.1) during the training process. We train 12 epochs for all models in an end-to-end manner. (Hedged sketches of this setup follow the table.) |
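
To make the graph learner and spatial-aware GCN settings quoted above concrete, here is a minimal PyTorch sketch. Only the numbers from the paper are fixed (a 256-d linear transformation in the graph learner, two graph convolutional layers of dimensions 256 and 128, K = 8 spatial weight terms, Nr = 512 proposals); the 1024-d region features, the dot-product similarity, and the construction of the spatial masks are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphLearner(nn.Module):
    """Learns a soft adjacency matrix over the Nr = 512 region proposals.
    The 256-d projection is from the paper; the 1024-d input and the
    dot-product similarity are assumptions for illustration."""
    def __init__(self, in_dim=1024, proj_dim=256):
        super().__init__()
        self.proj = nn.Linear(in_dim, proj_dim)

    def forward(self, x):                 # x: (Nr, in_dim)
        h = self.proj(x)                  # (Nr, 256)
        sim = h @ h.t()                   # pairwise region similarity
        return F.softmax(sim, dim=-1)     # row-normalized adjacency

class SpatialGCNLayer(nn.Module):
    """One weighted graph convolution with K = 8 spatial weight terms.
    How the masks encode relative box geometry is not specified in the
    quoted text, so they are passed in as an opaque (K, Nr, Nr) tensor."""
    def __init__(self, in_dim, out_dim, K=8):
        super().__init__()
        self.weights = nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in range(K)])

    def forward(self, x, adj, spatial_masks):
        out = sum(self.weights[k]((adj * spatial_masks[k]) @ x)
                  for k in range(len(self.weights)))
        return F.relu(out)

# Two-layer spatial-aware GCN with the 256/128 dimensions from the paper.
feats = torch.randn(512, 1024)           # hypothetical region features
masks = torch.ones(8, 512, 512) / 8      # placeholder spatial masks
adj = GraphLearner()(feats)
h = SpatialGCNLayer(1024, 256)(feats, adj, masks)
h = SpatialGCNLayer(256, 128)(h, adj, masks)   # final 128-d relation features
```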
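
The quoted training hyper-parameters likewise map onto a standard PyTorch optimizer setup. The SGD settings, batch size, initial learning rate, and 12-epoch budget come from the paper; it does not say at which epochs the rate drops, so the [8, 11] milestones below are an assumption borrowed from the common 1x detection schedule, and `model` is a placeholder standing in for the full Universal-RCNN.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder; stands in for Universal-RCNN

# SGD with weight decay 0.0001 and momentum 0.9, initial lr 0.02 (paper).
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.02, momentum=0.9, weight_decay=1e-4
)

# Reduced twice by x0.1 over 12 epochs; the exact milestones are assumed.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[8, 11], gamma=0.1
)

for epoch in range(12):
    # ... one pass over 16-image batches (2 images per GPU on 8 V100s) ...
    optimizer.step()      # would follow loss.backward() for each batch
    scheduler.step()
```

With 2 images per GPU across 8 GPUs, the effective batch size of 16 is consistent with the 0.02 base rate under the usual linear-scaling convention for detection training.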