RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Authors: Cheng Chi, Fangyun Wei, Han Hu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are all implemented on the MMDetection v1.1.0 codebase [2]. All experiments are performed on the MS COCO dataset [19]. A union of 80k train images and a 35k subset of val images are used for training. Most ablation experiments are studied on a subset of 5k unused val images (denoted as minival).
Researcher Affiliation | Collaboration | Cheng Chi, Institute of Automation, CAS (chicheng15@mails.ucas.ac.cn); Fangyun Wei, Microsoft Research Asia (fawe@microsoft.com); Han Hu, Microsoft Research Asia (hanhu@microsoft.com). The work was done while Cheng Chi was an intern at Microsoft Research Asia.
Pseudocode | No | The paper describes the steps of its proposed module but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The code is available at https://github.com/microsoft/RelationNet2.
Open Datasets | Yes | Our experiments are all implemented on the MMDetection v1.1.0 codebase [2]. All experiments are performed on the MS COCO dataset [19].
Dataset Splits | Yes | A union of 80k train images and a 35k subset of val images are used for training. Most ablation experiments are studied on a subset of 5k unused val images (denoted as minival).
Hardware Specification | Yes | The real inference speed of different models using a V100 GPU (fp32 mode is used) is shown in Table 11.
Software Dependencies | Yes | Our experiments are all implemented on the MMDetection v1.1.0 codebase [2].
Experiment Setup | Yes | Unless otherwise stated, all training and inference details keep the default settings in MMDetection: the backbone is initialized from an ImageNet [25] pretrained model; input images are resized so that the shorter side is 800 and the longer side is at most 1333; the whole network is optimized via SGD with momentum 0.9 and weight decay 0.0001; and the initial learning rate of 0.02 is decreased by a factor of 0.1 at epochs 8 and 11. In the large-model experiments in Tables 10 and 12, training runs for 20 epochs and the learning rate is decreased at epochs 16 and 19. Multi-scale training is also adopted in the large-model experiments: for each mini-batch, the shorter side is randomly selected from the range [400, 1200].
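
The Experiment Setup row maps directly onto MMDetection config fields. The sketch below expresses the reported hyperparameters in MMDetection v1.1.0-style Python config syntax; it is an illustration assembled from the quoted description, not the authors' released configuration (the actual files are in the RelationNet2 repository), and field names such as `lr_config` and `multiscale_mode` simply follow standard MMDetection 1.x conventions.

```python
# Minimal sketch of the reported training schedule in MMDetection v1.1.0
# config style. Values come from the Experiment Setup description quoted
# above; consult the released RelationNet2 repo for the authors' configs.

# SGD optimizer: lr 0.02, momentum 0.9, weight decay 0.0001
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)

# Default schedule: 12 epochs, lr decreased by 0.1 at epochs 8 and 11
lr_config = dict(policy='step', step=[8, 11])
total_epochs = 12

# Default single-scale training: shorter side 800, longer side <= 1333
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

# Large-model variant (illustrative grouping; in a real config these would
# replace lr_config, total_epochs, and the Resize op above):
# 20 epochs, lr decreased at epochs 16 and 19, and multi-scale training with
# the shorter side sampled from [400, 1200] for each mini-batch.
large_model_lr_config = dict(policy='step', step=[16, 19])
large_model_total_epochs = 20
large_model_resize = dict(
    type='Resize',
    img_scale=[(1333, 400), (1333, 1200)],
    multiscale_mode='range',
    keep_ratio=True)
```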