Searching Parameterized AP Loss for Object Detection

Authors: Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses.
Researcher Affiliation | Collaboration | Chenxin Tao (Tsinghua University), Zizhang Li (Zhejiang University), Xizhou Zhu (SenseTime Research), Gao Huang (Tsinghua University), Yong Liu (Zhejiang University), Jifeng Dai (SenseTime Research; Shanghai Jiao Tong University)
Pseudocode | Yes | Algorithm 1: Parameterized AP Loss Search Process
Open Source Code | No | Code is only promised for a later release: "Our code will be released once the paper is accepted, and the datasets we use are publicly available."
Open Datasets | Yes | We evaluate our approach on the COCO 2017 object detection benchmark [23]... COCO 2017 is publicly available under the Creative Commons Attribution 4.0 License.
Dataset Splits | Yes | In COCO 2017 [23], there are 118k images in the train subset. As described in Section 3.3, we randomly divide the original train subset into D_train and D_eval, which constitute 113k training images and 5k evaluation images for the proxy task, respectively. After the parameter search, we re-train the object detectors with the searched Parameterized AP Loss on the COCO 2017 train subset, and evaluate them on the val subset, which consists of 5k images.
Hardware Specification | Yes | All the experiments are conducted on 8 NVIDIA V100 GPUs.
Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, MMDetection) are provided.
Experiment Setup | Yes | For Faster R-CNN, we use a base learning rate of 0.024 and a batch size of 64 (8 images on each GPU). For RetinaNet, the base learning rate is set to 0.016 and the batch size is 64 (8 images on each GPU). For Deformable DETR, an ImageNet [7] pre-trained ResNet-50 [12] is used as the backbone, and the training settings strictly follow [42], where the learning rate is set to 0.0002 and the batch size is 32 (4 images on each GPU). The default number of segments M is fixed at 5. In the outer-level PPO2 [39] update of Algorithm 1, we draw S = 8 samples each round and search for T = 40 rounds in total. The mean vector μ_1 of the truncated normal distribution is initialized so that f(x; θ) = x. The standard deviation σ is initialized as 0.2 and decays linearly to 0 over the search rounds. The clip operation in Eq. (11) is applied to each component of Θ independently, and the clip range ε is set to 0.1 following PPO2 [39].
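
The Experiment Setup row above fixes the reported search hyper-parameters (S = 8 samples per round, T = 40 rounds, σ initialized to 0.2 with linear decay, per-component clipping with ε = 0.1). The sketch below is a minimal illustration of how such an outer-level PPO2-style search loop could be wired up; it is not the authors' implementation. `train_and_eval_proxy`, `NUM_PARAMS`, `PPO_EPOCHS`, and `LR` are hypothetical placeholders, and plain normal sampling with clamping stands in for the paper's truncated normal distribution.

```python
import torch
from torch.distributions import Normal

NUM_PARAMS = 32        # assumed length of the loss-parameter vector Theta
S, T = 8, 40           # samples per round and total search rounds (from the paper)
SIGMA_INIT = 0.2       # initial standard deviation sigma (from the paper)
CLIP_EPS = 0.1         # PPO2 clip range epsilon (from the paper)
PPO_EPOCHS = 4         # assumed number of surrogate-update epochs per round
LR = 0.01              # assumed learning rate for updating the distribution mean


def train_and_eval_proxy(theta: torch.Tensor) -> float:
    """Hypothetical proxy-task reward: train a detector with the Parameterized
    AP Loss defined by `theta` on the 113k-image proxy training split and
    return its AP on the 5k-image proxy evaluation split."""
    raise NotImplementedError


# The paper initializes the mean vector mu_1 so that f(x; theta) = x;
# zeros are used here purely as an illustrative stand-in for that value.
mu = torch.zeros(NUM_PARAMS, requires_grad=True)
optimizer = torch.optim.Adam([mu], lr=LR)

for t in range(T):
    # Standard deviation decays linearly from SIGMA_INIT towards 0 over T rounds.
    sigma = SIGMA_INIT * (1.0 - t / T) + 1e-8

    # Sample S candidate parameter vectors. A plain normal with clamping is a
    # simplification of the paper's truncated normal sampling distribution.
    with torch.no_grad():
        old_dist = Normal(mu, sigma)
        thetas = old_dist.sample((S,)).clamp(-1.0, 1.0)
        old_log_probs = old_dist.log_prob(thetas)

    # Reward of each sampled loss is the AP obtained on the proxy task.
    rewards = torch.tensor([train_and_eval_proxy(th) for th in thetas])
    advantages = (rewards - rewards.mean())[:, None]

    # PPO2-style clipped surrogate update of the distribution mean, with the
    # clip applied to each component of Theta independently.
    for _ in range(PPO_EPOCHS):
        ratios = torch.exp(Normal(mu, sigma).log_prob(thetas) - old_log_probs)
        unclipped = ratios * advantages
        clipped = ratios.clamp(1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
        loss = -torch.min(unclipped, clipped).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

searched_theta = mu.detach()  # searched loss parameters after the final round
```

After the final round, the searched parameters would be used to re-train the detectors on the full COCO 2017 train subset, as described in the Dataset Splits row above.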