Feature Intertwiner for Object Detection
Authors: Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate InterNet on the object detection track of the challenging COCO benchmark (Lin et al., 2015). For training, we follow common practice as in (Ren et al., 2015; He et al., 2017) and use the trainval35k split (union of 80k images from train and a random 35k subset of images from the 40k val split) for training. The lesion and sensitivity studies are reported by evaluating on the minival split (the remaining 5k images from val). For all experiments, we use depth 50 or 101 ResNet (He et al., 2016) with FPN (Lin et al., 2017a) constructed on top. |
| Researcher Affiliation | Collaboration | 1 Department of Electronic Engineering, The Chinese University of Hong Kong {yangli,ssshi,xgwang}@ee.cuhk.edu.hk; 2 Department of Information Engineering, The Chinese University of Hong Kong bdai@ie.cuhk.edu.hk; 3 The University of Sydney, SenseTime Computer Vision Research Group wanli.ouyang@sydney.edu.au |
| Pseudocode | Yes | Algorithm 1: Sinkhorn divergence W_Q adapted for object detection (red rectangle in Fig. 2) |
| Open Source Code | Yes | Full code suite is available at https://github.com/hli2020/feature_intertwiner. |
| Open Datasets | Yes | We evaluate InterNet on the object detection track of the challenging COCO benchmark (Lin et al., 2015). |
| Dataset Splits | Yes | For training, we follow common practice as in (Ren et al., 2015; He et al., 2017) and use the trainval35k split (union of 80k images from train and a random 35k subset of images from 40k val split) for training. The lesion and sensitivity studies are reported by evaluating on the minival split (the remaining 5k images from val). |
| Hardware Specification | Yes | Inference runs at 325 ms per image (input size 800) on a Titan Pascal X, increasing runtime by around 5% over the baseline (308 ms). We do not intentionally optimize the codebase, however. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) are listed. |
| Experiment Setup | Yes | We adopt stochastic gradient descent as the optimizer. The initial learning rate is 0.01 with momentum 0.9 and weight decay 0.0001. Altogether there are 13 epochs for most models, where the learning rate is dropped by 90% at epochs 6 and 10. We find the warm-up strategy (Goyal et al., 2017) barely improves performance and hence do not adopt it. Gradient clipping is introduced to prevent the training loss from exploding in the first few iterations, with the maximum gradient norm set to 5. Batch size is set to 8 and the system runs on 8 GPUs. [...] Non-maximum suppression (NMS) is used during RPN generation and the detection test phase. The threshold for RPN is set to 0.7, while the value is 0.3 during test. [...] Each level l among the five stages owns a unique anchor size: 32, 64, 128, 256, and 512. |
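The Experiment Setup row pins down a step learning-rate schedule (90% drops at epochs 6 and 10) and gradient-norm clipping at 5. A minimal sketch of those two pieces, assuming the drops multiply the rate by 0.1; the function names are illustrative, not taken from the paper's code:

```python
import math

def lr_at_epoch(epoch, base_lr=0.01, drop_epochs=(6, 10), factor=0.1):
    """Step schedule: the learning rate is dropped by 90% (i.e. multiplied
    by 0.1) at each epoch in drop_epochs, as described in the setup."""
    lr = base_lr
    for e in drop_epochs:
        if epoch >= e:
            lr *= factor
    return lr

def clip_grad_norm(grads, max_norm=5.0):
    """Rescale a flat list of gradient values so their global L2 norm
    does not exceed max_norm (the paper's maximum gradient norm of 5)."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads
```

For example, `lr_at_epoch(0)` gives 0.01, `lr_at_epoch(6)` gives 0.001, and `lr_at_epoch(12)` gives 0.0001, matching the 13-epoch schedule described above.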
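The Pseudocode row refers to Algorithm 1, a Sinkhorn divergence adapted for object detection. The paper's adapted variant is not reproduced here; as a point of reference, here is a sketch of the standard entropy-regularized Sinkhorn iteration that such algorithms build on (generic formulation, not the paper's version):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=200):
    """Approximate optimal transport between histograms a and b under
    cost matrix C via Sinkhorn iterations with entropic regularization eps.
    Returns the transport plan P and the approximate transport cost."""
    K = np.exp(-C / eps)              # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # match column marginals to b
        u = a / (K @ v)               # match row marginals to a
    P = u[:, None] * K * v[None, :]   # transport plan
    return P, float(np.sum(P * C))
```

With a small `eps`, the plan concentrates on the cheapest assignments; for identical uniform marginals and a symmetric cost, the cost approaches zero.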