SNIPER: Efficient Multi-Scale Training
Authors: Bharat Singh, Mahyar Najibi, Larry S. Davis
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present SNIPER, an algorithm for performing efficient multi-scale training in instance level visual recognition tasks. Our implementation based on Faster-RCNN with a ResNet-101 backbone obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference with a single GPU. |
| Researcher Affiliation | Academia | Bharat Singh, Mahyar Najibi, Larry S. Davis; University of Maryland, College Park; {bharat,najibi,lsd}@cs.umd.edu |
| Pseudocode | No | The paper describes the SNIPER algorithm in detail using prose, but it does not include any formal pseudocode or algorithm blocks (e.g., labeled 'Algorithm 1'). |
| Open Source Code | Yes | Code is available at https://github.com/mahyarnajibi/SNIPER/. |
| Open Datasets | Yes | We evaluate SNIPER on the COCO dataset for object detection. COCO contains 123,000 images in the training and validation set and 20,288 images in the test-dev set. ... In very large datasets like Open Images V4 [18] containing 1.7 million images, most objects are large and images provided are high resolution (1024x768), so it is less important to upsample images by 3x. In this case, with SNIPER, we generate 3.5 million chips of size 512x512 using scales of (512/ms, 1). ... I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, and K. Murphy. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html, 2017. |
| Dataset Splits | Yes | COCO contains 123,000 images in the training and validation set and 20,288 images in the test-dev set. We train on the combined training and validation set and report results on the test-dev set. Since recall for proposals is not provided by the evaluation server, we train on 118,000 images and report recall on the remaining 5,000 images (commonly referred to as the minival set). |
| Hardware Specification | Yes | It takes 14 hours to train SNIPER end to end on an 8-GPU V100 node with a Faster-RCNN detector which has a ResNet-101 backbone. ... Not only is SNIPER efficient in training, it can also process around 5 images per second on a single V100 GPU. |
| Software Dependencies | No | The paper mentions 'mixed precision training as described in [27]' but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or other dependencies. |
| Experiment Setup | Yes | On COCO, we train SNIPER with a batch-size of 128 and with a learning rate of 0.015. We use a chip size of 512x512 pixels. Training scales are set to (512/ms, 1.667, 3), where ms is the maximum of the width and height of the image. The desired area ranges (i.e. Ri) are set to (0, 80²), (32², 150²), and (120², inf) for each of the scales respectively. Training is performed for a total of 6 epochs with step-down at the end of epoch 5. Image flipping is used as a data-augmentation technique. Every epoch requires 11,000 iterations. For training RPN without negatives, each epoch requires 7,000 iterations. We use RPN for generating negative chips and train it for 2 epochs with a fixed learning rate of 0.015 without any step-down. |
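
As a concrete illustration of the training scales and desired area ranges quoted in the Experiment Setup row, the sketch below shows one way the per-scale valid ranges could be applied to ground-truth boxes. This is a minimal sketch, not the authors' released implementation: the helper names (`training_scales`, `valid_boxes_for_scale`) and the exact area-based membership test are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the authors' code) of SNIPER's per-scale
# valid ranges on COCO: chips are 512x512, three training scales, and one
# desired area range R_i per scale (applied in resized-image pixels).
import math

CHIP_SIZE = 512                                   # chips are 512x512 pixels
AREA_RANGES = [(0, 80 ** 2),                      # paired with scale 512/ms
               (32 ** 2, 150 ** 2),               # paired with scale 1.667
               (120 ** 2, math.inf)]              # paired with scale 3


def training_scales(image_w, image_h):
    """COCO scales from the paper: (512/ms, 1.667, 3), ms = max(width, height)."""
    ms = max(image_w, image_h)
    return (512.0 / ms, 1.667, 3.0)


def valid_boxes_for_scale(boxes, scale, area_range):
    """Keep boxes whose area, after resizing the image by `scale`, lies in `area_range`.

    `boxes` are (x1, y1, x2, y2) in original-image pixels. The membership test
    (resized box area against the range) is an assumption for illustration.
    """
    lo, hi = area_range
    kept = []
    for (x1, y1, x2, y2) in boxes:
        resized_area = (x2 - x1) * (y2 - y1) * scale * scale
        if lo <= resized_area < hi:
            kept.append((x1, y1, x2, y2))
    return kept


# Example: a 640x480 image with one small and one large ground-truth box.
boxes = [(10, 10, 50, 50), (100, 100, 400, 400)]
for scale, rng in zip(training_scales(640, 480), AREA_RANGES):
    kept = valid_boxes_for_scale(boxes, scale, rng)
    print(f"scale={scale:.3f}, range={rng}: {len(kept)} valid box(es)")
```

Under these assumptions, the small box is valid at all three scales while the large box only becomes valid at the largest scale, which matches the intent of restricting each scale to a band of object sizes.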