Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

Authors: Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, NATS for detection could improve the AP of Faster-RCNN based on ResNet-50 and ResNet-101 by 2.0% and 1.8% without any extra parameters or FLOPs, and keep the inference times almost the same. ... In experiments, we demonstrate the effectiveness of NATS on networks like ResNet and ResNeXt. Our transformed networks, combined with various detection frameworks, achieve significant improvements on the COCO dataset while keeping fast.
Researcher Affiliation | Collaboration | Junran Peng (1,2,3), Ming Sun (2), Zhaoxiang Zhang (1,3), Tieniu Tan (1,3), Junjie Yan (2); 1: University of Chinese Academy of Sciences, 2: SenseTime Group Limited, 3: Center for Research on Intelligent Perception and Computing, CASIA
Pseudocode | No | The paper does not contain any explicit 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the method, nor does it include a link to a code repository.
Open Datasets | Yes | We use the MS-COCO [18] for experiment in this paper.
Dataset Splits | Yes | The training set is divided into two splits, and the optimization alternates between updating network parameters on the first split and updating architecture parameters α^g_i on the other split. During the searching stage, we use train2014 for training model parameters and use 35K images from val2014 that are not in minival for calibrating architecture parameters. During the retraining stage, our searched model is trained with train2017 and evaluated with minival, as is convention. (A minimal sketch of this alternating weight/architecture update is given in the first code block after the table.)
Hardware Specification | Yes | Our efficient search takes only 20 1080TI GPU days on object detection ... the searching stage of NATS takes only 2.5 days on 8 1080TI GPUs.
Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and training mechanisms (cosine annealing learning rate, synchronized batch norm) but does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | Searching details: We conduct architecture transformation search for 25 epochs in total. To make the super-network converge better, the architecture parameters are designed not to be updated in the first 10 epochs. The batch size is 1 image per GPU due to GPU memory constraints. We use an SGD optimizer with momentum 0.9 and weight decay 0.0001 for training model weights. A cosine annealing learning rate that decays from 0.00125 to 0.00005 is applied as the lr scheduler. When training the architecture parameters α, we use an Adam optimizer with learning rate 0.01 and weight decay 0.00001. Training details: After the architecture search is finished, we decode the discrete architecture as mentioned in Section 3.3. We use an SGD optimizer with 0.9 momentum and 0.0001 weight decay. For fair comparison, all our models are trained for 13 epochs, known as the 1x schedule. The initial learning rate is set to 0.00125 per image and is divided by 10 at epochs 8 and 11. Warm-up and synchronized batch norm are applied in both the baselines and our searched models for multi-GPU training. (These optimizer and schedule settings are sketched in the second code block after the table.)
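The alternating two-split optimization quoted under Dataset Splits can be made concrete with a short sketch. The code below is a hypothetical PyTorch illustration, not the authors' implementation: TinySuperNet, its two-branch mixing, and the data loaders are stand-ins, while the optimizer choices (SGD with momentum 0.9 and weight decay 0.0001 for the weights, Adam with learning rate 0.01 and weight decay 0.00001 for α) and the 10-epoch freeze of α follow the quoted setup; the cosine learning-rate schedule is omitted here for brevity.

```python
# Hypothetical sketch of NATS-style alternating updates on two data splits.
# TinySuperNet and the loaders are illustrative stand-ins, not the paper's model.
import torch
import torch.nn as nn

class TinySuperNet(nn.Module):
    """Toy super-network: two candidate branches mixed by softmax-ed
    architecture parameters alpha (a stand-in for channel-level choices)."""
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Conv2d(3, 16, 3, padding=1)
        self.branch_b = nn.Conv2d(3, 16, 3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(2))   # architecture parameters
        self.head = nn.Linear(16, 10)

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        feat = w[0] * self.branch_a(x) + w[1] * self.branch_b(x)
        return self.head(feat.mean(dim=(2, 3)))     # global average pool + classifier

def alternating_search(model, weight_loader, arch_loader, epochs=25, warmup_epochs=10):
    """Alternate between weight updates on split 1 and alpha updates on split 2."""
    criterion = nn.CrossEntropyLoss()
    arch_params = [model.alpha]
    weight_params = [p for n, p in model.named_parameters() if n != "alpha"]
    # Optimizer settings follow the quoted search configuration.
    w_opt = torch.optim.SGD(weight_params, lr=0.00125, momentum=0.9, weight_decay=1e-4)
    a_opt = torch.optim.Adam(arch_params, lr=0.01, weight_decay=1e-5)

    for epoch in range(epochs):
        for (xw, yw), (xa, ya) in zip(weight_loader, arch_loader):
            # Step 1: update network weights on the first split.
            w_opt.zero_grad()
            criterion(model(xw), yw).backward()
            w_opt.step()
            # Step 2: update architecture parameters on the held-out split,
            # but only after the warm-up epochs (alpha stays frozen before that).
            if epoch >= warmup_epochs:
                a_opt.zero_grad()
                criterion(model(xa), ya).backward()
                a_opt.step()
```

In this reading, weight_loader would iterate over train2014 and arch_loader over the 35K val2014 images that are not in minival.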
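The hyper-parameters quoted under Experiment Setup map onto standard PyTorch optimizers and schedulers. The sketch below is an assumption-laden illustration rather than the authors' code: the Conv2d stand-in model and the assumed total retraining batch size of 16 images are hypothetical, while the learning rates, milestones, momentum, and weight decay values are taken from the quoted text.

```python
# Illustrative optimizer/scheduler configuration for the two stages.
# The stand-in model and the assumed batch size are placeholders.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR

model = torch.nn.Conv2d(3, 64, 3)      # stand-in for the detector backbone

# --- Searching stage (25 epochs): SGD for weights, cosine-annealed lr ---
search_epochs = 25
search_opt = torch.optim.SGD(model.parameters(), lr=0.00125,
                             momentum=0.9, weight_decay=1e-4)
search_sched = CosineAnnealingLR(search_opt, T_max=search_epochs, eta_min=0.00005)
# Architecture parameters use Adam(lr=0.01, weight_decay=1e-5) and stay frozen
# for the first 10 of the 25 epochs (see the previous sketch).

# --- Retraining stage (13 epochs, the 1x schedule): step decay at epochs 8 and 11 ---
retrain_epochs = 13
images_per_batch = 16                  # assumed total batch size (not given in the quote)
base_lr = 0.00125 * images_per_batch   # "0.00125 per image" linear scaling
retrain_opt = torch.optim.SGD(model.parameters(), lr=base_lr,
                              momentum=0.9, weight_decay=1e-4)
retrain_sched = MultiStepLR(retrain_opt, milestones=[8, 11], gamma=0.1)

for epoch in range(retrain_epochs):
    # ... one epoch of detector training with retrain_opt goes here ...
    retrain_sched.step()               # schedulers are stepped once per epoch
# Linear warm-up at the start of training and synchronized batch norm for
# multi-GPU runs are applied on top of this schedule, as stated in the paper.
```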