Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
Authors: Junran Peng, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, Junjie Yan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, NATS for detection could improve the AP of Faster-RCNN based on ResNet-50 and ResNet-101 by 2.0% and 1.8% without any extra parameters or FLOPs, and keep the inference times almost the same. ... In experiments, we demonstrate the effectiveness of NATS on networks like ResNet and ResNeXt. Our transformed networks, combined with various detection frameworks, achieve significant improvements on the COCO dataset while keeping fast. |
| Researcher Affiliation | Collaboration | Junran Peng (1,2,3), Ming Sun (2), Zhaoxiang Zhang (1,3), Tieniu Tan (1,3), Junjie Yan (2); 1 University of Chinese Academy of Sciences, 2 SenseTime Group Limited, 3 Center for Research on Intelligent Perception and Computing, CASIA |
| Pseudocode | No | The paper does not contain any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for their method, nor does it include a link to a code repository. |
| Open Datasets | Yes | We use the MS-COCO [18] for experiment in this paper. |
| Dataset Splits | Yes | The training set is divided into two splits, and the optimization alternates between updating network parameters on the first split and updating architecture parameters α_i^g on the other split. During the searching stage, we use train2014 for training model parameters and use 35K images from val2014 that are not in minival for calibrating architecture parameters. During the retraining stage, our searched model is trained with train2017 and evaluated with minival as convention. (A minimal sketch of this alternating scheme follows the table.) |
| Hardware Specification | Yes | Our efficient search takes only 20 1080TI GPU days on object detection... the searching stage of NATS takes only 2.5 days on 8 1080TI GPUs |
| Software Dependencies | No | The paper mentions optimizers (SGD, Adam) and training mechanisms (Cosine annealing learning rate, Synchronized Batch Norm) but does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Searching details. We conduct architecture transformation search for 25 epochs in total. To make the super-network converge better, architecture parameters are designed not to be updated in the first 10 epochs. The batch size is 1 image per GPU due to the GPU memory constraint. We use the SGD optimizer with momentum 0.9 and weight decay 0.0001 for training model weights. A cosine annealing learning rate that decays from 0.00125 to 0.00005 is applied as the lr scheduler. When training architecture parameters α, we use the Adam optimizer with learning rate 0.01 and weight decay 0.00001. Training details. After the architecture search is finished, we decode the discrete architecture as mentioned in Section 3.3. We use the SGD optimizer with 0.9 momentum and 0.0001 weight decay. For fair comparison, all our models are trained for 13 epochs, known as the 1× schedule. The initial learning rate is set to 0.00125 per image and is divided by 10 at epochs 8 and 11. Warm-up and Synchronized Batch Norm mechanisms are applied in both baselines and our searched models for multi-GPU training. (Hedged sketches of the search and retraining configurations follow the table.) |
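
The alternating split/update scheme and the quoted search-stage hyper-parameters can be summarized in a short PyTorch-style sketch. This is only an illustration under stated assumptions, not the authors' implementation (no code was released): `SuperNet`, `weight_parameters`, `arch_parameters`, `detection_loss`, `weight_split`, and `arch_split` are hypothetical names; only the optimizer settings, epoch counts, and the roles of the two data splits come from the paper.

```python
# Minimal sketch of the alternating search procedure, assuming a PyTorch-style
# super-network. SuperNet, weight_parameters, arch_parameters, detection_loss,
# weight_split, and arch_split are hypothetical; hyper-parameters are from the table.
import torch

model = SuperNet()                                      # super-network with architecture params alpha_i^g
w_opt = torch.optim.SGD(model.weight_parameters(),      # model weights: SGD, momentum 0.9, wd 1e-4
                        lr=0.00125, momentum=0.9, weight_decay=1e-4)
a_opt = torch.optim.Adam(model.arch_parameters(),       # architecture params: Adam, lr 0.01, wd 1e-5
                         lr=0.01, weight_decay=1e-5)
# cosine annealing of the weight learning rate from 0.00125 down to 0.00005
w_sched = torch.optim.lr_scheduler.CosineAnnealingLR(w_opt, T_max=25, eta_min=0.00005)

for epoch in range(25):                                 # 25 search epochs in total
    for w_batch, a_batch in zip(weight_split, arch_split):
        # 1) update network weights on the first split (train2014)
        w_opt.zero_grad()
        detection_loss(model(w_batch)).backward()
        w_opt.step()
        # 2) update architecture parameters on the held-out 35K val2014 images,
        #    but only after the first 10 epochs, so the super-network converges first
        if epoch >= 10:
            a_opt.zero_grad()
            detection_loss(model(a_batch)).backward()
            a_opt.step()
    w_sched.step()
```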
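
The retraining recipe (1× schedule, step decay at epochs 8 and 11, warm-up, Synchronized Batch Norm) can likewise be expressed as a hedged configuration sketch. `searched_model`, the per-GPU batch size of 2 images, and the 500-step warm-up length are assumptions; the learning-rate rule, milestones, momentum, and weight decay are those quoted in the table.

```python
# Hedged sketch of the retraining configuration. searched_model, the per-GPU batch
# size, and the warm-up length are assumptions; the remaining values are from the paper.
import torch

imgs_per_gpu = 2                                       # assumption; not stated for retraining
base_lr = 0.00125 * imgs_per_gpu                       # "initial learning rate is set to 0.00125 per image"

# Synchronized Batch Norm for multi-GPU training, used for both baselines and searched models
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(searched_model)

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)
# 13 epochs in total (1x schedule); learning rate divided by 10 at epochs 8 and 11
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)

def warmup_factor(step, warmup_steps=500):
    """Linear warm-up multiplier for the first iterations (length is an assumption)."""
    return min(1.0, (step + 1) / warmup_steps)
```

In an actual training loop the warm-up factor would scale each parameter group's learning rate for the first iterations before the `MultiStepLR` schedule takes over, and the SyncBatchNorm conversion would be applied before wrapping the model for distributed training.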