AtomNAS: Fine-Grained End-to-End Neural Architecture Search

Authors: Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiment, our method achieves 75.9% top-1 accuracy on the ImageNet dataset at around 360M FLOPs, which is 0.9% higher than the state-of-the-art model (Stamoulis et al., 2019a). We first describe the implementation details in Section 4.1 and then compare AtomNAS with previous state-of-the-art methods under various FLOPs constraints in Section 4.2. In Section 4.3, we provide a more detailed analysis of AtomNAS. Finally, in Section 4.4, we demonstrate the transferability of AtomNAS networks by evaluating them on detection and instance segmentation tasks.
Researcher Affiliation | Collaboration | Jieru Mei¹, Yingwei Li¹, Xiaochen Lian², Xiaojie Jin², Linjie Yang², Alan Yuille¹ & Jianchao Yang²; ¹Johns Hopkins University, ²ByteDance AI Lab
Pseudocode | Yes | Algorithm 1: Dynamic network shrinkage
Open Source Code | Yes | We open our entire codebase at: https://github.com/meijieru/AtomNAS.
Open Datasets | Yes | We apply AtomNAS to search high-performance lightweight models on the ImageNet 2012 classification task (Deng et al., 2009) and on the COCO dataset (Lin et al., 2014).
Dataset Splits | Yes | All the models are trained on COCO train2017 with batch size 16 and evaluated on COCO val2017.
Hardware Specification | Yes | When training the supernet, we use a total batch size of 2048 on 32 Tesla V100 GPUs and train for 350 epochs.
Software Dependencies | No | The paper mentions using the "RMSProp optimizer" and "MMDetection" but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We use the same training configuration (e.g., RMSProp optimizer, EMA on weights and exponential learning rate decay) as Tan et al. (2019); Stamoulis et al. (2019a)... When training the supernet, we use a total batch size of 2048 on 32 Tesla V100 GPUs and train for 350 epochs. For our dynamic network shrinkage algorithm, we set the momentum factor β in Eq. (7) to 0.9999... By setting the weight of the L1 penalty term λ to be 1.8 × 10⁻⁴, 1.2 × 10⁻⁴ and 1.0 × 10⁻⁴ respectively
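
The Experiment Setup row quotes the key training ingredients: an RMSProp optimizer, an EMA on the weights with momentum β = 0.9999, and an L1 penalty of weight λ on the scale factors used by the dynamic network shrinkage algorithm. Below is a minimal PyTorch-style sketch of how these pieces could fit together; the one-layer AtomSupernet, the learning rate, the pruning threshold, and the function names are illustrative assumptions, not taken from the paper or its released code.

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class AtomSupernet(nn.Module):
    """Toy stand-in for the search space: the output channels of one convolution
    play the role of atomic blocks, each gated by its BatchNorm scale factor."""

    def __init__(self, num_atoms=64, num_classes=1000):
        super().__init__()
        self.conv = nn.Conv2d(3, num_atoms, 3, padding=1)
        self.bn = nn.BatchNorm2d(num_atoms)        # bn.weight[i] is the gamma of atom i
        self.head = nn.Linear(num_atoms, num_classes)

    def forward(self, x):
        x = F.relu(self.bn(self.conv(x)))
        return self.head(x.mean(dim=(2, 3)))       # global average pooling


def l1_penalty(model, lam):
    """L1 regularization on all BN scale factors, weighted by lambda
    (1.8e-4, 1.2e-4 or 1.0e-4 in the quoted setup, depending on the FLOPs target)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))


def train_with_shrinkage(model, loader, lam=1.2e-4, beta=0.9999, threshold=1e-3):
    # Learning rate and other RMSProp hyper-parameters are placeholders, not from the paper.
    opt = torch.optim.RMSprop(model.parameters(), lr=0.05, momentum=0.9)
    ema_importance = None                                    # EMA of |gamma| per atom
    ema_weights = copy.deepcopy(model.state_dict())          # EMA of weights (for evaluation)

    for images, labels in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(images), labels) + l1_penalty(model, lam)
        loss.backward()
        opt.step()

        # Track atom importance with an exponential moving average (momentum beta = 0.9999).
        gammas = model.bn.weight.detach().abs()
        ema_importance = gammas.clone() if ema_importance is None else \
            beta * ema_importance + (1 - beta) * gammas

        # EMA of the model weights, as in the quoted training configuration.
        with torch.no_grad():
            for k, v in model.state_dict().items():
                ema_weights[k] = beta * ema_weights[k].float() + (1 - beta) * v.float()

        # Dynamic shrinkage: atoms whose EMA importance falls below the threshold are
        # considered dead; a full implementation would remove the corresponding channels
        # from the supernet here to save computation (omitted in this sketch).
        dead_atoms = (ema_importance < threshold).nonzero().flatten()

    return ema_weights, ema_importance, dead_atoms

In the paper, the atoms are channel slices of a MobileNetV2-style search space with mixed kernel sizes and shrinkage runs over the full 350-epoch supernet training; the sketch above compresses that to a single layer and a per-step check only to illustrate the bookkeeping around λ and β.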