AtomNAS: Fine-Grained End-to-End Neural Architecture Search
Authors: Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiment, our method achieves 75.9% top-1 accuracy on the ImageNet dataset at around 360M FLOPs, which is 0.9% higher than the state-of-the-art model (Stamoulis et al., 2019a). We first describe the implementation details in Section 4.1 and then compare AtomNAS with previous state-of-the-art methods under various FLOPs constraints in Section 4.2. In Section 4.3, we provide more detailed analysis of AtomNAS. Finally, in Section 4.4, we demonstrate the transferability of AtomNAS networks by evaluating them on detection and instance segmentation tasks. |
| Researcher Affiliation | Collaboration | Jieru Mei¹, Yingwei Li¹, Xiaochen Lian², Xiaojie Jin², Linjie Yang², Alan Yuille¹ & Jianchao Yang² — ¹Johns Hopkins University, ²ByteDance AI Lab |
| Pseudocode | Yes | Algorithm 1: Dynamic network shrinkage |
| Open Source Code | Yes | We open our entire codebase at: https://github.com/meijieru/AtomNAS. |
| Open Datasets | Yes | We apply AtomNAS to search high-performance light-weight models on the ImageNet 2012 classification task (Deng et al., 2009) ... on the COCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | All the models are trained on COCO train2017 with batch size 16 and evaluated on COCO val2017. |
| Hardware Specification | Yes | When training the supernet, we use a total batch size of 2048 on 32 Tesla V100 GPUs and train for 350 epochs. |
| Software Dependencies | No | The paper mentions using "RMSProp optimizer" and "MMDetection" but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the same training configuration (e.g., RMSProp optimizer, EMA on weights and exponential learning rate decay) as Tan et al. (2019); Stamoulis et al. (2019a)... When training the supernet, we use a total batch size of 2048 on 32 Tesla V100 GPUs and train for 350 epochs. For our dynamic network shrinkage algorithm, we set the momentum factor β in Eq. (7) to 0.9999... By setting the weight of the L1 penalty term λ to be 1.8×10⁻⁴, 1.2×10⁻⁴ and 1.0×10⁻⁴ respectively |
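
The dynamic network shrinkage referenced in the Pseudocode and Experiment Setup rows can be pictured as an EMA-thresholding loop over per-block importance factors. The sketch below is illustrative only: it assumes each atomic block's importance is read off its BatchNorm scale |γ|, and the `ShrinkageTracker` class, its method names, and the `threshold` value are hypothetical rather than taken from the paper or the released codebase; only the momentum factor 0.9999 comes from the quoted setup.

```python
import torch

# Illustrative sketch of the dynamic-network-shrinkage idea (Algorithm 1):
# track an exponential moving average (EMA) of each atomic block's importance
# during supernet training, and flag blocks whose smoothed importance stays
# below a threshold. Class/method names and the threshold are hypothetical.

class ShrinkageTracker:
    def __init__(self, num_blocks, momentum=0.9999, threshold=1e-3):
        self.momentum = momentum      # beta in Eq. (7), quoted as 0.9999
        self.threshold = threshold    # illustrative cut-off, not from the paper
        self.ema = None               # EMA of per-block importance, set on first update
        self.alive = torch.ones(num_blocks, dtype=torch.bool)

    def update(self, importance):
        # importance: 1-D tensor of |gamma| values, one entry per atomic block
        imp = importance.detach()
        if self.ema is None:
            self.ema = imp.clone()
        else:
            self.ema = self.momentum * self.ema + (1.0 - self.momentum) * imp

    def blocks_to_remove(self):
        # indices of still-alive blocks whose EMA importance fell below the threshold
        dead = (self.ema < self.threshold) & self.alive
        self.alive &= ~dead
        return dead.nonzero(as_tuple=True)[0]
```

In the paper, actually removing the low-importance atomic blocks during training is what reduces the search cost; the sketch above only records which blocks would be pruned.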
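
Similarly, the quoted training configuration (RMSProp, exponential learning-rate decay, EMA on weights, and an L1 penalty on the scaling factors weighted by λ) could be wired up roughly as follows. This is a minimal PyTorch sketch under assumptions: the learning rate, RMSProp momentum/eps, and decay rate are placeholders, and the helper names (`build_training_objects`, `l1_bn_penalty`, `ema_update`) are invented for illustration; only β = 0.9999 and the λ values come from the quoted setup.

```python
import copy
import torch
import torch.nn as nn

def build_training_objects(model, lr=0.064, decay=0.97, l1_lambda=1.2e-4):
    # RMSProp optimizer and exponential learning-rate decay, as quoted above;
    # the specific lr/decay values here are placeholders, not from the paper.
    optimizer = torch.optim.RMSprop(model.parameters(), lr=lr, momentum=0.9, eps=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=decay)
    ema_model = copy.deepcopy(model)  # separate copy holding the EMA of the weights
    return optimizer, scheduler, ema_model

def l1_bn_penalty(model, l1_lambda=1.2e-4):
    # L1 regularization on BatchNorm scale parameters (the importance factors),
    # weighted by lambda (quoted values: 1.8e-4, 1.2e-4, 1.0e-4).
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return l1_lambda * penalty

def ema_update(ema_model, model, beta=0.9999):
    # EMA on weights with the momentum factor quoted as 0.9999
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1.0 - beta)
```

In a training loop, the L1 term would be added to the task loss before `backward()`, and `ema_update` called after each `optimizer.step()`, so that the EMA copy of the weights is what gets evaluated.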