ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Authors: Han Cai, Ligeng Zhu, Song Han

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization.
Researcher Affiliation | Academia | Han Cai, Ligeng Zhu, Song Han; Massachusetts Institute of Technology; {hancai, ligeng, songhan}@mit.edu
Pseudocode | No | The paper includes diagrams (e.g., Figure 2) and describes its algorithms in text, but it does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Pretrained models and evaluation code are released at https://github.com/MIT-HAN-LAB/ProxylessNAS.
Open Datasets | Yes | We demonstrate the effectiveness of our proposed method on two benchmark datasets (CIFAR-10 and ImageNet) for the image classification task.
Dataset Splits | Yes | We randomly sample 5,000 images from the training set as a validation set for learning architecture parameters which are updated using the Adam optimizer with an initial learning rate of 0.006 for the gradient-based algorithm (Section 3.2.1) and 0.01 for the REINFORCE-based algorithm (Section 3.3.2). (A hedged sketch of this split follows the table.)
Hardware Specification | Yes | The GPU latency is measured on V100 GPU with a batch size of 8... The CPU latency is measured under batch size 1 on a server with two 2.40GHz Intel(R) Xeon(R) CPU E5-2640 v4. The mobile latency is measured on Google Pixel 1 phone with a batch size of 1.
Software Dependencies | No | The paper mentions "TensorFlow-Lite" in Appendix B but does not provide specific version numbers for it or any other software dependencies; therefore, a reproducible description is not provided.
Experiment Setup | Yes | We randomly sample 5,000 images from the training set as a validation set for learning architecture parameters which are updated using the Adam optimizer with an initial learning rate of 0.006 for the gradient-based algorithm (Section 3.2.1) and 0.01 for the REINFORCE-based algorithm (Section 3.3.2). ... After the training process of the over-parameterized network completes, a compact network is derived... Next, we train the compact network using the same training settings except that the number of training epochs increases from 200 to 300. (Both steps are sketched after the table.)
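The Dataset Splits and Experiment Setup rows quote a concrete configuration: 5,000 images are randomly held out of the CIFAR-10 training set as a validation set for learning architecture parameters, which are updated by Adam with an initial learning rate of 0.006 (gradient-based) or 0.01 (REINFORCE-based). The following is a minimal sketch of that split and optimizer setup, assuming PyTorch and torchvision; names such as arch_params are hypothetical placeholders, and this is not the authors' released code.

```python
# Sketch only: mirrors the quoted split/optimizer settings, not the authors'
# implementation. Assumes PyTorch + torchvision.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# CIFAR-10 ships with 50,000 training images; 5,000 are randomly held out as
# the validation set used to learn architecture parameters.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, arch_val_set = random_split(
    full_train, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0),  # seed choice is an assumption
)

# Architecture parameters (placeholder tensor; the real shape depends on the
# search space) updated with Adam: lr 0.006 for the gradient-based variant,
# 0.01 for the REINFORCE-based variant.
arch_params = torch.zeros(21, 7, requires_grad=True)  # hypothetical shape
arch_optimizer = torch.optim.Adam([arch_params], lr=0.006)  # or lr=0.01
```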
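The Experiment Setup row also mentions deriving a compact network from the trained over-parameterized network and retraining it with the same settings for 300 rather than 200 epochs. The quoted text only says "a compact network is derived"; the argmax rule below (keep the candidate op with the largest architecture weight in each searchable block) is the usual derivation for ProxylessNAS-style gradient NAS and should be read as an assumption, with illustrative sizes.

```python
# Hedged sketch of the compact-network derivation step; the argmax rule and
# the sizes below are assumptions, not the authors' exact code.
import torch

SEARCH_EPOCHS = 200   # over-parameterized network (quoted setting)
RETRAIN_EPOCHS = 300  # derived compact network (quoted setting)

def derive_compact_architecture(arch_params: torch.Tensor) -> torch.Tensor:
    """Keep, for each searchable block (row), the candidate operation
    (column) with the largest architecture weight."""
    return arch_params.argmax(dim=-1)

# Illustrative sizes: e.g. 21 searchable blocks, 7 candidate ops per block.
arch_params = torch.randn(21, 7)
chosen_ops = derive_compact_architecture(arch_params)
print(chosen_ops.tolist())  # one selected op index per block
```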