ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Authors: Han Cai, Ligeng Zhu, Song Han
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. |
| Researcher Affiliation | Academia | Han Cai, Ligeng Zhu, Song Han Massachusetts Institute of Technology {hancai, ligeng, songhan}@mit.edu |
| Pseudocode | No | The paper includes diagrams (e.g., Figure 2) and describes algorithms in text, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Pretrained models and evaluation code are released at https://github.com/MIT-HAN-LAB/ProxylessNAS. |
| Open Datasets | Yes | We demonstrate the effectiveness of our proposed method on two benchmark datasets (CIFAR-10 and ImageNet) for the image classification task. |
| Dataset Splits | Yes | We randomly sample 5,000 images from the training set as a validation set for learning architecture parameters which are updated using the Adam optimizer with an initial learning rate of 0.006 for the gradient-based algorithm (Section 3.2.1) and 0.01 for the REINFORCE-based algorithm (Section 3.3.2). |
| Hardware Specification | Yes | The GPU latency is measured on V100 GPU with a batch size of 8... The CPU latency is measured under batch size 1 on a server with two 2.40GHz Intel(R) Xeon(R) CPU E5-2640 v4. The mobile latency is measured on Google Pixel 1 phone with a batch size of 1. (A hedged timing sketch follows the table.) |
| Software Dependencies | No | The paper mentions "TensorFlow Lite" in Appendix B but does not provide specific version numbers for it or any other software dependencies. Therefore, a reproducible description is not provided. |
| Experiment Setup | Yes | We randomly sample 5,000 images from the training set as a validation set for learning architecture parameters which are updated using the Adam optimizer with an initial learning rate of 0.006 for the gradient-based algorithm (Section 3.2.1) and 0.01 for the REINFORCE-based algorithm (Section 3.3.2). ... After the training process of the over-parameterized network completes, a compact network is derived... Next, we train the compact network using the same training settings except that the number of training epochs increases from 200 to 300. (A hedged setup sketch follows the table.) |
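
The split and optimizer settings quoted in the "Dataset Splits" and "Experiment Setup" rows can be summarized in a short sketch. This is a minimal illustration assuming a PyTorch-style workflow with CIFAR-10 from torchvision; names such as `arch_params` and the `SubsetRandomSampler`-based split are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the reported search setup (assumptions: PyTorch-style API,
# CIFAR-10 via torchvision; `arch_params` is an illustrative placeholder).
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

# Randomly hold out 5,000 training images as the validation set used to
# update the architecture parameters (as reported in the paper).
perm = torch.randperm(len(train_set))
val_idx, train_idx = perm[:5000], perm[5000:]
train_loader = DataLoader(train_set, batch_size=256,
                          sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(train_set, batch_size=256,
                        sampler=SubsetRandomSampler(val_idx))

# Architecture parameters are updated with Adam; the reported initial learning
# rate is 0.006 for the gradient-based variant and 0.01 for the REINFORCE-based one.
arch_params = [torch.zeros(6, requires_grad=True)]  # placeholder architecture logits
arch_optimizer = torch.optim.Adam(arch_params, lr=0.006)
```

After the search, the paper states that the derived compact network is retrained with the same settings except for 300 rather than 200 epochs; that retraining loop is omitted here.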
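
The "Hardware Specification" row reports latencies at fixed batch sizes (V100 GPU with batch 8, Xeon CPU and Pixel 1 phone with batch 1). Below is a hedged sketch of how such a GPU latency number might be timed; the warm-up and repeat counts and the `model` placeholder are assumptions, not the paper's measurement harness.

```python
# Illustrative GPU latency timing at batch size 8 (assumption: a PyTorch model
# on a CUDA device; warm-up/repeat counts are arbitrary choices, not the paper's).
import torch

def measure_gpu_latency_ms(model, input_shape=(8, 3, 224, 224),
                           warmup=20, repeats=100):
    model = model.cuda().eval()
    x = torch.randn(*input_shape, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):          # warm up kernels / cuDNN autotuning
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(repeats):
            model(x)
        end.record()
        torch.cuda.synchronize()         # wait for all queued kernels to finish
    return start.elapsed_time(end) / repeats  # average milliseconds per batch
```

CPU and mobile latencies would be measured analogously at batch size 1 on the respective devices; the paper notes TensorFlow Lite for the mobile measurement but gives no version.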