InstaNAS: Instance-Aware Neural Architecture Search
Authors: An-Chieh Cheng, Chieh Hubert Lin, Da-Cheng Juan, Wei Wei, Min Sun
AAAI 2020, pp. 3577-3584
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments within a search space inspired by MobileNetV2 show InstaNAS can achieve up to 48.8% latency reduction without compromising accuracy on a series of datasets against MobileNetV2. |
| Researcher Affiliation | Collaboration | National Tsing-Hua University, Hsinchu, Taiwan; Google Research, Mountain View, USA; Appier Inc., Taiwan; MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code or explicitly state that the code is open-sourced. |
| Open Datasets | Yes | We validate InstaNAS on CIFAR-10/100 with the search space described in the previous section. ... Experiments on Tiny ImageNet and ImageNet. |
| Dataset Splits | No | The paper mentions using CIFAR-10/100, Tiny ImageNet, and ImageNet, and discusses training stages, but does not provide specific percentages or counts for validation splits, nor reference predefined splits. |
| Hardware Specification | No | For a fair comparison, all CPU latencies are measured in the same workstation and the same framework (PyTorch v1.0.0). No specific details about the CPU model, GPU, or other hardware specifications are provided. |
| Software Dependencies | Yes | For a fair comparison, all CPU latencies are measured in the same workstation and the same framework (PyTorch v1.0.0). A hedged sketch of such a CPU-latency measurement follows the table. |
| Experiment Setup | Yes | For pre-training the meta-graph, we use a Stochastic Gradient Descent optimizer with initial learning rate 0.1. After the joint training ends, some controllers are picked by human preference by considering the accuracy and latency trade-off. At this point, the accuracy measured in the joint training stage can only be considered a reference value; the meta-graph needs to be re-trained from scratch with respect to the picked policy. We use an Adam optimizer with learning rate 0.01 that decays with cosine annealing. ... we apply random cropping, random horizontal flipping, and Cutout (DeVries and Taylor 2017) as data augmentation methods. A hedged sketch of this training configuration follows the table. |
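
The latency-measurement protocol is only described at a high level (CPU latency on a single workstation under PyTorch v1.0.0). Below is a minimal sketch of how such a per-instance CPU latency measurement is typically taken; the MobileNetV2 stand-in model, the CIFAR-sized input, and the warm-up/repetition counts are illustrative assumptions, not details from the paper.

```python
import time

import torch
import torchvision.models as models

# Stand-in network and input shape (assumptions): the paper's search space is
# MobileNetV2-inspired, so a torchvision MobileNetV2 is used here purely for
# illustration, with a CIFAR-sized 32x32 input.
model = models.mobilenet_v2(num_classes=10).eval()
dummy_input = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    for _ in range(10):          # warm-up iterations before timing
        model(dummy_input)
    runs = 100                   # number of timed forward passes (assumption)
    start = time.perf_counter()
    for _ in range(runs):
        model(dummy_input)
    latency_ms = (time.perf_counter() - start) / runs * 1000

print(f"Average CPU latency: {latency_ms:.2f} ms")
```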
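
The hyperparameters quoted in the Experiment Setup row translate into a short PyTorch configuration. This is a minimal sketch assuming CIFAR-scale inputs; the placeholder network, epoch count, crop padding, and the omitted Cutout transform are assumptions not given in the excerpt.

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms

# Placeholder network standing in for the meta-graph / picked child
# architecture (assumption; the real model comes from the searched space).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Stage 1: pre-training the meta-graph with SGD, initial learning rate 0.1.
pretrain_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Stage 2: re-training the picked architecture from scratch with Adam,
# learning rate 0.01, decayed by cosine annealing.
epochs = 200  # assumption; the excerpt does not state the epoch count
retrain_optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(retrain_optimizer, T_max=epochs)

# Data augmentation: random cropping, random horizontal flipping, and Cutout
# (DeVries and Taylor 2017). Cutout is usually a custom transform that masks
# a random square patch of each image; it is omitted here for brevity.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # padding of 4 is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```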