NASPY: Automated Extraction of Automated Machine Learning Models
Authors: Xiaoxuan Lou, Shangwei Guo, Jiwei Li, Yaoxin Wu, Tianwei Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to demonstrate the effectiveness of NASPY. Our identification model can predict the operation sequences of different NAS methods (DARTS (Liu et al., 2018), GDAS (Dong & Yang, 2019) and TE-NAS (Chen et al., 2021)) with an error rate of 3.2%. Our hyper-parameter prediction can achieve more than 98% accuracy. The framework also demonstrates high robustness against random noise introduced by the complex and dynamic hardware systems. |
| Researcher Affiliation | Collaboration | 1Nanyang Technological University, 2Chongqing University, 3Zhejiang University, 4Shannon.AI |
| Pseudocode | Yes | Algorithm 1: GEMM in OpenBLAS |
| Open Source Code | Yes | The source code of NASPY is available at https://github.com/LouXiaoxuan/NASPY. |
| Open Datasets | Yes | Dataset construction. We search model architectures with CIFAR10, and train model parameters over CIFAR10 and CIFAR100. |
| Dataset Splits | Yes | We randomly select 80% of the sequences as the training set, and the rest as the validation set. |
| Hardware Specification | Yes | The model is trained for 100 epochs, which takes 6.25 hours on one V100 GPU. |
| Software Dependencies | Yes | Without loss of generality, we adopt PyTorch (1.8.0) and OpenBLAS (0.3.13). |
| Experiment Setup | Yes | CRNN+CTC model. This model consists sequentially of one convolution layer l_C, one bidirectional GRU layer l_R, and one classifier F with two FC layers. To evaluate the feature-learning capability of l_C, both 1D and 2D convolutions are adopted in the experiments for comparison. Besides, to evaluate the performance of identifiers with different model sizes, three candidate hidden dimensions of l_R (i.e., 128, 256, 512) are considered. To train the model, we use CTC loss as the criterion to bypass sequence alignment, and we use the Adam optimizer. The learning rate starts from 5e-4 and is scheduled following the One Cycle LR policy (Smith & Topin, 2019). The model is trained for 100 epochs, which takes 6.25 hours on one V100 GPU. (A hedged code sketch of this setup follows the table.) |
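
To make the Experiment Setup row concrete, here is a minimal, hedged PyTorch sketch of the CRNN+CTC identifier as described in the excerpt: one 1D convolution (l_C), one bidirectional GRU (l_R, with hidden size 256 as one of the three candidates), a two-FC-layer classifier (F), trained with CTC loss, Adam, and a One Cycle LR schedule starting from 5e-4 for 100 epochs. The input feature dimension, number of operation classes, kernel size, sequence lengths, steps per epoch, and the exact OneCycleLR configuration are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of the CRNN+CTC operation-sequence identifier described above.
# Assumptions (not in the excerpt): in_feats, num_ops, kernel size, dummy batch
# shapes, and steps_per_epoch are placeholders for illustration only.
import torch
import torch.nn as nn


class CRNNIdentifier(nn.Module):
    def __init__(self, in_feats=64, hidden=256, num_ops=10):
        super().__init__()
        # l_C: a single 1D convolution over the trace features
        self.conv = nn.Sequential(
            nn.Conv1d(in_feats, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # l_R: one bidirectional GRU layer (candidate hidden sizes: 128/256/512)
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # F: classifier with two FC layers; +1 output class for the CTC blank
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_ops + 1),
        )

    def forward(self, x):                 # x: (batch, in_feats, time)
        h = self.conv(x)                  # (batch, hidden, time)
        h = h.transpose(1, 2)             # (batch, time, hidden) for the GRU
        h, _ = self.gru(h)                # (batch, time, 2*hidden)
        logits = self.classifier(h)       # (batch, time, num_ops + 1)
        return logits.log_softmax(-1)     # log-probs expected by CTCLoss


model = CRNNIdentifier()
criterion = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# One Cycle LR policy over 100 epochs; steps_per_epoch is an assumed placeholder
steps_per_epoch = 100
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, epochs=100, steps_per_epoch=steps_per_epoch)

# One illustrative training step with dummy shapes
x = torch.randn(8, 64, 200)                        # 8 traces, 64 features, 200 time steps
targets = torch.randint(1, 11, (8, 20))            # operation labels; index 0 is the CTC blank
input_lengths = torch.full((8,), 200, dtype=torch.long)
target_lengths = torch.full((8,), 20, dtype=torch.long)

log_probs = model(x).transpose(0, 1)               # CTCLoss expects (time, batch, classes)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()
optimizer.step()
scheduler.step()
```

The CTC criterion lets the identifier emit one prediction per time step of the side-channel trace without requiring per-step operation labels, which matches the paper's stated reason for using CTC loss (bypassing sequence alignment).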