NASPY: Automated Extraction of Automated Machine Learning Models

Authors: Xiaoxuan Lou, Shangwei Guo, Jiwei Li, Yaoxin Wu, Tianwei Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to demonstrate the effectiveness of NASPY. Our identification model can predict the operation sequences of different NAS methods (DARTS (Liu et al., 2018), GDAS (Dong & Yang, 2019) and TE-NAS (Chen et al., 2021)) with an error rate of 3.2%. Our hyper-parameter prediction can achieve more than 98% accuracy. The framework also demonstrates high robustness against random noise introduced by complex and dynamic hardware systems.
Researcher Affiliation | Collaboration | 1 Nanyang Technological University, 2 Chongqing University, 3 Zhejiang University, 4 Shannon.AI
Pseudocode | Yes | Algorithm 1: GEMM in OpenBLAS
Open Source Code | Yes | The source code of NASPY is available at https://github.com/LouXiaoxuan/NASPY.
Open Datasets | Yes | Dataset construction. We search model architectures with CIFAR10, and train model parameters over CIFAR10 and CIFAR100.
Dataset Splits | Yes | We randomly select 80% of the sequences as the training set, and the rest as the validation set.
Hardware Specification | Yes | The model is trained for 100 epochs, which takes 6.25 hours on one V100 GPU.
Software Dependencies | Yes | Without loss of generality, we adopt PyTorch (1.8.0) and OpenBLAS (0.3.13).
Experiment Setup | Yes | CRNN+CTC model. This model consists, in sequence, of one convolution layer l_C, one bidirectional GRU layer l_R, and one classifier F with two FC layers. To evaluate the capability of l_C for feature learning, both 1D and 2D convolutions are adopted in the experiments for comparison. Besides, to evaluate the performance of identifiers with different model sizes, three candidate dimensions for l_R (i.e., 128, 256, 512) are considered. To train the model, we use the CTC loss as the criterion to bypass sequence alignment, and we use the Adam optimizer. The learning rate starts from 5e-4 and is scheduled following the One Cycle LR policy (Smith & Topin, 2019). The model is trained for 100 epochs, which takes 6.25 hours on one V100 GPU.
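The Pseudocode row above quotes Algorithm 1, GEMM in OpenBLAS. As background on what such an algorithm looks like, here is a minimal Python sketch of the loop blocking that BLAS-style GEMM kernels use; the block size and row-major list layout are illustrative assumptions, not OpenBLAS's actual kernel.

```python
# Illustrative cache-blocked GEMM (C = A @ B) over row-major nested lists.
# This is a simplified sketch of BLAS-style loop tiling, not OpenBLAS itself.

def gemm_blocked(A, B, block=2):
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile over rows of A / C
        for j0 in range(0, m, block):      # tile over columns of B / C
            for p0 in range(0, k, block):  # tile over the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C
```

Tiling keeps a small block of A, B, and C hot in cache while the innermost loops run, which is the property that makes GEMM's memory-access pattern recognizable in side-channel traces.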
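The Dataset Splits row describes a random 80%/20% train/validation split of the sequences. A small sketch of such a split is below; the function name and fixed seed are assumptions for illustration, not the paper's code.

```python
# Hypothetical sketch of the 80/20 random split described in the paper.
import random

def split_sequences(sequences, train_frac=0.8, seed=0):
    """Shuffle indices and split sequences into train/validation sets."""
    idx = list(range(len(sequences)))
    random.Random(seed).shuffle(idx)  # fixed seed only for reproducibility here
    cut = int(len(idx) * train_frac)
    train = [sequences[i] for i in idx[:cut]]
    val = [sequences[i] for i in idx[cut:]]
    return train, val
```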