NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Authors: Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods. From Section 4 (Experiments): In this section, we apply NAT on both hand-crafted and NAS based architectures, and conduct experiments on two image classification benchmark datasets, i.e., CIFAR-10 [22] and ImageNet [8].
Researcher Affiliation | Collaboration | Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang (South China University of Technology; Weixin Group, Tencent; Tencent AI Lab; University of Texas at Arlington)
Pseudocode | Yes | Algorithm 1: Training method for Neural Architecture Transformer (NAT).
Open Source Code | Yes | The source code of NAT is available at https://github.com/guoyongcs/NAT.
Open Datasets | Yes | Extensive experiments on two benchmark datasets, i.e., CIFAR-10 [22] and ImageNet [8], demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.
Dataset Splits | Yes | We split the CIFAR-10 training set into 40% and 60% slices to train the model parameters w and the transformer parameters θ, respectively. (A data-split sketch is given after this table.)
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper states that 'All implementations are based on PyTorch', but does not provide a specific version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | During training, we build the deep network by stacking 8 basic cells and train the transformer for 100 epochs. We set m = 1, n = 1, and λ = 0.003 in the training. We split the CIFAR-10 training set into 40% and 60% slices to train the model parameters w and the transformer parameters θ, respectively. For all the considered architectures, we follow the same settings of the original papers. (A hedged training-skeleton sketch using these settings follows the table.)
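
The Dataset Splits row describes a 40%/60% split of the CIFAR-10 training set for the model parameters w and the transformer parameters θ. The sketch below shows one way to produce such a split, assuming a standard torchvision CIFAR-10 pipeline; the transform, batch size, and random seed are illustrative assumptions, not the authors' settings.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Standard CIFAR-10 training set (50,000 images). The transform is a
# placeholder, not the authors' exact preprocessing.
train_set = datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)

# 40% of the training data for the model parameters w,
# 60% for the transformer parameters theta, as stated in the paper.
n_total = len(train_set)       # 50,000
n_w = int(0.4 * n_total)       # 20,000
n_theta = n_total - n_w        # 30,000
w_split, theta_split = random_split(
    train_set, [n_w, n_theta],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)

w_loader = DataLoader(w_split, batch_size=64, shuffle=True)      # batch size assumed
theta_loader = DataLoader(theta_split, batch_size=64, shuffle=True)
```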
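The Experiment Setup row reports the hyperparameters but not how the two data slices are consumed. The skeleton below only illustrates an alternating structure (update w on the 40% slice, then θ on the 60% slice, for 100 epochs), reusing `w_loader` and `theta_loader` from the previous sketch. The stand-in modules `supernet` and `transformer`, the optimizers, the reading of m and n as inner-step counts, and the λ-weighted regularizer are all assumptions for illustration; NAT's actual objective for θ is defined by Algorithm 1 in the paper and is not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins, NOT the authors' models. The real NAT network is
# built by stacking 8 basic cells; the transformer operates on architectures.
supernet = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
transformer = nn.Linear(10, 10)

w_optimizer = torch.optim.SGD(supernet.parameters(), lr=0.1)           # lr assumed
theta_optimizer = torch.optim.Adam(transformer.parameters(), lr=3e-4)  # lr assumed
criterion = nn.CrossEntropyLoss()

EPOCHS = 100    # "train the transformer for 100 epochs"
M, N = 1, 1     # m = n = 1; assumed here to be inner update counts per iteration
LAMBDA = 0.003  # lambda = 0.003; assumed here to weight a regularization term

for epoch in range(EPOCHS):
    # w_loader / theta_loader come from the data-split sketch above.
    for (xw, yw), (xt, yt) in zip(w_loader, theta_loader):
        # Update model parameters w on the 40% slice.
        for _ in range(M):
            w_optimizer.zero_grad()
            criterion(supernet(xw), yw).backward()
            w_optimizer.step()

        # Update transformer parameters theta on the 60% slice. The loss below
        # is a differentiable placeholder; NAT's real theta objective
        # (Algorithm 1) is not reproduced here.
        for _ in range(N):
            theta_optimizer.zero_grad()
            reg = sum(p.pow(2).sum() for p in transformer.parameters())
            loss_theta = criterion(transformer(supernet(xt)), yt) + LAMBDA * reg
            loss_theta.backward()
            theta_optimizer.step()
```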