NAT: Neural Architecture Transformer for Accurate and Compact Architectures
Authors: Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods. [...] In this section, we apply NAT on both hand-crafted and NAS based architectures, and conduct experiments on two image classification benchmark datasets, i.e., CIFAR-10 [22] and ImageNet [8]. |
| Researcher Affiliation | Collaboration | Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang (South China University of Technology; Weixin Group, Tencent; Tencent AI Lab; University of Texas at Arlington) |
| Pseudocode | Yes | Algorithm 1 Training method for Neural Architecture Transformer (NAT). |
| Open Source Code | Yes | The source code of NAT is available at https://github.com/guoyongcs/NAT. |
| Open Datasets | Yes | Extensive experiments on two benchmark datasets, i.e., CIFAR-10 [22] and ImageNet [8], demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods. |
| Dataset Splits | Yes | We split CIFAR-10 training set into 40% and 60% slices to train the model parameters w and the transformer parameters θ, respectively. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper states 'All implementations are based on PyTorch.' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | During training, we build the deep network by stacking 8 basic cells and train the transformer for 100 epochs. We set m = 1, n = 1, and λ = 0.003 in the training. We split CIFAR-10 training set into 40% and 60% slices to train the model parameters w and the transformer parameters θ, respectively. For all the considered architectures, we follow the same settings of the original papers. |
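The 40%/60% split of the CIFAR-10 training set quoted in the table (40% for the model parameters w, 60% for the transformer parameters θ) can be reproduced with standard PyTorch utilities. The sketch below is a minimal illustration under that assumption, not the authors' released code; the function name `split_cifar10` and the fixed seed are ours.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

def split_cifar10(root="./data", seed=0):
    """Split the CIFAR-10 training set into a 40% slice (for the model
    parameters w) and a 60% slice (for the transformer parameters theta),
    as described in the paper. Transform and seed are illustrative choices."""
    train_set = datasets.CIFAR10(root=root, train=True, download=True,
                                 transform=transforms.ToTensor())
    n_total = len(train_set)          # 50,000 training images
    n_w = int(0.4 * n_total)          # 20,000 images for updating w
    n_theta = n_total - n_w           # 30,000 images for updating theta
    generator = torch.Generator().manual_seed(seed)
    w_set, theta_set = random_split(train_set, [n_w, n_theta],
                                    generator=generator)
    return w_set, theta_set
```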
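Algorithm 1 itself is only cited in the table, so the following is a schematic of the alternating update it describes, using the reported settings (100 epochs, m = 1, n = 1, λ = 0.003). It assumes, as an interpretation rather than a statement of the paper, that m and n count the architectures sampled per update of w and θ, that θ is trained by policy gradient with an entropy bonus weighted by λ, and that `model`, `transformer`, and their `sample*` methods are placeholder APIs, not the authors' implementation.

```python
import torch

def train_nat(model, transformer, w_loader, theta_loader,
              w_optimizer, theta_optimizer,
              epochs=100, m=1, n=1, lam=0.003):
    """Schematic of the alternating optimization in Algorithm 1 (a sketch,
    not the authors' code): w is trained on the 40% slice, theta on the
    60% slice, with the hyperparameters reported in the paper."""
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for (x_w, y_w), (x_t, y_t) in zip(w_loader, theta_loader):
            # (1) Update model parameters w with m sampled architectures.
            w_optimizer.zero_grad()
            for _ in range(m):
                arch = transformer.sample()                    # placeholder API
                loss = criterion(model(x_w, arch), y_w) / m
                loss.backward()
            w_optimizer.step()
            # (2) Update transformer parameters theta by policy gradient
            #     over n sampled architectures, with entropy weight lambda.
            theta_optimizer.zero_grad()
            for _ in range(n):
                arch, log_prob, entropy = transformer.sample_with_log_prob()
                with torch.no_grad():
                    # Lower validation loss -> larger reward (illustrative choice).
                    reward = -criterion(model(x_t, arch), y_t)
                theta_loss = -(log_prob * reward + lam * entropy) / n
                theta_loss.backward()
            theta_optimizer.step()
```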