TextNAS: A Neural Architecture Search Space Tailored for Text Representation

Authors: Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou

AAAI 2020, pp. 9242-9249

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran experiments on the Stanford Sentiment Treebank (SST) dataset (Socher et al. 2013) to evaluate the TextNAS pipeline. The experimental results showed that the automatically generated neural architectures achieved superior performance compared to manually designed networks.
Researcher Affiliation | Collaboration | (1) Microsoft Research Asia; (2) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (3) ETH Zürich; (4) University of Science and Technology of China. Emails: {yujwang, yayaming, jbai, maoyang, lidongz}@microsoft.com, {yrchen92, kouxiaoyu, yhtong}@pku.edu.cn, ce.zhang@inf.ethz.ch, sa517299@mail.ustc.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The open source code is available at: https://github.com/yujwang/TextNAS
Open Datasets | Yes | We ran experiments on the Stanford Sentiment Treebank (SST) dataset (Socher et al. 2013). We follow the pre-defined train/validation/test split of the original datasets (https://nlp.stanford.edu/sentiment/code.html).
Dataset Splits | Yes | We follow the pre-defined train/validation/test split of the original datasets (https://nlp.stanford.edu/sentiment/code.html). Table 2 (statistics of text classification datasets) lists SST with 5 classes, 8,544 training, 1,101 validation, and 2,210 test samples.
Hardware Specification | Yes | The whole process can be finished within 24 hours on a single Tesla P100 GPU.
Software Dependencies | No | The paper mentions using ENAS, the Adam optimizer, and stochastic gradient descent, but does not specify version numbers for these or other software libraries/frameworks.
Experiment Setup | Yes | We set the batch size as 128, max input length as 64, hidden unit dimension for each layer as 32, dropout ratio as 0.5, and L2 regularization as 2 × 10^-6. We utilize the Adam optimizer and learning rate decay with cosine annealing: λ = λ_min + 0.5 (λ_max - λ_min)(1 + cos(π T_cur / T)), where λ_max and λ_min define the range of the learning rate, T_cur is the current epoch number, and T is the cosine cycle. In our experiments, we set λ_max = 0.005, λ_min = 0.0001, and T = 10. After each epoch, ten candidate architectures are generated by the controller and evaluated on a batch of randomly selected validation samples. After training for 150 epochs, the architecture with the highest evaluation accuracy is chosen as the text representation network.
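
For reference, the cosine-annealing schedule quoted above can be reproduced in a few lines of Python. This is a minimal sketch using the reported values λ_max = 0.005, λ_min = 0.0001, and T = 10; the function name, the modulo-based cycle restart, and the example loop are assumptions for illustration and are not taken from the released TextNAS code.

import math

def cosine_annealing_lr(epoch, lr_max=0.005, lr_min=0.0001, cycle=10):
    # lambda = lambda_min + 0.5 * (lambda_max - lambda_min) * (1 + cos(pi * T_cur / T))
    # Assumption: the schedule restarts every `cycle` epochs (warm restarts),
    # so T_cur is the position of `epoch` within the current cycle.
    t_cur = epoch % cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / cycle))

# Example usage: print the learning rate over the 150 search epochs reported in the paper.
for epoch in range(150):
    print(epoch, round(cosine_annealing_lr(epoch), 6))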