TextNAS: A Neural Architecture Search Space Tailored for Text Representation

Authors: Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou

AAAI 2020, pp. 9242-9249

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran experiments on the Stanford Sentiment Treebank (SST) dataset (Socher et al. 2013) to evaluate the TextNAS pipeline. The experimental results showed that the automatically generated neural architectures achieved superior performance compared to manually designed networks.
Researcher Affiliation | Collaboration | (1) Microsoft Research Asia; (2) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (3) ETH Zürich; (4) University of Science and Technology of China. Emails: {yujwang, yayaming, jbai, maoyang, lidongz}@microsoft.com, {yrchen92, kouxiaoyu, yhtong}@pku.edu.cn, ce.zhang@inf.ethz.ch, sa517299@mail.ustc.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The open source code is available at: https://github.com/yujwang/TextNAS
Open Datasets | Yes | We ran experiments on the Stanford Sentiment Treebank (SST) dataset (Socher et al. 2013). We follow the pre-defined train/validation/test split of the original datasets (https://nlp.stanford.edu/sentiment/code.html).
Dataset Splits | Yes | We follow the pre-defined train/validation/test split of the original datasets (https://nlp.stanford.edu/sentiment/code.html). Table 2 (statistics of text classification datasets) lists SST with 5 classes, 8,544 training, 1,101 validation, and 2,210 test samples.
Hardware Specification | Yes | The whole process can be finished within 24 hours on a single Tesla P100 GPU.
Software Dependencies | No | The paper mentions using ENAS, the Adam optimizer, and stochastic gradient descent, but does not specify version numbers for these or other software libraries/frameworks.
Experiment Setup | Yes | We set the batch size as 128, max input length as 64, hidden unit dimension for each layer as 32, dropout ratio as 0.5, and L2 regularization as 2 × 10^-6. We utilize the Adam optimizer and learning rate decay with cosine annealing: λ = λ_min + 0.5 (λ_max - λ_min)(1 + cos(π T_cur / T)), where λ_max and λ_min define the range of the learning rate, T_cur is the current epoch number, and T is the cosine cycle. In our experiments, we set λ_max = 0.005, λ_min = 0.0001, and T = 10. After each epoch, ten candidate architectures are generated by the controller and evaluated on a batch of randomly selected validation samples. After training for 150 epochs, the architecture with the highest evaluation accuracy is chosen as the text representation network.
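
For reference, the cosine-annealing schedule quoted above can be reproduced in a few lines of Python. This is a minimal sketch using the reported values λ_max = 0.005, λ_min = 0.0001, and T = 10; the function name, the modulo-based cycle restart, and the example loop are assumptions for illustration and are not taken from the released TextNAS code.

import math

def cosine_annealing_lr(epoch, lr_max=0.005, lr_min=0.0001, cycle=10):
    # lambda = lambda_min + 0.5 * (lambda_max - lambda_min) * (1 + cos(pi * T_cur / T))
    # Assumption: the schedule restarts every `cycle` epochs (warm restarts),
    # so T_cur is the position of `epoch` within the current cycle.
    t_cur = epoch % cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / cycle))

# Example usage: print the learning rate over the 150 search epochs reported in the paper.
for epoch in range(150):
    print(epoch, round(cosine_annealing_lr(epoch), 6))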