On Scalar Embedding of Relative Positions in Attention Models

Authors: Junshuang Wu, Richong Zhang, Yongyi Mao, Junfan Chen

Venue: AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies show that the AT5 achieves superior performance to the T5's SRPE. We evaluate the proposed AT5 model on some artificial tasks, text classification, question answering, and machine translation. (A minimal sketch of such a scalar relative-position bias is given after the table.)
Researcher Affiliation | Academia | Junshuang Wu (1,2), Richong Zhang (1,2), Yongyi Mao (3), Junfan Chen (1,2). (1) Beijing Advanced Institution for Big Data and Brain Computing, Beihang University, Beijing, China; (2) SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; (3) School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
Pseudocode | No | The paper contains diagrams and descriptions of functions, but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code and the Appendix material are now available at https://github.com/wujsAct/Scalar-Embedding-of-Relative-Positions.
Open Datasets | Yes | We evaluate the proposed AT5 model on six real-world text classification datasets, including MR, SUBJ, CR, MPQA, SST, and TREC. The word embedding is initialized using GloVe (Pennington, Socher, and Manning 2014). Given a passage and a query, the SQuAD question answering task's goal is to find the answer in the passage. The total number of examples in the dataset is 107.7k. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task.
Dataset Splits | Yes | For other datasets [in text classification], the results of the 10-fold cross-validation are reported. We extract 10.1k examples as the validation set, another 10.1k examples as the test set, and the remaining 87.5k as the training dataset. We utilize the WMT 2017 English-German (en2de) corpus as the training set and the newstest2014 dataset as the validation dataset in the machine translation (MT) task. (A sketch of these splits follows the table.)
Hardware Specification | Yes | We implement all models using TensorFlow and run the experiments on an NVIDIA V100 8GB GPU or an NVIDIA V100 32GB GPU.
Software Dependencies | No | The paper states 'We implement all models using Tensorflow' but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | The dimensions of the word embedding and the hidden layers are 256 and 512, respectively. The number of heads in the self-attention module is 8. The learning rate is chosen from the set {1e-4, 5e-4}. The number of layers in all Transformer models is set to 1. The numbers of training samples are 5000 for Process-50 and 1000 for Reber and Adding-100. The number of testing samples is 5000 for all tasks. (These values are collected into a config sketch after the table.)
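
The research-type row refers to T5's scalar relative position embedding (SRPE), i.e. a learned scalar bias added to the attention logits for each (head, relative distance) pair. Below is a minimal, simplified sketch of that idea, assuming a clipped-distance lookup table rather than the authors' or T5's exact bucketing; the class and variable names are illustrative, not taken from the released code.

```python
import tensorflow as tf

class ScalarRelativeBias(tf.keras.layers.Layer):
    """One learned scalar per head and per clipped signed relative distance."""
    def __init__(self, num_heads, max_distance=128):
        super().__init__()
        self.max_distance = max_distance
        self.table = self.add_weight(
            name="rel_bias",
            shape=(2 * max_distance + 1, num_heads),
            initializer="zeros")

    def call(self, seq_len):
        pos = tf.range(seq_len)
        rel = pos[None, :] - pos[:, None]                      # (L, L) signed distances i - j
        rel = tf.clip_by_value(rel, -self.max_distance, self.max_distance)
        bias = tf.gather(self.table, rel + self.max_distance)  # (L, L, H)
        return tf.transpose(bias, [2, 0, 1])                   # (H, L, L)

# Usage sketch: q, k have shape (batch, heads, length, depth).
# logits = tf.einsum("bhid,bhjd->bhij", q, k) / depth ** 0.5
# logits += ScalarRelativeBias(num_heads=8)(tf.shape(q)[2])[None]  # broadcast over batch
```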
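The dataset-splits row quotes two protocols: 10-fold cross-validation for the small text-classification sets, and a fixed 87.5k / 10.1k / 10.1k split of the 107.7k question-answering examples. A hedged sketch of both follows; the shuffling seed and helper names are assumptions, not taken from the paper.

```python
import random

def ten_fold_splits(examples, seed=0):
    """Yield (train, test) pairs for 10-fold cross-validation."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    fold = len(examples) // 10
    for k in range(10):
        test = examples[k * fold:(k + 1) * fold]
        train = examples[:k * fold] + examples[(k + 1) * fold:]
        yield train, test

def fixed_split(examples, n_val=10_100, n_test=10_100, seed=0):
    """Hold out n_val validation and n_test test examples; the rest is training."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    return (examples[n_val + n_test:],       # train (~87.5k of 107.7k)
            examples[:n_val],                # validation (10.1k)
            examples[n_val:n_val + n_test])  # test (10.1k)
```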
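The experiment-setup row lists the quoted hyperparameters; collecting them into a single illustrative config makes them easier to scan. The key names and dictionary structure are assumptions; the values come from the quote.

```python
# Illustrative config only; key names are assumptions, values are quoted from the paper.
EXPERIMENT_CONFIG = {
    "word_embedding_dim": 256,
    "hidden_dim": 512,
    "num_attention_heads": 8,
    "learning_rate_grid": [1e-4, 5e-4],
    "num_transformer_layers": 1,
    "train_samples": {"Process-50": 5000, "Reber": 1000, "Adding-100": 1000},
    "test_samples_per_task": 5000,
}
```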