Tree-Structured Attention with Hierarchical Accumulation
Authors: Xuan-Phi Nguyen, Shafiq Joty, Steven Hoi, Richard Socher
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on two types of tasks: Machine Translation and Text Classification. (5.1 Neural Machine Translation, Setup) We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. (5.2 Text Classification, Setup) We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks. |
| Researcher Affiliation | Collaboration | Salesforce Research; Nanyang Technological University. nguyenxu002@e.ntu.edu.sg, {sjoty,shoi,rsocher}@salesforce.com |
| Pseudocode | No | The paper describes the methods using mathematical equations and diagrams (e.g., Figure 1, Figure 4) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/nxphi47/tree_transformer. |
| Open Datasets | Yes | We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks. We use the Stanford Core NLP parser (v3.9.1) (Manning et al., 2014) to parse the datasets. |
| Dataset Splits | Yes | For the Stanford Sentiment Analysis task (SST), we tested on two subtasks: binary and fine-grained (5 classes) classification on the standard train/dev/test splits of 6920/872/1821 and 8544/1101/2210 respectively. For the subject-verb agreement task, we trained on a set of 142,000 sentences, validated on a set of 15,700 sentences and tested on a set of 1,000,000 sentences. Meanwhile, the IWSLT English-French task has 200,000 training sentence pairs; we used IWSLT15.TED.tst2012 for validation and IWSLT15.TED.tst2013 for testing. |
| Hardware Specification | No | All the models are trained on a sentiment classification task on a single GPU for 1000 iterations with a batch size of 1. |
| Software Dependencies | Yes | We use the Stanford Core NLP parser (v3.9.1) (Manning et al., 2014). |
| Experiment Setup | Yes | For IWSLT experiments, we trained the base models with d = 512 for 60K updates with a batch size of 4K tokens. For WMT, we used 200K updates and 32K tokens for the base models (d = 512), and 20K updates and 512K tokens for the big models with d = 1024. The models have 2 Transformer layers, 4 heads in each layer, and dimensions d = 64. We trained the models for 15K updates, with a batch size of 2K tokens. Word embeddings are randomly initialized. We used a learning rate of 7 × 10⁻⁴, dropout 0.5 and 8000 warmup steps. For both subject-verb agreement and IMDB sentiment analysis, we trained models with 20 warmup steps, 0.01 learning rate and 0.2 dropout. |
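The Open Datasets and Software Dependencies rows report that the datasets were parsed with the Stanford CoreNLP parser (v3.9.1). Below is a minimal sketch of that preprocessing step; it assumes a locally running CoreNLP server on port 9000 and uses NLTK's `CoreNLPParser` wrapper, neither of which is specified by the paper or verified against the released code.

```python
# Sketch of the constituency-parsing step described in the paper
# (Stanford CoreNLP v3.9.1). The server location, port, and the NLTK
# wrapper are assumptions, not details taken from the paper or its code.
#
# Start a CoreNLP server first, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")

sentence = "The movie was surprisingly good ."
# raw_parse yields nltk.Tree constituency parses for the input sentence.
tree = next(parser.raw_parse(sentence))
tree.pretty_print()  # phrase-structure tree of the kind fed to the tree encoder
```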
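The split sizes in the Dataset Splits row can be restated compactly. The dictionary below only mirrors the quoted numbers (key names are illustrative), and the computed totals serve as a quick sanity check against the standard SST sizes.

```python
# Reported train/dev/test sizes from the Dataset Splits row above.
# Key names are illustrative; values are the numbers quoted from the paper.
SPLITS = {
    "sst_binary":             (6_920, 872, 1_821),
    "sst_fine_grained":       (8_544, 1_101, 2_210),
    "subject_verb_agreement": (142_000, 15_700, 1_000_000),
}

for task, (train, dev, test) in SPLITS.items():
    print(f"{task}: total = {train + dev + test}")
# sst_binary totals 9,613 sentences and sst_fine_grained totals 11,855,
# matching the standard SST splits.
```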
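Finally, the hyperparameters quoted in the Experiment Setup row can be grouped per experiment as a configuration sketch. The grouping follows the order of the quoted sentences; the key names and the task labels on the classification entries are assumptions, and the sketch restates the reported values rather than reproducing the authors' actual configuration files.

```python
# Hedged restatement of the training settings quoted in the Experiment Setup row.
# Key names and the task labels on the last two entries are assumptions.
TRAIN_SETTINGS = {
    "iwslt_base": {"model_dim": 512,  "updates": 60_000,  "batch_tokens": 4_000},
    "wmt_base":   {"model_dim": 512,  "updates": 200_000, "batch_tokens": 32_000},
    "wmt_big":    {"model_dim": 1024, "updates": 20_000,  "batch_tokens": 512_000},
    # Classification models: 2 Transformer layers, 4 heads, d = 64.
    "classification": {
        "layers": 2, "heads": 4, "model_dim": 64,
        "updates": 15_000, "batch_tokens": 2_000,
        "learning_rate": 7e-4, "dropout": 0.5, "warmup_steps": 8_000,
    },
    # Overrides reported for subject-verb agreement and IMDB sentiment analysis.
    "sva_and_imdb": {"warmup_steps": 20, "learning_rate": 0.01, "dropout": 0.2},
}
```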