Tree-Structured Attention with Hierarchical Accumulation

Authors: Xuan-Phi Nguyen, Shafiq Joty, Steven Hoi, Richard Socher

ICLR 2020

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response:
Research Type — Experimental
    We conduct our experiments on two types of tasks: Machine Translation and Text Classification. (Sec. 5.1, Neural Machine Translation) Setup. We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. (Sec. 5.2, Text Classification) Setup. We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis, and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks.
Researcher Affiliation — Collaboration
    Salesforce Research; Nanyang Technological University. Contact: nguyenxu002@e.ntu.edu.sg, {sjoty,shoi,rsocher}@salesforce.com
Pseudocode — No
    The paper describes the methods using mathematical equations and diagrams (e.g., Figure 1, Figure 4) but does not include any pseudocode or algorithm blocks.
Open Source Code — Yes
    Our source code is available at https://github.com/nxphi47/tree_transformer.
Open Datasets — Yes
    We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis, and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks. We use the Stanford CoreNLP parser (v3.9.1) (Manning et al., 2014) to parse the datasets.
Dataset Splits — Yes
    For the Stanford Sentiment Analysis task (SST), we tested on two subtasks: binary and fine-grained (5-class) classification, on the standard train/dev/test splits of 6920/872/1821 and 8544/1101/2210, respectively. For the subject-verb agreement task, we trained on a set of 142,000 sentences, validated on a set of 15,700 sentences, and tested on a set of 1,000,000 sentences. Meanwhile, the IWSLT English-French task has 200,000 training sentence pairs; we used IWSLT15.TED.tst2012 for validation and IWSLT15.TED.tst2013 for testing.
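For quick reference, the split sizes quoted above can be collected in one place. A minimal sketch; the variable and task names are illustrative (not from the paper's code), and only the counts stated in the quoted response are filled in:

```python
# Train/dev/test sizes as reported in the quoted response.
# Task keys and structure are illustrative, not from the paper's code.
DATASET_SPLITS = {
    "SST-binary":  {"train": 6_920,   "dev": 872,    "test": 1_821},
    "SST-fine":    {"train": 8_544,   "dev": 1_101,  "test": 2_210},
    "SVA":         {"train": 142_000, "dev": 15_700, "test": 1_000_000},
    # IWSLT En-Fr: dev/test are IWSLT15.TED.tst2012 / tst2013 (sizes not stated).
    "IWSLT-En-Fr": {"train": 200_000, "dev": None,   "test": None},
}

def total_labeled_examples(splits):
    """Sum all split sizes that are known, skipping unstated (None) entries."""
    return sum(n for task in splits.values() for n in task.values() if n is not None)
```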
Hardware Specification — No
    All the models are trained on a sentiment classification task on a single GPU for 1000 iterations with a batch size of 1.
Software Dependencies — Yes
    We use the Stanford CoreNLP parser (v3.9.1) (Manning et al., 2014).
Experiment Setup — Yes
    For IWSLT experiments, we trained the base models with d = 512 for 60K updates with a batch size of 4K tokens. For WMT, we used 200K updates and 32K tokens for the base models (d = 512), and 20K updates and 512K tokens for the big models with d = 1024. The models have 2 Transformer layers, 4 heads in each layer, and dimension d = 64. We trained the models for 15K updates, with a batch size of 2K tokens. Word embeddings are randomly initialized. We used a learning rate of 7 × 10^-4, dropout 0.5, and 8000 warmup steps. For both subject-verb agreement and IMDB sentiment analysis, we trained models with 20 warmup steps, 0.01 learning rate, and 0.2 dropout.
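The quoted setup names a peak learning rate (7 × 10^-4) and 8000 warmup steps but not the schedule connecting them. Transformer training in this setting conventionally uses an inverse-square-root schedule; the sketch below shows what those two numbers would imply under that assumption (the schedule choice itself is not stated in the paper):

```python
def inverse_sqrt_lr(step: int, peak_lr: float = 7e-4, warmup_steps: int = 8000) -> float:
    """Conventional Transformer schedule (assumed, not stated in the paper):
    linear warmup to peak_lr over warmup_steps, then decay proportional
    to 1/sqrt(step)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps               # linear warmup
    return peak_lr * (warmup_steps ** 0.5) / (step ** 0.5)  # inverse-sqrt decay
```

Under this schedule the rate peaks at 7 × 10^-4 exactly at step 8000 and halves by step 32000 (four times the warmup horizon).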