Tree-Structured Attention with Hierarchical Accumulation
Authors: Xuan-Phi Nguyen, Shafiq Joty, Steven Hoi, Richard Socher
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on two types of tasks: Machine Translation and Text Classification. (5.1 Neural Machine Translation, Setup) We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. (5.2 Text Classification, Setup) We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks. |
| Researcher Affiliation | Collaboration | Salesforce Research; Nanyang Technological University. nguyenxu002@e.ntu.edu.sg, {sjoty,shoi,rsocher}@salesforce.com |
| Pseudocode | No | The paper describes the methods using mathematical equations and diagrams (e.g., Figure 1, Figure 4) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/nxphi47/tree_transformer. |
| Open Datasets | Yes | We experiment with five translation tasks: IWSLT'14 English-German (En-De), German-English (De-En), IWSLT'13 English-French (En-Fr), French-English (Fr-En), and WMT'14 English-German. We also compare our attention-based tree encoding method with Tree-LSTM (Tai et al., 2015) and other sequence-based baselines on the Stanford Sentiment Analysis (SST) (Socher et al., 2013), IMDB Sentiment Analysis and Subject-Verb Agreement (SVA) (Linzen et al., 2016) tasks. We use the Stanford Core NLP parser (v3.9.1) (Manning et al., 2014) to parse the datasets. |
| Dataset Splits | Yes | For the Stanford Sentiment Analysis task (SST), we tested on two subtasks: binary and fine-grained (5 classes) classification on the standard train/dev/test splits of 6920/872/1821 and 8544/1101/2210 respectively. For the subject-verb agreement task, we trained on a set of 142,000 sentences, validated on a set of 15,700 sentences and tested on a set of 1,000,000 sentences. Meanwhile, the IWSLT English-French task has 200,000 training sentence pairs; we used IWSLT15.TED.tst2012 for validation and IWSLT15.TED.tst2013 for testing. |
| Hardware Specification | No | All the models are trained on a sentiment classification task on a single GPU for 1000 iterations with a batch size of 1. |
| Software Dependencies | Yes | We use the Stanford Core NLP parser (v3.9.1) (Manning et al., 2014). |
| Experiment Setup | Yes | For IWSLT experiments, we trained the base models with d = 512 for 60K updates with a batch size of 4K tokens. For WMT, we used 200K updates and 32K tokens for the base models (d = 512), and 20K updates and 512K tokens for the big models with d = 1024. The models have 2 Transformer layers, 4 heads in each layer, and dimensions d = 64. We trained the models for 15K updates, with a batch size of 2K tokens. Word embeddings are randomly initialized. We used a learning rate of 7 × 10⁻⁴, dropout 0.5 and 8000 warmup steps. For both subject-verb agreement and IMDB sentiment analysis, we trained models with 20 warmup steps, 0.01 learning rate and 0.2 dropout. |
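The Open Datasets and Software Dependencies rows report that the datasets were parsed with the Stanford CoreNLP parser (v3.9.1). Below is a minimal sketch of that preprocessing step; it assumes a locally running CoreNLP server on port 9000 and uses NLTK's `CoreNLPParser` wrapper, neither of which is specified by the paper or verified against the released code.

```python
# Sketch of the constituency-parsing step described in the paper
# (Stanford CoreNLP v3.9.1). The server location, port, and the NLTK
# wrapper are assumptions, not details taken from the paper or its code.
#
# Start a CoreNLP server first, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url="http://localhost:9000")

sentence = "The movie was surprisingly good ."
# raw_parse yields nltk.Tree constituency parses for the input sentence.
tree = next(parser.raw_parse(sentence))
tree.pretty_print()  # phrase-structure tree of the kind fed to the tree encoder
```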
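The split sizes in the Dataset Splits row can be restated compactly. The dictionary below only mirrors the quoted numbers (key names are illustrative), and the computed totals serve as a quick sanity check against the standard SST sizes.

```python
# Reported train/dev/test sizes from the Dataset Splits row above.
# Key names are illustrative; values are the numbers quoted from the paper.
SPLITS = {
    "sst_binary":             (6_920, 872, 1_821),
    "sst_fine_grained":       (8_544, 1_101, 2_210),
    "subject_verb_agreement": (142_000, 15_700, 1_000_000),
}

for task, (train, dev, test) in SPLITS.items():
    print(f"{task}: total = {train + dev + test}")
# sst_binary totals 9,613 sentences and sst_fine_grained totals 11,855,
# matching the standard SST splits.
```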
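Finally, the hyperparameters quoted in the Experiment Setup row can be grouped per experiment as a configuration sketch. The grouping follows the order of the quoted sentences; the key names and the task labels on the classification entries are assumptions, and the sketch restates the reported values rather than reproducing the authors' actual configuration files.

```python
# Hedged restatement of the training settings quoted in the Experiment Setup row.
# Key names and the task labels on the last two entries are assumptions.
TRAIN_SETTINGS = {
    "iwslt_base": {"model_dim": 512,  "updates": 60_000,  "batch_tokens": 4_000},
    "wmt_base":   {"model_dim": 512,  "updates": 200_000, "batch_tokens": 32_000},
    "wmt_big":    {"model_dim": 1024, "updates": 20_000,  "batch_tokens": 512_000},
    # Classification models: 2 Transformer layers, 4 heads, d = 64.
    "classification": {
        "layers": 2, "heads": 4, "model_dim": 64,
        "updates": 15_000, "batch_tokens": 2_000,
        "learning_rate": 7e-4, "dropout": 0.5, "warmup_steps": 8_000,
    },
    # Overrides reported for subject-verb agreement and IMDB sentiment analysis.
    "sva_and_imdb": {"warmup_steps": 20, "learning_rate": 0.01, "dropout": 0.2},
}
```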