Compositional De-Attention Networks

Authors: Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CoDA on six NLP tasks, i.e. open domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis and text2code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks/datasets.
Researcher Affiliation | Collaboration | Yi Tay, Luu Anh Tuan, Aston Zhang, Shuohang Wang, Siu Cheung Hui; Nanyang Technological University, Singapore; MIT CSAIL; Amazon AI; Microsoft Dynamics 365 AI Research
Pseudocode | No | The paper includes mathematical equations and descriptions but does not provide a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper links to open-source base models or frameworks used (e.g., Tensor2Tensor, DecaProp), but does not provide an explicit statement or link to the authors' own implementation code for the CoDA methodology described in this paper.
Open Datasets | Yes | We use well-established benchmarks, SearchQA [Dunn et al., 2017] and Quasar-T [Dhingra et al., 2017]. We use well-established answer retrieval datasets (TrecQA [Wang et al., 2007] and WikiQA [Yang et al., 2015]) along with response selection dataset (Ubuntu dialogue corpus [Lowe et al., 2015]). We use four datasets, SNLI [Bowman et al., 2015], MNLI [Williams et al., 2017], SciTail [Khot et al., 2018] and the newly released Dialogue NLI (DNLI) [Welleck et al., 2018]. In our experiments, we use the IWSLT 15 English-Vietnamese dataset.
Dataset Splits | No | The paper mentions 'development scores' in ablation studies but does not explicitly describe the training/validation/test splits with percentages, sample counts, or a clear partitioning methodology that would allow the splits to be reproduced.
Hardware Specification | Yes | We use the transformer_base_single_gpu setting and run the model on a single Titan X GPU for 50K steps and using the default checkpoint averaging script.
Software Dependencies | No | The paper mentions using Tensor2Tensor and DecaProp as base frameworks/models, but does not specify version numbers for them or for any other software components such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We train all models for 20 epochs, optimizing with Adam with learning rate 0.0003. Hidden dimensions are set to 200 following the original DecompAtt model. Batch size is set to 64. We set the batch size to 32 for SciTail in lieu of a smaller dataset size. We use the transformer_base_single_gpu setting and run the model on a single Titan X GPU for 50K steps and using the default checkpoint averaging script. We train both models with 2000 steps. Following [Wangperawong, 2018], we trained a CoDA Transformer model on this dataset for 100K steps. We train Transformer and CoDA Transformer for 100K steps using the tiny setting.
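Illustration (not from the paper): since the authors release no implementation, the short Python sketch below only mirrors the hyperparameters quoted in the Experiment Setup row, namely Adam with learning rate 0.0003, hidden dimension 200, batch size 64 (32 for SciTail), and 20 training epochs. PyTorch is assumed, and PlaceholderNLIModel and the data loader are hypothetical stand-ins; the sketch does not reproduce the CoDA attention mechanism itself.

# Hypothetical sketch only; it reflects the quoted hyperparameters, not the authors' code.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

HIDDEN_DIM = 200       # "Hidden dimensions are set to 200 following the original DecompAtt model."
BATCH_SIZE = 64        # 32 for SciTail, per the quoted setup; used when building the DataLoader
LEARNING_RATE = 3e-4   # "optimizing with Adam with learning rate 0.0003"
NUM_EPOCHS = 20        # "We train all models for 20 epochs"


class PlaceholderNLIModel(nn.Module):
    """Stand-in encoder/classifier; the CoDA attention mechanism is NOT implemented here."""

    def __init__(self, vocab_size: int, num_classes: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, HIDDEN_DIM)
        self.encoder = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM, batch_first=True)
        self.classifier = nn.Linear(HIDDEN_DIM, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Encode the token sequence and classify from the final hidden state.
        hidden_states, _ = self.encoder(self.embed(tokens))
        return self.classifier(hidden_states[:, -1])


def train(model: nn.Module, loader: DataLoader) -> None:
    """Run the quoted optimization setting: Adam, lr 0.0003, 20 epochs."""
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(NUM_EPOCHS):
        for tokens, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(tokens), labels)
            loss.backward()
            optimizer.step()

In use, a DataLoader built with batch_size=BATCH_SIZE over a tokenized NLI dataset (e.g., SNLI or SciTail) would be passed to train(); none of these names correspond to artifacts released by the authors.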