Attending to Entities for Better Text Understanding
Authors: Pengxiang Cheng, Katrin Erk
AAAI 2020, pp. 7554–7561
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state-of-the-art, while only containing a tiny fraction of parameters compared to GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions on applying similar techniques to other problems. |
| Researcher Affiliation | Academia | Pengxiang Cheng, Department of Computer Science, The University of Texas at Austin, pxcheng@utexas.edu; Katrin Erk, Department of Linguistics, The University of Texas at Austin, katrin.erk@utexas.edu |
| Pseudocode | No | The paper describes model architectures and methods using text and diagrams (Figure 2, Figure 3), but it does not include any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/pxch/att-ent. |
| Open Datasets | Yes | 'The LAMBADA dataset: Word prediction requiring a broad discourse context. In Proceedings of ACL, 1525–1534.' (referring to Paperno et al. (2016)) and 'Paperno et al. (2016) introduced the LAMBADA dataset' |
| Dataset Splits | Yes | From Table 1: TRAIN size 709,568 (100% answer-in-context, not filtered by human subjects); DEV size 4,869 (82.4% answer-in-context, filtered by human subjects); TEST size 5,153 (81.7% answer-in-context, filtered by human subjects). Also: 'Paperno et al. (2016) divided the Books Corpus randomly into 2 partitions, and only applied the human subjects filtering process to the second half to create the development / test set, while leaving the first half raw data untouched to be the training set.' |
| Hardware Specification | No | The paper mentions support from 'the Texas Advanced Computing Center for providing grid resources' and 'the Chameleon testbed' in the acknowledgments, but it does not specify any particular hardware components such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper states 'We build our models and run all the experiments with AllenNLP (Gardner et al. 2017)' and refers to the 'Stanford CoreNLP toolkit (Manning et al. 2014)', but it does not provide specific version numbers for these or any other software components (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For the baseline BIDAF model, we mostly follow the hyperparameters of the original model; due to space limits, we provide a detailed description of hyper-parameter choices in the supplemental material. [...] For the multi-head self-attention encoders in the BIDAF-SA-* variants, we always use 4 attention heads per layer. For BIDAF-SA-EARLY, we include 4 layers in the stacked self-attention encoder [...] For BIDAF-SA-LATE, we only add 1 multi-head self-attention layer (see the illustrative sketch below the table) |
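
The quoted setup gives only head and layer counts for the self-attention encoders. Below is a minimal PyTorch sketch of a stacked multi-head self-attention encoder with those counts; the hidden size, the use of `nn.TransformerEncoder`, and all variable names are illustrative assumptions, not details from the paper (the authors' actual AllenNLP-based implementation is at https://github.com/pxch/att-ent).

```python
# Minimal sketch, NOT the authors' implementation (see https://github.com/pxch/att-ent).
# It only illustrates the encoder sizes quoted above: 4 attention heads per layer,
# 4 stacked layers for BIDAF-SA-EARLY and 1 layer for BIDAF-SA-LATE.
# The hidden size of 256 is a placeholder, not a value reported in the paper.
import torch
import torch.nn as nn


def make_self_attention_encoder(hidden_dim: int, num_layers: int,
                                num_heads: int = 4) -> nn.TransformerEncoder:
    """Stack `num_layers` multi-head self-attention layers, each with `num_heads` heads."""
    layer = nn.TransformerEncoderLayer(
        d_model=hidden_dim,
        nhead=num_heads,
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)


# BIDAF-SA-EARLY: 4-layer stacked self-attention encoder (4 heads per layer).
early_encoder = make_self_attention_encoder(hidden_dim=256, num_layers=4)

# BIDAF-SA-LATE: a single multi-head self-attention layer (4 heads).
late_encoder = make_self_attention_encoder(hidden_dim=256, num_layers=1)

# Example: encode a batch of 2 passages, each 50 tokens long, with 256-dim embeddings.
tokens = torch.randn(2, 50, 256)
contextualized = early_encoder(tokens)  # shape: (2, 50, 256)
```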