Attending to Entities for Better Text Understanding

Authors: Pengxiang Cheng, Katrin Erk (pp. 7554–7561)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state-of-the-art, while only containing a tiny fraction of parameters compared to GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions on applying similar techniques to other problems. (A minimal sketch of this auxiliary-supervision idea follows the table.)
Researcher Affiliation | Academia | Pengxiang Cheng, Department of Computer Science, The University of Texas at Austin, pxcheng@utexas.edu; Katrin Erk, Department of Linguistics, The University of Texas at Austin, katrin.erk@utexas.edu
Pseudocode | No | The paper describes model architectures and methods using text and diagrams (Figure 2, Figure 3), but it does not include any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/pxch/att-ent.
Open Datasets | Yes | 'The LAMBADA dataset: Word prediction requiring a broad discourse context. In Proceedings of ACL, 1525–1534.' (referring to Paperno et al. (2016)) and 'Paperno et al. (2016) introduced the LAMBADA dataset'
Dataset Splits | Yes | From Table 1: Train size 709,568 (100% answer-in-context, not filtered by human subjects); Dev size 4,869 (82.4% answer-in-context, filtered by human subjects); Test size 5,153 (81.7% answer-in-context, filtered by human subjects). Also: 'Paperno et al. (2016) divided the Books Corpus randomly into 2 partitions, and only applied the human subjects filtering process to the second half to create the development / test set, while leaving the first half raw data untouched to be the training set.'
Hardware Specification | No | The paper mentions support from 'the Texas Advanced Computing Center for providing grid resources' and 'the Chameleon testbed' in the acknowledgments, but it does not specify any particular hardware components such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper states 'We build our models and run all the experiments with AllenNLP (Gardner et al. 2017)' and refers to the 'Stanford CoreNLP toolkit (Manning et al. 2014)', but it does not provide specific version numbers for these or any other software components (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For the baseline BIDAF model, we mostly follow the hyperparameters of the original model: due to space limits, we provide a detailed description of hyper-parameter choices in the supplemental material. [...] For the multi-head self-attention encoders in the BIDAF-SA-* variants, we always use 4 attention heads per layer. For BIDAF-SA-EARLY, we include 4 layers in the stacked self-attention encoder [...] For BIDAF-SA-LATE, we only add 1 multi-head self-attention layer. (See the configuration sketch after the table.)
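
The 'Research Type' row above quotes the paper's central technique: training a reading model from scratch with coreference as auxiliary supervision for self-attention. Since the paper itself provides no pseudocode (see the 'Pseudocode' row), the sketch below only illustrates the general idea rather than the authors' implementation: one attention head's distribution is nudged, through an extra loss term, toward a target distribution built from coreference links. The function name `aux_coref_loss`, the tensor layout, and the weighting scheme are all assumptions.

```python
# Minimal sketch (not the paper's implementation) of coreference as auxiliary
# supervision for self-attention: one attention head is pushed, via an extra
# loss term, toward a target distribution derived from coreference links.
# All names below (coref_targets, aux_coref_loss, lambda_coref) are hypothetical.
import torch


def aux_coref_loss(attn_weights: torch.Tensor,
                   coref_targets: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """attn_weights:  (batch, seq, seq) attention of the supervised head.
    coref_targets: (batch, seq, seq) row-normalized indicator of coreferent
                   antecedents (all-zero rows for tokens with no antecedent).
    mask:          (batch, seq), 1 for tokens that have at least one antecedent."""
    mask = mask.float()
    # Cross entropy between the coreference target distribution and the
    # attention distribution, averaged over supervised tokens only.
    log_attn = torch.log(attn_weights.clamp_min(1e-9))
    token_loss = -(coref_targets * log_attn).sum(dim=-1)  # (batch, seq)
    return (token_loss * mask).sum() / mask.sum().clamp_min(1.0)


# Training would then optimize
#   total_loss = task_loss + lambda_coref * aux_coref_loss(...)
# where lambda_coref is a hypothetical weighting hyperparameter.
```

The 'Experiment Setup' row gives the only architectural numbers quoted here: 4 attention heads per self-attention layer, a 4-layer stack for BIDAF-SA-EARLY, and a single layer for BIDAF-SA-LATE. The paper builds on AllenNLP and defers the remaining hyperparameters to its supplemental material, so the following stand-in uses plain PyTorch Transformer encoder layers with an assumed hidden size; it only mirrors the quoted layer and head counts, not the authors' exact encoder.

```python
# Stand-in for the stacked self-attention encoders described in the
# "Experiment Setup" row: 4 heads per layer, 4 layers for BIDAF-SA-EARLY,
# 1 layer for BIDAF-SA-LATE. D_MODEL and the feed-forward size are
# placeholders, not values taken from the quoted text.
import torch.nn as nn

D_MODEL = 200  # assumed hidden size


def make_self_attention_stack(num_layers: int, num_heads: int = 4) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(
        d_model=D_MODEL, nhead=num_heads,
        dim_feedforward=4 * D_MODEL, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)


early_encoder = make_self_attention_stack(num_layers=4)  # BIDAF-SA-EARLY
late_encoder = make_self_attention_stack(num_layers=1)   # BIDAF-SA-LATE
```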
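
In both sketches the multi-head attention machinery is standard; the paper's contribution as quoted above lies in where the coreference supervision is attached (early in the stacked encoder versus late, as a single added layer), which is what the BIDAF-SA-EARLY and BIDAF-SA-LATE variants contrast.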