Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Authors: Ekta Sood, Simon Tannert, Philipp Müller, Andreas Bulling
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On four different corpora we demonstrate that our hybrid text saliency model (TSM) duration predictions are highly correlated with human gaze ground truth. We further propose a novel joint modeling approach to integrate TSM predictions into the attention layer of a network designed for a specific upstream NLP task, without the need for any task-specific human gaze data. We demonstrate that our joint model outperforms the state of the art in paraphrase generation on the Quora Question Pairs corpus by more than 10% in BLEU-4 and achieves state-of-the-art performance for sentence compression on the challenging Google Sentence Compression corpus. (A minimal sketch of this attention integration appears after the table.) |
| Researcher Affiliation | Academia | Ekta Sood¹, Simon Tannert², Philipp Müller¹, Andreas Bulling¹ — ¹University of Stuttgart, Institute for Visualization and Interactive Systems (VIS), Germany; ²University of Stuttgart, Institute for Natural Language Processing (IMS), Germany |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found. |
| Open Source Code | Yes | Code and other supporting material can be found at https://perceptualui.org/publications/sood20_neurips/ |
| Open Datasets | Yes | For paraphrase generation, we used the Quora Question Pairs corpus, which consists of human-annotated pairs of paraphrased questions crawled from Quora. For the sentence compression task we used the Google Sentence Compression corpus [20], containing 200K sentence-compression pairs crawled from news articles. For the first step, we run E-Z Reader on the CNN and Daily Mail corpus [28]... in a second training phase we fine-tune the network with real eye-tracking data of humans reading, from the Provo and GECO corpora [43, 14]. |
| Dataset Splits | Yes | Quora Question Pairs: we split the data according to [23, 54], using either 100K or 50K examples for training, 45K examples for validation, and 4K examples for testing. Google Sentence Compression: we split the data according to [87], taking the first 1K examples as test data and the next 1K as validation data. TSM pre-training: we obtain a total of 7.6M annotated sentences on Daily Mail and 3.1M on CNN for training, with 850K sentences on Daily Mail and 350K on CNN for validation. TSM fine-tuning on eye-tracking data: we split the Provo data into 10K sentence pairs for training and 1K for validation (a pair is one sentence read by one participant, as multiple participants read the same sentence), and the GECO data into 65K sentence pairs for training and 8K for validation. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory amounts, or detailed computer specifications) were found for the experimental setup. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) were found. |
| Experiment Setup | Yes | We trained both upstream task models using the Adam optimizer [35] with a learning rate of 0.0001. For paraphrase generation we used uni-directional GRUs with hidden-layer size 1,024 and a dropout probability of 0.2. For sentence compression we used BiLSTMs with hidden-layer size 1,024 and a dropout probability of 0.1. For the TSM, we used the Adam optimizer [35] with a learning rate of 0.00001, a batch size of 100, and dropout of 0.5 after the embedding layer and the recurrent layer. (These hyperparameters are collected in a configuration sketch after the table.) |
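
The joint modeling approach quoted under Research Type, integrating TSM duration predictions into a task network's attention layer, can be illustrated with a short sketch. The PyTorch snippet below shows one plausible way to mix predicted per-token fixation durations into a standard additive attention layer; the class name `GazeBiasedAttention`, the sigmoid-gated mixing weight `lambda_gaze`, and the normalization of the gaze scores are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code): biasing additive attention with
# TSM-predicted per-token reading times.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeBiasedAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(2 * hidden_size, 1)
        # Learnable interpolation between task attention and gaze predictions
        # (hypothetical; the paper may combine the two distributions differently).
        self.lambda_gaze = nn.Parameter(torch.tensor(0.5))

    def forward(self, decoder_state, encoder_states, gaze_scores, mask):
        # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
        # gaze_scores: (batch, src_len) non-negative predicted fixation durations
        # mask: (batch, src_len) bool, True for real (non-padding) tokens
        src_len = encoder_states.size(1)
        query = decoder_state.unsqueeze(1).expand(-1, src_len, -1)
        logits = self.score(torch.cat([query, encoder_states], dim=-1)).squeeze(-1)
        logits = logits.masked_fill(~mask, float("-inf"))
        task_attn = F.softmax(logits, dim=-1)
        # Normalize gaze predictions into a distribution over source tokens.
        gaze_attn = gaze_scores.masked_fill(~mask, 0.0)
        gaze_attn = gaze_attn / gaze_attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        lam = torch.sigmoid(self.lambda_gaze)  # keep mixing weight in (0, 1)
        attn = lam * task_attn + (1.0 - lam) * gaze_attn
        context = torch.bmm(attn.unsqueeze(1), encoder_states).squeeze(1)
        return context, attn
```

Because the mixing weight is learned, the task network can fall back to pure task attention if the gaze signal is unhelpful, which is one way such a joint model can work without task-specific human gaze data.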
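
For reference, the hyperparameters reported under Experiment Setup can also be collected into a single configuration sketch. Only the learning rates, hidden sizes, dropout probabilities, and TSM batch size come from the paper; the embedding dimension, layer counts, and module names are placeholders assumed for illustration.

```python
# Configuration sketch of the reported training setup, assuming PyTorch.
import torch
import torch.nn as nn

EMBED_DIM = 300  # assumed embedding dimension; not stated in this section

# Paraphrase generation: uni-directional GRUs, hidden size 1,024, dropout 0.2.
paraphrase_rnn = nn.GRU(EMBED_DIM, hidden_size=1024, num_layers=2,
                        dropout=0.2, batch_first=True)

# Sentence compression: BiLSTMs, hidden size 1,024, dropout 0.1.
compression_rnn = nn.LSTM(EMBED_DIM, hidden_size=1024, num_layers=2,
                          dropout=0.1, bidirectional=True, batch_first=True)

# Both upstream task models: Adam with learning rate 1e-4.
paraphrase_optim = torch.optim.Adam(paraphrase_rnn.parameters(), lr=1e-4)
compression_optim = torch.optim.Adam(compression_rnn.parameters(), lr=1e-4)

# TSM training: Adam with lr 1e-5, batch size 100, and dropout 0.5 applied
# after the embedding layer and after the recurrent layer.
TSM_BATCH_SIZE = 100
tsm_dropout = nn.Dropout(p=0.5)
```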