Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Authors: Ekta Sood, Simon Tannert, Philipp Müller, Andreas Bulling
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On four different corpora we demonstrate that our hybrid text saliency model (TSM) duration predictions are highly correlated with human gaze ground truth. We further propose a novel joint modeling approach to integrate TSM predictions into the attention layer of a network designed for a specific upstream NLP task, without the need for any task-specific human gaze data. We demonstrate that our joint model outperforms the state of the art in paraphrase generation on the Quora Question Pairs corpus by more than 10% in BLEU-4 and achieves state-of-the-art performance for sentence compression on the challenging Google Sentence Compression corpus. (A minimal sketch of this attention integration appears after the table.) |
| Researcher Affiliation | Academia | Ekta Sood¹, Simon Tannert², Philipp Müller¹, Andreas Bulling¹ — ¹University of Stuttgart, Institute for Visualization and Interactive Systems (VIS), Germany; ²University of Stuttgart, Institute for Natural Language Processing (IMS), Germany |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found. |
| Open Source Code | Yes | Code and other supporting material can be found at https://perceptualui.org/publications/sood20_neurips/ |
| Open Datasets | Yes | For paraphrase generation, we used the Quora Question Pairs corpus, which consists of human-annotated pairs of paraphrased questions crawled from Quora. For the sentence compression task we used the Google Sentence Compression corpus [20], containing 200K sentence-compression pairs crawled from news articles. For the first step, we run E-Z Reader on the CNN and Daily Mail corpus [28]... in a second training phase we fine-tune the network with real eye-tracking data of humans reading, from the Provo and GECO corpora [43, 14]. |
| Dataset Splits | Yes | Quora Question Pairs: we split the data according to [23, 54], using either 100K or 50K examples for training, 45K examples for validation, and 4K examples for testing. Google Sentence Compression: we split the data according to [87], taking the first 1K examples as test data and the next 1K as validation data. TSM pre-training: we obtain a total of 7.6M annotated sentences on Daily Mail and 3.1M on CNN for training, with 850K sentences on Daily Mail and 350K on CNN for validation. TSM fine-tuning on eye-tracking data: we split the Provo data into 10K sentence pairs for training and 1K for validation (a pair is one sentence read by one participant, as multiple participants read the same sentence), and the GECO data into 65K sentence pairs for training and 8K for validation. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory amounts, or detailed computer specifications) were found for the experimental setup. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) were found. |
| Experiment Setup | Yes | We trained both upstream task models using the Adam optimizer [35] with a learning rate of 0.0001. For paraphrase generation we used uni-directional GRUs with hidden-layer size 1,024 and a dropout probability of 0.2. For sentence compression we used BiLSTMs with hidden-layer size 1,024 and a dropout probability of 0.1. For the TSM, we used the Adam optimizer [35] with a learning rate of 0.00001, a batch size of 100, and dropout of 0.5 after the embedding layer and the recurrent layer. (These hyperparameters are collected in a configuration sketch after the table.) |
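
The joint modeling approach quoted under Research Type, integrating TSM duration predictions into a task network's attention layer, can be illustrated with a short sketch. The PyTorch snippet below shows one plausible way to mix predicted per-token fixation durations into a standard additive attention layer; the class name `GazeBiasedAttention`, the sigmoid-gated mixing weight `lambda_gaze`, and the normalization of the gaze scores are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code): biasing additive attention with
# TSM-predicted per-token reading times.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeBiasedAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(2 * hidden_size, 1)
        # Learnable interpolation between task attention and gaze predictions
        # (hypothetical; the paper may combine the two distributions differently).
        self.lambda_gaze = nn.Parameter(torch.tensor(0.5))

    def forward(self, decoder_state, encoder_states, gaze_scores, mask):
        # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
        # gaze_scores: (batch, src_len) non-negative predicted fixation durations
        # mask: (batch, src_len) bool, True for real (non-padding) tokens
        src_len = encoder_states.size(1)
        query = decoder_state.unsqueeze(1).expand(-1, src_len, -1)
        logits = self.score(torch.cat([query, encoder_states], dim=-1)).squeeze(-1)
        logits = logits.masked_fill(~mask, float("-inf"))
        task_attn = F.softmax(logits, dim=-1)
        # Normalize gaze predictions into a distribution over source tokens.
        gaze_attn = gaze_scores.masked_fill(~mask, 0.0)
        gaze_attn = gaze_attn / gaze_attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        lam = torch.sigmoid(self.lambda_gaze)  # keep mixing weight in (0, 1)
        attn = lam * task_attn + (1.0 - lam) * gaze_attn
        context = torch.bmm(attn.unsqueeze(1), encoder_states).squeeze(1)
        return context, attn
```

Because the mixing weight is learned, the task network can fall back to pure task attention if the gaze signal is unhelpful, which is one way such a joint model can work without task-specific human gaze data.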
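
For reference, the hyperparameters reported under Experiment Setup can also be collected into a single configuration sketch. Only the learning rates, hidden sizes, dropout probabilities, and TSM batch size come from the paper; the embedding dimension, layer counts, and module names are placeholders assumed for illustration.

```python
# Configuration sketch of the reported training setup, assuming PyTorch.
import torch
import torch.nn as nn

EMBED_DIM = 300  # assumed embedding dimension; not stated in this section

# Paraphrase generation: uni-directional GRUs, hidden size 1,024, dropout 0.2.
paraphrase_rnn = nn.GRU(EMBED_DIM, hidden_size=1024, num_layers=2,
                        dropout=0.2, batch_first=True)

# Sentence compression: BiLSTMs, hidden size 1,024, dropout 0.1.
compression_rnn = nn.LSTM(EMBED_DIM, hidden_size=1024, num_layers=2,
                          dropout=0.1, bidirectional=True, batch_first=True)

# Both upstream task models: Adam with learning rate 1e-4.
paraphrase_optim = torch.optim.Adam(paraphrase_rnn.parameters(), lr=1e-4)
compression_optim = torch.optim.Adam(compression_rnn.parameters(), lr=1e-4)

# TSM training: Adam with lr 1e-5, batch size 100, and dropout 0.5 applied
# after the embedding layer and after the recurrent layer.
TSM_BATCH_SIZE = 100
tsm_dropout = nn.Dropout(p=0.5)
```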