Multi-Mention Learning for Reading Comprehension with Neural Cascades

Authors: Swabha Swayamdipta, Ankur P. Parikh, Tom Kwiatkowski

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our approach achieves state-of-the-art performance on both the Wikipedia and web domains of the TriviaQA dataset, outperforming more complex, recurrent architectures. Our experimental results (§4) show that all the above are essential in helping our model achieve state-of-the-art performance.
Researcher Affiliation | Collaboration | Swabha Swayamdipta, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA (swabha@cs.cmu.edu); Ankur P. Parikh & Tom Kwiatkowski, Google Research, New York, NY 10011, USA ({aparikh,tomkwiat}@google.com)
Pseudocode | No | The paper describes the model architecture and components using equations and textual descriptions, but it does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a direct link to a code repository for the methodology described. It mentions using open-source tools like TensorFlow and NLTK, but not its own implementation.
Open Datasets | Yes | The TriviaQA dataset (Joshi et al., 2017) contains a collection of 95k trivia question-answer pairs from several online trivia sources. We use GloVe embeddings (Pennington et al., 2014) of dimension 300 (trained on a corpus of 840 billion words) that are not updated during training.
Dataset Splits | Yes | Table 2 shows some ablations that give more insight into the different contributions of our model components. Our final approach (3-Level Cascade, Multiloss) achieves the best performance... on the full Wikipedia development set. Figure 3 (left) shows the behavior of the k-best predictions of different models on the human-validated Wikipedia development set.
Hardware Specification | Yes | Each hyperparameter setting took 2-3 days to train on a single NVIDIA P100 GPU. ... (both approaches use a P100 GPU).
Software Dependencies | No | The paper mentions software like NLTK ('All documents are tokenized using the NLTK tokenizer', footnote: http://www.nltk.org) and TensorFlow ('The model was implemented in TensorFlow (Abadi et al., 2016)'). However, it does not provide specific version numbers for these software components, which is required for a 'Yes' classification. (A preprocessing sketch using these tools appears below the table.)
Experiment Setup | Yes | We additionally tuned the following hyperparameters using grid search and indicate the optimal values in parentheses: network size (2 layers, each with 300 neurons), dropout ratio (0.1), learning rate (0.05), context size (1), and loss weights (λ1 = λ2 = 0.35, λ3 = 0.2, λ4 = 0.1). We use Adagrad (Duchi et al., 2011) for optimization (default initial accumulator value set to 0.1, batch size set to 1). (A configuration sketch wiring these values together appears below the table.)
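
The preprocessing details quoted in the Open Datasets and Software Dependencies rows (NLTK tokenization; frozen 300-dimensional GloVe embeddings trained on an 840-billion-word corpus) can be approximated with the minimal sketch below. The file name, the OOV handling, and the helper names are assumptions for illustration; only the choice of tools and the "not updated during training" constraint come from the paper's quoted text.

```python
# Hypothetical preprocessing sketch: NLTK tokenization + frozen GloVe lookup.
# Only "NLTK tokenizer" and "300-d GloVe (840B corpus), not updated during
# training" come from the report; everything else here is assumed.
import numpy as np
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize


def load_glove(path="glove.840B.300d.txt", dim=300):
    """Parse a GloVe text file into a {token: vector} dict (path is assumed)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed lines
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def embed(text, glove, dim=300):
    """Tokenize with NLTK and look up frozen GloVe vectors (zeros for OOV)."""
    tokens = nltk.word_tokenize(text)
    return np.stack([glove.get(t, np.zeros(dim, dtype=np.float32))
                     for t in tokens])
```

Because the embeddings are not updated during training, the lookup table can stay a plain NumPy dictionary rather than a trainable parameter.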
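
Similarly, the Experiment Setup row reports concrete hyperparameters (2-layer feed-forward networks of 300 units, dropout 0.1, Adagrad with learning rate 0.05 and initial accumulator 0.1, batch size 1, loss weights λ1 = λ2 = 0.35, λ3 = 0.2, λ4 = 0.1). The sketch below wires those numbers into a TensorFlow/Keras configuration; the layer layout, the ReLU activation, and the loss-combination function are assumptions for illustration, not the paper's actual cascade architecture.

```python
# Hypothetical configuration sketch: only the quoted numbers (layer size,
# dropout, optimizer settings, loss weights) come from the report.
import tensorflow as tf


def make_scorer():
    """Assumed 2-layer, 300-unit feed-forward scorer with dropout 0.1."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(1),  # scalar score per candidate
    ])


# Adagrad with the reported learning rate and initial accumulator value.
optimizer = tf.keras.optimizers.Adagrad(
    learning_rate=0.05, initial_accumulator_value=0.1)

# Reported weights for the four loss terms of the multi-loss objective.
LOSS_WEIGHTS = (0.35, 0.35, 0.2, 0.1)


def combined_loss(level_losses):
    """Weighted sum of the four per-level losses (the combination is assumed)."""
    return tf.add_n([w * l for w, l in zip(LOSS_WEIGHTS, level_losses)])
```

The batch size of 1 and the grid-searched context size are training-loop details not represented in this sketch.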