Neural Module Networks for Reasoning over Text

Authors: Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed model significantly outperforms state-of-the-art models on a subset of the DROP dataset that poses a variety of reasoning challenges that are covered by our modules. We experiment on 21,800 questions from the recently proposed DROP dataset (Dua et al., 2019) that are heuristically chosen based on their first n-gram such that they are covered by our designed modules.
Researcher Affiliation | Collaboration | Nitish Gupta1, Kevin Lin2, Dan Roth1, Sameer Singh3 & Matt Gardner4; {nitishg,danroth}@seas.upenn.edu, kevinlin@eecs.berkeley.edu, sameer@uci.edu, mattg@allenai.org; 1University of Pennsylvania, Philadelphia, 2University of California, Berkeley, 3University of California, Irvine, 4Allen Institute for AI
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | Our code is available at http://cogcomp.org/page/publication_view/899.
Open Datasets | Yes | We perform experiments on a portion of the recently released DROP dataset (Dua et al., 2019), which to the best of our knowledge is the only dataset that requires the kind of compositional and symbolic reasoning that our model aims to solve.
Dataset Splits | Yes | The dataset we construct contains 20,000 questions for training/validation, and 1,800 questions for testing (25% of DROP).
Hardware Specification | No | The paper mentions using a "bi-directional GRU" and a "pre-trained BERT model" but does not specify any hardware details such as GPU or CPU models used for training or inference.
Software Dependencies | No | The paper mentions implementing the model using "AllenNLP" and using "spaCy-NER" and an "off-the-shelf date-parser" but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | The hyperparameters used for our model are described in the appendix. Optimization is performed using the Adam algorithm with a learning rate of 0.001, or using BERT's optimizer with a learning rate of 1e-5.
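
The Experiment Setup row reports two optimization regimes: Adam with a learning rate of 0.001, or "BERT's optimizer" with a learning rate of 1e-5. The report does not say which implementation "BERT's optimizer" refers to, so the sketch below is a hedged approximation that uses PyTorch's AdamW as a stand-in for the BERT case; the function name and flag are hypothetical.

```python
import torch

def build_optimizer(model: torch.nn.Module, use_bert: bool) -> torch.optim.Optimizer:
    """Hedged sketch of the reported optimization settings, not the authors' exact code."""
    if use_bert:
        # Fine-tuning a pre-trained BERT encoder: small learning rate (1e-5).
        # AdamW is an assumption; the paper only says "BERT's optimizer".
        return torch.optim.AdamW(model.parameters(), lr=1e-5)
    # GRU-based encoder trained from scratch: Adam with a learning rate of 0.001.
    return torch.optim.Adam(model.parameters(), lr=1e-3)
```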
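
The Research Type and Dataset Splits rows describe a subset of DROP selected heuristically by each question's first n-gram. The paper's actual n-gram patterns are not reproduced in this report, so the prefixes below are purely illustrative placeholders; the sketch also assumes the standard DROP JSON layout with per-passage "qa_pairs".

```python
import json

# Hypothetical prefixes standing in for the paper's first-n-gram heuristics.
QUESTION_PREFIXES = (
    "how many yards was",
    "who scored",
    "which team",
)

def covered_by_modules(question: str) -> bool:
    """True if the question starts with one of the chosen n-gram patterns."""
    return question.lower().strip().startswith(QUESTION_PREFIXES)

def filter_drop(drop_json_path: str) -> list:
    """Collect (passage_id, query_id, question) triples covered by the heuristic."""
    with open(drop_json_path) as f:
        data = json.load(f)
    selected = []
    for passage_id, passage in data.items():
        for qa in passage["qa_pairs"]:
            if covered_by_modules(qa["question"]):
                selected.append((passage_id, qa["query_id"], qa["question"]))
    return selected
```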