Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Authors: Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we give benchmark results of standard methods on our tasks, and analyse their successes and failures. We compared the following methods on our tasks (on the English dataset): (i) an N-gram classifier baseline, (ii) LSTMs (long short term memory Recurrent Neural Networks) (Hochreiter & Schmidhuber, 1997), (iii) Memory Networks (MemNNs) (Weston et al., 2014), (iv) some extensions of Memory Networks we will detail; and (v) a structured SVM that incorporates external labeled data from existing NLP tasks. For each task we use 1000 questions for training, and 1000 for testing, and report the test accuracy. A hedged sketch of the N-gram baseline appears after the table. |
| Researcher Affiliation | Industry | Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin & Tomas Mikolov Facebook AI Research 770 Broadway New York, USA {jase,abordes,spchopra,tmikolov,sashar,bartvm}@fb.com |
| Pseudocode | No | The paper describes the components and functions of Memory Networks and their extensions using prose and mathematical equations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The tasks are publicly available at http://fb.ai/babi. Source code to generate the tasks is available at https://github.com/facebook/bAbI-tasks. |
| Open Datasets | Yes | The tasks are publicly available at http://fb.ai/babi. Source code to generate the tasks is available at https://github.com/facebook/bAbI-tasks. |
| Dataset Splits | Yes | For each task we use 1000 questions for training, and 1000 for testing, and report the test accuracy. We consider a task successfully passed if 95% accuracy is obtained. A minimal loading sketch for these splits follows the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components and tools such as 'N-gram classifier', 'LSTMs', 'Memory Networks', 'structured SVM', 'The Stanford coreference system (Raghunathan et al., 2010)', and 'the SENNA semantic role labeling (SRL) system (Collobert et al., 2011)', but it does not provide specific version numbers for any of these. |
| Experiment Setup | No | The paper states that 'Learning rates and other hyperparameters for all methods are chosen using the training set,' and describes some model architectural choices (e.g., 'k=2 hops,' 'variable number of hops,' 'bag of 3-grams,' '2-layer neural network with tanh nonlinearity'), but it does not provide specific numerical values for hyperparameters like learning rates, batch sizes, or optimizer settings used in the experiments. |
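
For context on the 1000/1000 splits quoted above, here is a minimal sketch of loading one bAbI task file into (story, question, answer) triples. The directory layout and file names are assumptions based on the public data release linked above (the tasks_1-20_v1-2 archive); the parsing logic follows the released plain-text format, in which line indices reset to 1 at each new story and question lines carry tab-separated answer and supporting-fact fields.

```python
from pathlib import Path

def load_babi(path):
    """Parse one bAbI task file into question-level examples.

    Line indices reset to 1 at the start of each new story; question
    lines contain tab-separated question, answer, and supporting-fact
    line numbers.
    """
    examples, story = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        idx, _, text = line.partition(" ")
        if int(idx) == 1:              # a new story begins
            story = []
        if "\t" in text:               # question \t answer \t supporting facts
            question, answer, supports = text.split("\t")
            examples.append({
                "story": list(story),
                "question": question.strip(),
                "answer": answer,
                "supporting_facts": [int(s) for s in supports.split()],
            })
        else:                          # a statement: extend the running story
            story.append(text)
    return examples

# Paths below assume the tasks_1-20_v1-2 release layout; adjust to your download.
train = load_babi("tasks_1-20_v1-2/en/qa1_single-supporting-fact_train.txt")
test = load_babi("tasks_1-20_v1-2/en/qa1_single-supporting-fact_test.txt")
assert len(train) == len(test) == 1000   # 1,000 questions per split, as quoted
```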
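And a hedged sketch of the N-gram baseline from the quoted method list, reusing the `train` and `test` lists built above. The paper describes constructing a bag of N-grams (up to 3-grams, per the Experiment Setup row) from story sentences that share at least one word with the question, then learning a linear classifier over answers. The crude tokenisation, the choice of `LogisticRegression`, and `max_iter` below are assumptions, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def relevant_text(example):
    """Keep story sentences sharing at least one (crudely tokenised) word
    with the question, then append the question itself."""
    strip = str.maketrans("", "", ".?")
    q_words = set(example["question"].lower().translate(strip).split())
    kept = [s for s in example["story"]
            if q_words & set(s.lower().translate(strip).split())]
    return " ".join(kept + [example["question"]])

# Bag of 1- to 3-grams over the retained text, linear classifier over answers.
vectorizer = CountVectorizer(ngram_range=(1, 3))
X_train = vectorizer.fit_transform(relevant_text(ex) for ex in train)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, [ex["answer"] for ex in train])

X_test = vectorizer.transform(relevant_text(ex) for ex in test)
preds = clf.predict(X_test)
acc = sum(p == ex["answer"] for p, ex in zip(preds, test)) / len(test)
print(f"accuracy = {acc:.3f}, passed = {acc >= 0.95}")   # 95% pass criterion
```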