Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Authors: Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we give benchmark results of standard methods on our tasks, and analyse their successes and failures. We compared the following methods on our tasks (on the English dataset): (i) an N-gram classifier baseline, (ii) LSTMs (long short term memory Recurrent Neural Networks) (Hochreiter & Schmidhuber, 1997), (iii) Memory Networks (MemNNs) (Weston et al., 2014), (iv) some extensions of Memory Networks we will detail; and (v) a structured SVM that incorporates external labeled data from existing NLP tasks. For each task we use 1000 questions for training, and 1000 for testing, and report the test accuracy. A hedged sketch of the N-gram baseline appears after the table. |
| Researcher Affiliation | Industry | Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin & Tomas Mikolov Facebook AI Research 770 Broadway New York, USA {jase,abordes,spchopra,tmikolov,sashar,bartvm}@fb.com |
| Pseudocode | No | The paper describes the components and functions of Memory Networks and their extensions using prose and mathematical equations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The tasks are publicly available at http://fb.ai/babi. Source code to generate the tasks is available at https://github.com/facebook/bAbI-tasks. |
| Open Datasets | Yes | The tasks are publicly available at http://fb.ai/babi. Source code to generate the tasks is available at https://github.com/facebook/bAbI-tasks. |
| Dataset Splits | Yes | For each task we use 1000 questions for training, and 1000 for testing, and report the test accuracy. We consider a task successfully passed if 95% accuracy is obtained. A minimal loading sketch for these splits follows the table. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components and tools such as 'N-gram classifier', 'LSTMs', 'Memory Networks', 'structured SVM', 'The Stanford coreference system (Raghunathan et al., 2010)', and 'the SENNA semantic role labeling (SRL) system (Collobert et al., 2011)', but it does not provide specific version numbers for any of these. |
| Experiment Setup | No | The paper states that 'Learning rates and other hyperparameters for all methods are chosen using the training set,' and describes some model architectural choices (e.g., 'k=2 hops,' 'variable number of hops,' 'bag of 3-grams,' '2-layer neural network with tanh nonlinearity'), but it does not provide specific numerical values for hyperparameters like learning rates, batch sizes, or optimizer settings used in the experiments. |
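
For context on the 1000/1000 splits quoted above, here is a minimal sketch of loading one bAbI task file into (story, question, answer) triples. The directory layout and file names are assumptions based on the public data release linked above (the tasks_1-20_v1-2 archive); the parsing logic follows the released plain-text format, in which line indices reset to 1 at each new story and question lines carry tab-separated answer and supporting-fact fields.

```python
from pathlib import Path

def load_babi(path):
    """Parse one bAbI task file into question-level examples.

    Line indices reset to 1 at the start of each new story; question
    lines contain tab-separated question, answer, and supporting-fact
    line numbers.
    """
    examples, story = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        idx, _, text = line.partition(" ")
        if int(idx) == 1:              # a new story begins
            story = []
        if "\t" in text:               # question \t answer \t supporting facts
            question, answer, supports = text.split("\t")
            examples.append({
                "story": list(story),
                "question": question.strip(),
                "answer": answer,
                "supporting_facts": [int(s) for s in supports.split()],
            })
        else:                          # a statement: extend the running story
            story.append(text)
    return examples

# Paths below assume the tasks_1-20_v1-2 release layout; adjust to your download.
train = load_babi("tasks_1-20_v1-2/en/qa1_single-supporting-fact_train.txt")
test = load_babi("tasks_1-20_v1-2/en/qa1_single-supporting-fact_test.txt")
assert len(train) == len(test) == 1000   # 1,000 questions per split, as quoted
```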
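And a hedged sketch of the N-gram baseline from the quoted method list, reusing the `train` and `test` lists built above. The paper describes constructing a bag of N-grams (up to 3-grams, per the Experiment Setup row) from story sentences that share at least one word with the question, then learning a linear classifier over answers. The crude tokenisation, the choice of `LogisticRegression`, and `max_iter` below are assumptions, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def relevant_text(example):
    """Keep story sentences sharing at least one (crudely tokenised) word
    with the question, then append the question itself."""
    strip = str.maketrans("", "", ".?")
    q_words = set(example["question"].lower().translate(strip).split())
    kept = [s for s in example["story"]
            if q_words & set(s.lower().translate(strip).split())]
    return " ".join(kept + [example["question"]])

# Bag of 1- to 3-grams over the retained text, linear classifier over answers.
vectorizer = CountVectorizer(ngram_range=(1, 3))
X_train = vectorizer.fit_transform(relevant_text(ex) for ex in train)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, [ex["answer"] for ex in train])

X_test = vectorizer.transform(relevant_text(ex) for ex in test)
preds = clf.predict(X_test)
acc = sum(p == ex["answer"] for p, ex in zip(preds, test)) / len(test)
print(f"accuracy = {acc:.3f}, passed = {acc >= 0.95}")   # 95% pass criterion
```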