Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks

Authors: Brenden Lake, Marco Baroni

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In each of the following experiments, the recurrent networks are trained on a large set of commands from the SCAN tasks to establish background knowledge as outlined above. After training, the networks are then evaluated on new commands designed to test generalization beyond the background set in systematic, compositional ways. The top-performing network for this experiment achieved 99.8% correct on the test set (accuracy values here and below are averaged over the five training runs).
Researcher Affiliation | Collaboration | Brenden Lake (1, 2), Marco Baroni (2). 1: Dept. of Psychology and Center for Data Science, New York University; 2: Facebook Artificial Intelligence Research.
Pseudocode | No | The paper describes the models and setup, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code we used is publicly available at the link: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Open Datasets | Yes | We call our data set SCAN because it is a Simplified version of the CommAI Navigation tasks (Mikolov et al., 2016). SCAN is available at: https://github.com/brendenlake/SCAN
Dataset Splits | No | The paper states that 'the SCAN tasks were randomly split into a training set (80%) and a test set (20%)' for Experiment 1, but it does not describe a separate validation split (e.g., percentages or sample counts) for model tuning or early stopping. (A hedged sketch of this 80/20 split appears after the table.)
Hardware Specification | No | The paper describes the software implementation and training procedures but does not give details of the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions PyTorch for the implementation and the ADAM optimization algorithm, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | A large-scale hyperparameter search was conducted that varied the number of layers (1 or 2), the number of hidden units per layer (25, 50, 100, 200, or 400), and the amount of dropout (0, 0.1, 0.5; applied to recurrent layers and word embeddings)... The ADAM optimization algorithm was used with default parameters, including a learning rate of 0.001 (Kingma & Welling, 2014). Gradients with a norm larger than 5.0 were clipped. (A training-step sketch illustrating these settings follows the table.)
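
The Open Datasets and Dataset Splits rows lend themselves to a short illustration. The sketch below is a hedged reconstruction, not the authors' code: it assumes the "IN: <command> OUT: <action sequence>" line format of the released SCAN files and an assumed local path (SCAN/tasks.txt), and it reproduces only the random 80%/20% train/test division quoted for Experiment 1. The paper does not describe a validation split, so none is created here.

```python
import random

# Hedged sketch: parse SCAN command/action pairs and make the random
# 80%/20% train/test split quoted for Experiment 1. The "IN: ... OUT: ..."
# line format and the local path are assumptions, not taken from the paper.
def load_scan(path):
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            command, actions = line.split(" OUT: ", 1)
            pairs.append((command.replace("IN: ", "", 1), actions))
    return pairs

pairs = load_scan("SCAN/tasks.txt")   # assumed local copy of the dataset
random.seed(0)                        # seed chosen here only for repeatability
random.shuffle(pairs)
cutoff = int(0.8 * len(pairs))
train_pairs, test_pairs = pairs[:cutoff], pairs[cutoff:]
print(f"{len(train_pairs)} training pairs, {len(test_pairs)} test pairs")
```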
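
For the Experiment Setup row, the following sketch shows where the quoted settings would plug into a PyTorch training step: ADAM with default parameters and a learning rate of 0.001, gradient-norm clipping at 5.0, and dropout applied to word embeddings and between recurrent layers. The toy GRU model, its sizes, and the dummy batch are placeholders for illustration only; the actual networks are the seq2seq architectures from the hyperparameter grid quoted above.

```python
import torch
import torch.nn as nn

# Toy stand-in model: embedding + 2-layer GRU + output projection. It only
# illustrates where the quoted dropout, ADAM, and clipping settings apply.
class ToyRecurrentModel(nn.Module):
    def __init__(self, vocab_size=20, hidden_size=200, num_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.emb_dropout = nn.Dropout(dropout)                # dropout on word embeddings
        self.rnn = nn.GRU(hidden_size, hidden_size, num_layers=num_layers,
                          dropout=dropout, batch_first=True)  # dropout between recurrent layers
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        hidden_states, _ = self.rnn(self.emb_dropout(self.embedding(tokens)))
        return self.out(hidden_states)

model = ToyRecurrentModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)    # ADAM, default parameters
criterion = nn.CrossEntropyLoss()

def train_step(tokens, targets):
    """One gradient update using the quoted clipping threshold of 5.0."""
    optimizer.zero_grad()
    logits = model(tokens)                                    # (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)   # clip gradients with norm > 5.0
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call signature.
tokens = torch.randint(0, 20, (8, 6))
targets = torch.randint(0, 20, (8, 6))
print(train_step(tokens, targets))
```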
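
Finally, the Research Type row quotes test accuracies averaged over five training runs. The snippet below is a hedged sketch of that bookkeeping, assuming "correct" means an exact match between the predicted and target action sequences; the per-run predictions here are made-up stand-ins for what a trained model would decode on the test commands.

```python
from statistics import mean

def exact_match_accuracy(predictions, targets):
    """Fraction of test commands whose full action sequence is reproduced exactly."""
    assert len(predictions) == len(targets)
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

# Toy stand-in for multiple training runs (the paper averages over five):
# each entry pairs predicted and target action sequences for a tiny made-up test set.
runs = [
    ([("I_JUMP",), ("I_WALK", "I_WALK")], [("I_JUMP",), ("I_WALK", "I_WALK")]),
    ([("I_JUMP",), ("I_RUN",)],           [("I_JUMP",), ("I_WALK", "I_WALK")]),
]
per_run = [exact_match_accuracy(preds, targets) for preds, targets in runs]
print(f"accuracy averaged over runs: {mean(per_run):.3f}")
```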