Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks

Authors: Brenden Lake, Marco Baroni

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In each of the following experiments, the recurrent networks are trained on a large set of commands from the SCAN tasks to establish background knowledge as outlined above. After training, the networks are then evaluated on new commands designed to test generalization beyond the background set in systematic, compositional ways. The top-performing network for this experiment achieved 99.8% correct on the test set (accuracy values here and below are averaged over the five training runs).
Researcher Affiliation | Collaboration | Brenden Lake (1, 2), Marco Baroni (2). 1: Dept. of Psychology and Center for Data Science, New York University; 2: Facebook Artificial Intelligence Research.
Pseudocode | No | The paper describes the models and setup, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code we used is publicly available at the link: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Open Datasets | Yes | We call our data set SCAN because it is a Simplified version of the CommAI Navigation tasks (Mikolov et al., 2016). SCAN is available at: https://github.com/brendenlake/SCAN
Dataset Splits | No | The paper states that 'the SCAN tasks were randomly split into a training set (80%) and a test set (20%)' for Experiment 1, but it does not describe a separate validation split (e.g., percentages or sample counts) for model tuning or early stopping. (A hedged sketch of this 80/20 split appears after the table.)
Hardware Specification | No | The paper describes the software implementation and training procedures but does not give details of the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions PyTorch for the implementation and the ADAM optimization algorithm, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | A large-scale hyperparameter search was conducted that varied the number of layers (1 or 2), the number of hidden units per layer (25, 50, 100, 200, or 400), and the amount of dropout (0, 0.1, 0.5; applied to recurrent layers and word embeddings)... The ADAM optimization algorithm was used with default parameters, including a learning rate of 0.001 (Kingma & Welling, 2014). Gradients with a norm larger than 5.0 were clipped. (A training-step sketch illustrating these settings follows the table.)
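
The Open Datasets and Dataset Splits rows lend themselves to a short illustration. The sketch below is a hedged reconstruction, not the authors' code: it assumes the "IN: <command> OUT: <action sequence>" line format of the released SCAN files and an assumed local path (SCAN/tasks.txt), and it reproduces only the random 80%/20% train/test division quoted for Experiment 1. The paper does not describe a validation split, so none is created here.

```python
import random

# Hedged sketch: parse SCAN command/action pairs and make the random
# 80%/20% train/test split quoted for Experiment 1. The "IN: ... OUT: ..."
# line format and the local path are assumptions, not taken from the paper.
def load_scan(path):
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            command, actions = line.split(" OUT: ", 1)
            pairs.append((command.replace("IN: ", "", 1), actions))
    return pairs

pairs = load_scan("SCAN/tasks.txt")   # assumed local copy of the dataset
random.seed(0)                        # seed chosen here only for repeatability
random.shuffle(pairs)
cutoff = int(0.8 * len(pairs))
train_pairs, test_pairs = pairs[:cutoff], pairs[cutoff:]
print(f"{len(train_pairs)} training pairs, {len(test_pairs)} test pairs")
```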
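
For the Experiment Setup row, the following sketch shows where the quoted settings would plug into a PyTorch training step: ADAM with default parameters and a learning rate of 0.001, gradient-norm clipping at 5.0, and dropout applied to word embeddings and between recurrent layers. The toy GRU model, its sizes, and the dummy batch are placeholders for illustration only; the actual networks are the seq2seq architectures from the hyperparameter grid quoted above.

```python
import torch
import torch.nn as nn

# Toy stand-in model: embedding + 2-layer GRU + output projection. It only
# illustrates where the quoted dropout, ADAM, and clipping settings apply.
class ToyRecurrentModel(nn.Module):
    def __init__(self, vocab_size=20, hidden_size=200, num_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.emb_dropout = nn.Dropout(dropout)                # dropout on word embeddings
        self.rnn = nn.GRU(hidden_size, hidden_size, num_layers=num_layers,
                          dropout=dropout, batch_first=True)  # dropout between recurrent layers
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        hidden_states, _ = self.rnn(self.emb_dropout(self.embedding(tokens)))
        return self.out(hidden_states)

model = ToyRecurrentModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)    # ADAM, default parameters
criterion = nn.CrossEntropyLoss()

def train_step(tokens, targets):
    """One gradient update using the quoted clipping threshold of 5.0."""
    optimizer.zero_grad()
    logits = model(tokens)                                    # (batch, seq_len, vocab)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)   # clip gradients with norm > 5.0
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call signature.
tokens = torch.randint(0, 20, (8, 6))
targets = torch.randint(0, 20, (8, 6))
print(train_step(tokens, targets))
```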
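
Finally, the Research Type row quotes test accuracies averaged over five training runs. The snippet below is a hedged sketch of that bookkeeping, assuming "correct" means an exact match between the predicted and target action sequences; the per-run predictions here are made-up stand-ins for what a trained model would decode on the test commands.

```python
from statistics import mean

def exact_match_accuracy(predictions, targets):
    """Fraction of test commands whose full action sequence is reproduced exactly."""
    assert len(predictions) == len(targets)
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

# Toy stand-in for multiple training runs (the paper averages over five):
# each entry pairs predicted and target action sequences for a tiny made-up test set.
runs = [
    ([("I_JUMP",), ("I_WALK", "I_WALK")], [("I_JUMP",), ("I_WALK", "I_WALK")]),
    ([("I_JUMP",), ("I_RUN",)],           [("I_JUMP",), ("I_WALK", "I_WALK")]),
]
per_run = [exact_match_accuracy(preds, targets) for preds, targets in runs]
print(f"accuracy averaged over runs: {mean(per_run):.3f}")
```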