Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
Authors: Brenden Lake, Marco Baroni
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In each of the following experiments, the recurrent networks are trained on a large set of commands from the SCAN tasks to establish background knowledge as outlined above. After training, the networks are then evaluated on new commands designed to test generalization beyond the background set in systematic, compositional ways. The top-performing network for this experiment achieved 99.8% correct on the test set (accuracy values here and below are averaged over the five training runs). |
| Researcher Affiliation | Collaboration | Brenden Lake (1, 2), Marco Baroni (2); 1: Dept. of Psychology and Center for Data Science, New York University; 2: Facebook Artificial Intelligence Research. |
| Pseudocode | No | The paper describes the models and setup, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code we used is publicly available at the link: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html |
| Open Datasets | Yes | We call our data set SCAN because it is a Simplified version of the CommAI Navigation tasks (Mikolov et al., 2016). SCAN available at: https://github.com/brendenlake/SCAN (a loading sketch follows the table). |
| Dataset Splits | No | The paper states 'the SCAN tasks were randomly split into a training set (80%) and a test set (20%)' for Experiment 1, but it does not describe a separate validation split (e.g., percentage or sample counts) for model tuning or early stopping (a split sketch follows the table). |
| Hardware Specification | No | The paper describes the software implementation and training procedures but does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions 'PyTorch' for implementation and the 'ADAM optimization algorithm' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | A large-scale hyperparameter search was conducted that varied the number of layers (1 or 2), the number of hidden units per layer (25, 50, 100, 200, or 400), and the amount of dropout (0, 0.1, 0.5; applied to recurrent layers and word embeddings)... The ADAM optimization algorithm was used with default parameters, including a learning rate of 0.001 (Kingma & Ba, 2014). Gradients with a norm larger than 5.0 were clipped. (A training-step sketch follows the table.) |
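
The Open Datasets row points to the public SCAN repository. As a reading aid, here is a minimal Python sketch for loading one SCAN task file, assuming the `IN: <command> OUT: <actions>` line format and the `simple_split/tasks_train_simple.txt` file name used in that repository; neither detail is quoted in the paper itself.

```python
from typing import List, Tuple

def load_scan(path: str) -> List[Tuple[List[str], List[str]]]:
    """Return (command tokens, action tokens) pairs from a SCAN task file.

    Assumes each non-empty line has the form:
        IN: jump twice OUT: I_JUMP I_JUMP
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            command, actions = line.split(" OUT: ")
            command = command.replace("IN: ", "", 1)
            pairs.append((command.split(), actions.split()))
    return pairs

# Example (hypothetical local path after cloning the repository):
# data = load_scan("SCAN/simple_split/tasks_train_simple.txt")
```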
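The Dataset Splits row quotes an 80%/20% random split for Experiment 1. The sketch below is one way to produce such a split; the shuffling procedure and seed are assumptions, since the paper specifies neither (nor a validation split).

```python
import random

def random_split(pairs, train_frac=0.8, seed=0):
    """Randomly partition (command, actions) pairs into train/test subsets."""
    rng = random.Random(seed)          # seed is an assumption for reproducibility
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# train_pairs, test_pairs = random_split(data)
```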
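The Experiment Setup row quotes ADAM with a learning rate of 0.001, gradient-norm clipping at 5.0, and a hyperparameter grid over layers, hidden units, and dropout. The PyTorch sketch below illustrates those reported settings on a placeholder encoder; the architecture and the particular grid point shown are illustrative assumptions, not the authors' best-performing model.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Placeholder encoder reflecting the quoted search ranges
    (1-2 layers, 25-400 hidden units, dropout in {0, 0.1, 0.5} applied to
    word embeddings and recurrent layers). The values below are one point
    in that grid, not the reported best configuration."""
    def __init__(self, vocab_size: int, hidden_size: int = 200,
                 num_layers: int = 2, dropout: float = 0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.embed_drop = nn.Dropout(dropout)  # dropout on word embeddings
        self.rnn = nn.LSTM(hidden_size, hidden_size, num_layers,
                           dropout=dropout if num_layers > 1 else 0.0,
                           batch_first=True)   # dropout between recurrent layers

    def forward(self, tokens: torch.Tensor):
        return self.rnn(self.embed_drop(self.embed(tokens)))

encoder = Encoder(vocab_size=20)  # vocabulary size is a placeholder

# ADAM with default parameters and a learning rate of 0.001, per the quoted setup.
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.001)

def apply_update(loss: torch.Tensor) -> None:
    """One optimization step, clipping gradients whose norm exceeds 5.0."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=5.0)
    optimizer.step()
```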