Systematic Generalization: What Is Required and Can It Be Learned?
Authors: Dzmitry Bahdanau*, Shikhar Murty*, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate if modular models that generalize well could be made more end-to-end by learning their layout and parametrization. |
| Researcher Affiliation | Collaboration | Dzmitry Bahdanau: Mila, Université de Montréal; AdeptMind Scholar; Element AI |
| Pseudocode | Yes | Algorithm 1 Pseudocode for creating SQOOP |
| Open Source Code | Yes | The code for all experiments is available online: https://github.com/rizar/systematic-generalization-sqoop |
| Open Datasets | No | The paper describes the synthetic SQOOP dataset and provides pseudocode for its generation, but it does not provide concrete access information (e.g., a link or repository) to a pre-generated or publicly hosted version of the dataset files themselves. |
| Dataset Splits | Yes | Our training sets contain 1 million examples, so for a dataset with #rhs/lhs = k we generate approximately 10^6 / (36 · 4 · k) different images per unique question (see the data-generation sketch below the table). ... We continuously monitored validation set performance of all models during training, selected the best one and reported its performance on the test set. |
| Hardware Specification | Yes | We also thank Nvidia for donating NVIDIA DGX-1 used for this research. |
| Software Dependencies | No | The paper mentions using the Adam optimizer with specific hyperparameters but does not list any software libraries or frameworks with their specific version numbers. |
| Experiment Setup | Yes | All models share the same stem architecture which consists of 6 layers of convolution (8 for Relation Networks), batch normalization and max pooling. The input to the stem is a 64×64×3 image, and the feature dimension used throughout the stem is 64. ... In all our experiments we used the Adam optimizer (Kingma & Ba, 2015) with hyperparameters α = 0.0001, β₁ = 0.9, β₂ = 0.999, ε = 10⁻¹⁰. ... The number of training iterations for each model was selected in preliminary investigations based on our observations of how long it takes for different models to converge. This information, as well as other training details, can be found in Table 3. (See the stem and optimizer sketch below the table.) |
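The Pseudocode and Dataset Splits rows describe how SQOOP training questions are restricted to k right-hand-side objects per left-hand-side object. The sketch below is a minimal illustration of that restriction and of the images-per-question arithmetic, under assumed details (36 objects drawn from 26 letters and 10 digits, four spatial relations, uniform sampling); names such as `make_training_questions` are illustrative and do not reproduce the paper's Algorithm 1.

```python
import random

# Illustrative sketch of the #rhs/lhs = k restriction described in the paper.
# Assumptions (not verbatim from Algorithm 1): 36 objects = 26 letters + 10
# digits, 4 spatial relations, and k right-hand-side objects sampled uniformly
# for every left-hand-side object.
OBJECTS = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + [str(d) for d in range(10)]
RELATIONS = ["LEFT_OF", "RIGHT_OF", "ABOVE", "BELOW"]

def make_training_questions(k: int, seed: int = 0):
    """Return the restricted training questions as (lhs, relation, rhs) triples."""
    rng = random.Random(seed)
    questions = []
    for lhs in OBJECTS:
        rhs_pool = [obj for obj in OBJECTS if obj != lhs]
        for rhs in rng.sample(rhs_pool, k):  # only k rhs objects per lhs
            for relation in RELATIONS:
                questions.append((lhs, relation, rhs))
    return questions

k = 2
questions = make_training_questions(k)
# 36 * 4 * k unique questions; with a 1M-example training set this yields
# roughly 10**6 / (36 * 4 * k) distinct images per unique question.
print(len(questions), round(10**6 / len(questions)))
```

For k = 2 this gives 288 unique training questions and roughly 3,472 images per question, consistent with the 10^6 / (36 · 4 · k) estimate quoted in the Dataset Splits row.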
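The Experiment Setup row quotes the shared stem (6 convolution/batch-norm/max-pooling layers, a 64×64×3 input, feature dimension 64) and the Adam hyperparameters. Below is a minimal PyTorch sketch of that configuration; kernel sizes, the ReLU activation, and pooling in every block are assumptions, since the quoted text does not fully specify them, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_stem(num_blocks: int = 6, in_channels: int = 3, feature_dim: int = 64):
    """Sketch of the shared stem: conv + batch norm + max pooling per block.

    Kernel size, padding, and the ReLU nonlinearity are assumptions.
    """
    layers = []
    channels = in_channels
    for _ in range(num_blocks):
        layers += [
            nn.Conv2d(channels, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        ]
        channels = feature_dim
    return nn.Sequential(*layers)

stem = make_stem()
features = stem(torch.zeros(1, 3, 64, 64))  # dummy 64x64x3 input image

# Adam with the hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.Adam(stem.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-10)
```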