Environmental drivers of systematicity and generalization in a situated agent
Authors: Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. In order to better understand this generalisation, we conduct several experiments to isolate its contributing factors. |
| Researcher Affiliation | Collaboration | Felix Hill1, Andrew Lampinen3, Rosalia Schneider1, Stephen Clark1, Matthew Botvinick1,2, James L. McClelland1,3 & Adam Santoro1. 1 DeepMind, London; 2 Gatsby Computational Neuroscience Unit, University College London; 3 Dept. of Psychology, Stanford University |
| Pseudocode | No | The agent architecture and training algorithm are described in text, for example in Section 2 'A MINIMAL MULTI-MODAL AGENT' and its subsections, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not include any links to code repositories or statements about the public release of their source code. |
| Open Datasets | No | The experiments were conducted in simulated environments: a '3D Unity simulated room' and the '3D DeepMind Lab environment (Beattie et al., 2016)', rather than on a pre-existing, publicly accessible dataset with a specific link or citation. The citation to Beattie et al. (2016) is for the environment, not a dataset within it. |
| Dataset Splits | No | The paper refers to 'training instructions' and 'test' performance in various sections and tables (e.g., Table 1, Table 2, Table 3) but does not explicitly define a separate validation split or report split percentages. |
| Hardware Specification | No | The paper describes the agent architecture and environment, but does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as 'Unity game engine' and refers to frameworks like 'DeepMind Lab' and algorithms from other papers (e.g., 'Espeholt et al., 2018'), but it does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | The agent's visual processor is a residual convolutional network with 64, 64, 32 channels in the first, second and third layers respectively and 2 residual blocks in each layer. Language instructions are received at every timestep as a string. The agent splits these on whitespace and processes them with a (word-level) LSTM network with hidden state size 128. We train the agent using an importance-weighted actor-critic algorithm with a central learner and distributed actors (Espeholt et al., 2018). |
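
The Experiment Setup quote pins down the CNN channel counts (64, 64, 32), the number of residual blocks per layer (2), and the LSTM hidden size (128), but not kernel sizes, strides, the word-embedding dimension, or how the two modalities are fused. The following is a minimal PyTorch sketch under those gaps: the 3x3 kernels, stride-2 downsampling, embedding size, and concatenation fusion are illustrative assumptions, not the authors' implementation, and the IMPALA actor-critic training loop (Espeholt et al., 2018) and the agent's memory core are omitted entirely.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convs with an identity skip; channel count unchanged."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.conv1(torch.relu(x)))
        return x + self.conv2(h)


class MinimalAgentEncoder(nn.Module):
    """Visual residual CNN (64, 64, 32 channels; 2 residual blocks per
    layer) plus a word-level LSTM (hidden size 128) over the instruction,
    matching the quantities quoted above. Strides, kernel sizes, the
    embedding dimension, and the concatenation fusion are assumptions."""

    def __init__(self, vocab_size, embed_dim=32, lstm_hidden=128):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (64, 64, 32):
            # Assumed stride-2 downsampling conv, then 2 residual blocks.
            layers.append(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1))
            layers.extend(ResidualBlock(out_ch) for _ in range(2))
            in_ch = out_ch
        self.vision = nn.Sequential(*layers)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, lstm_hidden, batch_first=True)

    def forward(self, frame, token_ids):
        # frame: (B, 3, H, W) pixels; token_ids: (B, T) word indices
        # obtained by splitting the instruction string on whitespace.
        v = self.vision(frame).flatten(start_dim=1)   # (B, Dv)
        _, (h, _) = self.lstm(self.embed(token_ids))  # h: (1, B, 128)
        return torch.cat([v, h[-1]], dim=-1)          # fused features
```

In use, the fused feature vector would feed whatever memory core and policy/value heads the agent carries; the paper describes those elsewhere and this sketch deliberately stops at the multi-modal encoding that the quoted setup specifies.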