Tell me why! Explanations support learning relational and causal structure
Authors: Andrew K. Lampinen, Nicholas Roy, Ishita Dasgupta, Stephanie C. Y. Chan, Allison Tam, James McClelland, Chen Yan, Adam Santoro, Neil C. Rabinowitz, Jane Wang, Felix Hill
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational and causal knowledge, augmenting their experience by training them to predict language descriptions and explanations can overcome these limitations. We show that language can help agents learn challenging relational tasks, and examine which aspects of language contribute to its benefits. We then show that explanations can help agents to infer not only relational but also causal structure. |
| Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Andrew Lampinen <lampinen@deepmind.com>. |
| Pseudocode | No | The paper describes the agent architecture and training process in text and diagrams, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We are in the process of preparing our 2D environments for release; once this process is complete they will be released at https://github.com/deepmind/tell_me_why_explanations_rl |
| Open Datasets | No | The paper describes the creation of custom 2D and 3D RL environments for its tasks: "We instantiate these tasks in 2D and 3D RL environments (Fig. 3a)". It does not use, or provide access information for, a publicly available, pre-existing dataset. |
| Dataset Splits | No | The paper describes training and evaluation setups for its RL agents, including a "training and testing setup" and a "meta-learning setting where agents complete episodes composed of four odd-one-out trials". However, it does not provide specific percentages or counts for training, validation, or test splits in the manner of supervised learning. |
| Hardware Specification | Yes | All agents were implemented using JAX (Bradbury et al., 2018) and Haiku (Hennigan et al., 2020), and were trained using TPU v3 and v4 devices. |
| Software Dependencies | Yes | All agents were implemented using JAX (Bradbury et al., 2018) and Haiku (Hennigan et al., 2020). |
| Experiment Setup | Yes | In Table 2 we list the architectural and hyperparameters used for the main experiments. |