Deep reinforcement learning with relational inductive biases
Authors: Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show this approach can offer advantages in efficiency, generalization, and interpretability, and can scale up to meet some of the most challenging test environments in modern artificial intelligence. |
| Researcher Affiliation | Industry | DeepMind, London, UK {vzambaldi,draposo,adamsantoro}@google.com |
| Pseudocode | No | The paper describes the agent architecture and algorithms in text and diagrams (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'The Box-World environment will be made publicly available online.' This refers to the environment/task, not the source code for the methodology described in the paper. No other explicit statement or link to the paper's source code is provided. |
| Open Datasets | No | For Box-World, the paper states 'The Box-World environment will be made publicly available online' but does not provide a specific link, DOI, or formal citation for a fixed dataset. For StarCraft II, it refers to the 'StarCraft II Learning Environment (SC2LE, Vinyals et al., 2017),' which is an environment and framework, not a static dataset with explicit access information for reproducibility as requested. |
| Dataset Splits | No | The paper mentions a 'training set-up' for Box-World and 'testing on withheld environments,' and describes generalization tests, but does not provide specific percentages or counts for training, validation, and test splits that would allow reproduction of data partitioning. |
| Hardware Specification | No | The paper states 'The model updates were performed on GPU' but does not specify any particular GPU model (e.g., NVIDIA A100), CPU model, or other specific hardware details. |
| Software Dependencies | No | The paper mentions optimizers like 'RMSprop optimiser' and 'Adam optimiser' and states 'implemented in Python' but does not provide specific version numbers for any programming languages, libraries, or frameworks (e.g., TensorFlow, PyTorch, scikit-learn). |
| Experiment Setup | Yes | The agents used an entropy cost of 0.005, discount (γ) of 0.99 and unroll length of 40 steps. Training was done using RMSprop optimiser with momentum of 0, ϵ of 0.1 and a decay term of 0.99. The learning rate was tuned, taking values between 1e-5 and 2e-4. (Appendix B). For StarCraft II, Table 2 and Table 3 provide detailed fixed hyperparameters (e.g., 'Batch size 32', 'Unroll Length 80', 'Adam β1 0.9', 'Attention embedding size 32'). A hedged configuration sketch based on these values follows the table. |
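
Taking the hyperparameters quoted in the Experiment Setup row at face value, the sketch below collects them into a plain configuration and wires the Box-World values into an RMSprop optimiser. The paper does not release code, so the module, dictionary, and variable names here are our own; this is an illustration of the reported settings, not the authors' implementation, and PyTorch is assumed purely for concreteness.

```python
# Hedged sketch: hyperparameters as quoted from Appendix B and Tables 2-3 of the
# paper, grouped into configuration dicts. All names are hypothetical.

import torch
import torch.nn as nn

box_world_config = {
    "entropy_cost": 0.005,    # entropy regularisation weight
    "discount_gamma": 0.99,   # return discount factor
    "unroll_length": 40,      # steps per learner unroll
    "learning_rate": 2e-4,    # paper reports tuning in the range [1e-5, 2e-4]
}

starcraft_config = {
    "batch_size": 32,
    "unroll_length": 80,
    "adam_beta1": 0.9,
    "attention_embedding_size": 32,
}

# RMSprop as described for Box-World: momentum 0, epsilon 0.1, decay 0.99.
# PyTorch's `alpha` argument plays the role of the decay term. The dummy
# module stands in for the agent network, which the paper does not release.
dummy_net = nn.Linear(8, 4)
optimiser = torch.optim.RMSprop(
    dummy_net.parameters(),
    lr=box_world_config["learning_rate"],
    alpha=0.99,
    eps=0.1,
    momentum=0.0,
)
```

The dictionaries simply make the reported values machine-readable; anything not stated in the paper (e.g., the exact learning rate chosen after tuning) would still need to be swept over the quoted range.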