Deep reinforcement learning with relational inductive biases

Authors: Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show this approach can offer advantages in efficiency, generalization, and interpretability, and can scale up to meet some of the most challenging test environments in modern artificial intelligence.
Researcher Affiliation | Industry | DeepMind, London, UK {vzambaldi,draposo,adamsantoro}@google.com
Pseudocode | No | The paper describes the agent architecture and algorithms in text and diagrams (e.g., Figure 2) but does not provide structured pseudocode or algorithm blocks; an illustrative sketch of the relational attention block is given after this table.
Open Source Code | No | The paper states 'The Box-World environment will be made publicly available online.' This refers to the environment/task, not the source code for the methodology described in the paper. No other explicit statement or link to the paper's source code is provided.
Open Datasets | No | For Box-World, the paper states 'The Box-World environment will be made publicly available online' but does not provide a specific link, DOI, or formal citation for a fixed dataset. For StarCraft II, it refers to the 'StarCraft II Learning Environment (SC2LE, Vinyals et al., 2017),' which is an environment and framework, not a static dataset with explicit access information for reproducibility.
Dataset Splits | No | The paper mentions a 'training set-up' for Box-World and 'testing on withheld environments,' and describes generalization tests, but does not provide specific percentages or counts for training, validation, and test splits that would allow reproduction of the data partitioning.
Hardware Specification | No | The paper states 'The model updates were performed on GPU' but does not specify any particular GPU model (e.g., NVIDIA A100), CPU model, or other specific hardware details.
Software Dependencies | No | The paper mentions the 'RMSprop optimiser' and 'Adam optimiser' and states that the agents were 'implemented in Python', but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
Experiment Setup | Yes | The agents used an entropy cost of 0.005, discount (γ) of 0.99 and unroll length of 40 steps. Training was done using the RMSprop optimiser with momentum of 0, ϵ of 0.1 and a decay term of 0.99. The learning rate was tuned, taking values between 1e-5 and 2e-4 (Appendix B). For StarCraft II, Tables 2 and 3 provide detailed fixed hyperparameters (e.g., 'Batch size 32', 'Unroll Length 80', 'Adam β1 0.9', 'Attention embedding size 32').
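
The hyperparameters quoted above are scattered across Appendix B and Tables 2-3 of the paper; the sketch below simply collects them into one illustrative Python configuration. The paper releases no configuration file, so the dictionary names and key names are hypothetical and only the values come from the text.

```python
# Box-World agent settings quoted from Appendix B (key names are illustrative).
boxworld_hparams = {
    "entropy_cost": 0.005,
    "discount_gamma": 0.99,
    "unroll_length": 40,
    "optimiser": "RMSprop",
    "rmsprop_momentum": 0.0,
    "rmsprop_epsilon": 0.1,
    "rmsprop_decay": 0.99,
    "learning_rate_range": (1e-5, 2e-4),  # tuned within this range
}

# A subset of the fixed StarCraft II settings reported in Tables 2 and 3.
sc2_hparams = {
    "batch_size": 32,
    "unroll_length": 80,
    "optimiser": "Adam",
    "adam_beta1": 0.9,
    "attention_embedding_size": 32,
}
```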
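Because the paper presents its relational module only in prose and diagrams (see the Pseudocode row above), a minimal sketch of the dot-product attention step it describes is given here. This is an illustration under stated assumptions, not the authors' implementation: the paper uses multi-head attention followed by a shared MLP with residual connections, whereas this sketch shows a single head, and the projection names Wq, Wk, Wv are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relational_block(entities, Wq, Wk, Wv):
    """Single-head dot-product attention over a set of entity vectors.

    entities:   (N, d) array, one row per entity (e.g. CNN feature-map
                positions with their x, y coordinates appended).
    Wq, Wk, Wv: (d, d_k) projection matrices (hypothetical names; the
                paper's multi-head version also applies a shared MLP and
                a residual connection, omitted here for brevity).
    """
    q, k, v = entities @ Wq, entities @ Wk, entities @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N) pairwise attention
    return weights @ v                                  # relationally updated entities
```

For a Box-World-sized input, one pass would operate over on the order of N = 144 entities (a 12x12 feature map), producing updated entity vectors that downstream policy and value heads can pool over.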