Intrinsic Motivation for Encouraging Synergistic Behavior

Authors: Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach in robotic bimanual manipulation and multi-agent locomotion tasks with sparse rewards; we find that our approach yields more efficient learning than both 1) training with only the sparse reward and 2) using the typical surprise-based formulation of intrinsic motivation, which does not bias toward synergistic behavior. Videos are available on the project webpage: https://sites.google.com/view/iclr2020-synergistic.
Researcher Affiliation | Collaboration | MIT Computer Science and Artificial Intelligence Laboratory; Facebook Artificial Intelligence Research. ronuchit@mit.edu, shubhtuls@fb.com, saurabhg@illinois.edu, gabhinav@fb.com
Pseudocode | Yes | Full pseudocode is provided in Appendix A.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is released, nor does it provide a direct link to a code repository. It only links to a project webpage for videos and mentions using third-party libraries such as stable baselines.
Open Datasets | No | The paper uses custom simulated robotic and multi-agent locomotion tasks (e.g., bottle opening, ant push, soccer) built in MuJoCo. These are custom environments rather than publicly available datasets, and no link or access information for pre-generated data is provided.
Dataset Splits | No | The paper evaluates performance through interaction with simulated environments, generating data dynamically; it does not specify traditional train/validation/test splits with percentages or sample counts.
Hardware Specification | No | The paper states "For all tasks, training is parallelized across 50 workers" but does not specify hardware details such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions software such as MuJoCo, the Surreal Robotics Suite, and stable baselines, but does not provide version numbers for any of them.
Experiment Setup | Yes | We set the trade-off coefficient λ = 10 (see Appendix D). We use the stable baselines (Hill et al., 2018) implementation of PPO (Schulman et al., 2017) as our policy gradient algorithm. We use clipping parameter 0.2, entropy loss coefficient 0.01, value loss function coefficient 0.5, gradient clip threshold 0.5, number of steps 10, number of minibatches per update 4, number of optimization epochs per update 4, and Adam (Kingma & Ba, 2015) with learning rate 0.001.
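
The Research Type row contrasts the paper's synergy-biased intrinsic motivation with the typical surprise-based formulation, and the Experiment Setup row reports a trade-off coefficient λ = 10. The sketch below illustrates how such intrinsic bonuses might be computed and folded into the sparse task reward; the function names, the composed-prediction form of the synergy bonus, and the additive λ-weighted combination are illustrative assumptions rather than details confirmed by the rows above.

```python
import numpy as np

# Trade-off coefficient reported in the Experiment Setup row (Appendix D of the paper).
LAMBDA = 10.0

def surprise_bonus(next_state, joint_prediction):
    # Standard surprise-based intrinsic reward: prediction error of a joint
    # forward model. This is the baseline the abstract says does not bias
    # exploration toward synergistic behavior.
    return float(np.linalg.norm(next_state - joint_prediction))

def synergy_bonus(next_state, composed_prediction):
    # Assumed synergy-biased variant: prediction error of a model that composes
    # per-agent effects, so outcomes the agents could achieve independently earn
    # little bonus. The exact formulation is not given in this summary; this is
    # an illustrative stand-in.
    return float(np.linalg.norm(next_state - composed_prediction))

def shaped_reward(sparse_reward, intrinsic_bonus, lam=LAMBDA):
    # Assumed combination: sparse task reward plus a lambda-weighted intrinsic
    # bonus. The paper reports lambda = 10; the additive form is an assumption.
    return sparse_reward + lam * intrinsic_bonus
```

Under this assumed form, the surprise bonus rewards any unpredictable outcome, while the synergy-style bonus rewards only outcomes that a composition of independent per-agent effects fails to explain, which is what would bias exploration toward coordinated behavior.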
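
The Experiment Setup and Hardware Specification rows together read as a Stable Baselines PPO configuration trained with 50 parallel workers. A minimal configuration sketch under those assumptions follows; PPO2, SubprocVecEnv, MlpPolicy, and the stand-in Gym environment are additions for illustration, since the paper's custom MuJoCo tasks and training code are not released.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    # Placeholder environment: the paper uses custom MuJoCo bimanual
    # manipulation and multi-agent locomotion tasks that are not released.
    return gym.make("CartPole-v1")

if __name__ == "__main__":
    # "Training is parallelized across 50 workers" -- modeled here as 50
    # subprocess environments (an assumption about how parallelism was done).
    env = SubprocVecEnv([make_env for _ in range(50)])

    # Hyperparameters as listed in the Experiment Setup row.
    model = PPO2(
        MlpPolicy,
        env,
        cliprange=0.2,        # PPO clipping parameter
        ent_coef=0.01,        # entropy loss coefficient
        vf_coef=0.5,          # value loss function coefficient
        max_grad_norm=0.5,    # gradient clip threshold
        n_steps=10,           # steps per worker per update
        nminibatches=4,       # minibatches per update
        noptepochs=4,         # optimization epochs per update
        learning_rate=1e-3,   # Adam learning rate (Adam is PPO2's default optimizer)
        verbose=1,
    )
    model.learn(total_timesteps=100_000)  # placeholder training budget
```

With 50 workers and 10 steps per worker, each update in this sketch would collect a batch of 500 transitions split into 4 minibatches, consistent with the reported settings.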