Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The following experiments explore our self-play approach on a variety of tasks, both continuous and discrete, from the Mazebase (Sukhbaatar et al., 2015), RLLab (Duan et al., 2016), and StarCraft (Synnaeve et al., 2016) environments. The same protocol is used in all settings: self-play and target task episodes are mixed together and used to train the agent via discrete policy gradient.
Researcher Affiliation | Collaboration | Sainbayar Sukhbaatar (Dept. of Computer Science, New York University, sainbar@cs.nyu.edu); Zeming Lin (Facebook AI Research, New York, zlin@fb.com); Ilya Kostrikov (Dept. of Computer Science, New York University, kostrikov@cs.nyu.edu); Gabriel Synnaeve, Arthur Szlam & Rob Fergus (Facebook AI Research, New York, {gab,aszlam,robfergus}@fb.com)
Pseudocode | Yes | Algorithm 1: Pseudo code for training an agent on a self-play episode; Algorithm 2: Pseudo code for training an agent on a target task episode (a hedged sketch of this training loop appears below the table)
Open Source Code | Yes | Code for our approach can be found at (link removed for anonymity).
Open Datasets | Yes | The following experiments explore our self-play approach on a variety of tasks, both continuous and discrete, from the Mazebase (Sukhbaatar et al., 2015), RLLab (Duan et al., 2016), and StarCraft (Synnaeve et al., 2016) environments.
Dataset Splits | No | The paper describes mixing self-play and target task episodes for training (e.g., '25% comes from target task episodes, while the remaining 75% is from self-play'), but it does not specify traditional train/validation/test splits over fixed datasets, since the environments are typically generated randomly per episode.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software and methods such as Mujoco, RMSProp, and TRPO, but it does not give version numbers for these or for other ancillary software components.
Experiment Setup | Yes | For the experiments with neural networks, all parameters are randomly initialized from N(0, 0.2). The hyperparameters of RMSProp are set to 0.97 and 1e-6. The other hyperparameter values used in the experiments are shown in Table 1. (A hedged configuration sketch appears below the table.)
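
The pseudocode row above cites Algorithm 1 (self-play episode) and Algorithm 2 (target task episode), and the dataset-splits row quotes the 75%/25% mix of self-play and target-task episodes. The sketch below shows, under stated assumptions, how those pieces fit together in the paper's "reverse" self-play mode: only the reward structure (R_B = -gamma * t_B, R_A = gamma * max(0, t_B - t_A)) and the episode mixing follow the paper; the environment interface (env.reset, env.step, env.states_equal), the STOP action, and the alice/bob policy helpers (act, update, run_target_episode) are hypothetical placeholders, not the authors' implementation.

```python
# A minimal sketch of asymmetric self-play training (reverse mode), assuming a
# hypothetical environment and policy interface; this is NOT the authors' code.
import random

GAMMA = 0.1                # reward scale for self-play episodes (assumed value)
SELF_PLAY_FRACTION = 0.75  # quoted mix: 75% self-play, 25% target task


def self_play_episode(env, alice, bob, t_max, stop_action):
    """One 'reverse' self-play episode; returns (alice_reward, bob_reward)."""
    s0 = env.reset()
    state, t_a = s0, 0
    # Alice acts until she chooses STOP or exhausts the step budget.
    while t_a < t_max:
        action = alice.act(state, s0)
        if action == stop_action:
            break
        state = env.step(action)
        t_a += 1

    # Bob must return the environment to Alice's initial state s0.
    t_b = 0
    while t_a + t_b < t_max and not env.states_equal(state, s0):
        state = env.step(bob.act(state, s0))
        t_b += 1

    bob_reward = -GAMMA * t_b                 # Bob is rewarded for finishing quickly
    alice_reward = GAMMA * max(0, t_b - t_a)  # Alice aims just past Bob's ability
    return alice_reward, bob_reward


def train(env, alice, bob, episodes, t_max, stop_action):
    """Mix self-play and target-task episodes, with policy-gradient updates."""
    for _ in range(episodes):
        if random.random() < SELF_PLAY_FRACTION:
            r_a, r_b = self_play_episode(env, alice, bob, t_max, stop_action)
            alice.update(r_a)   # hypothetical REINFORCE-style update helpers
            bob.update(r_b)
        else:
            r = bob.run_target_episode(env, t_max)  # external task reward only
            bob.update(r)
```

In the paper's "repeat" variant, Bob instead reproduces Alice's final state from a fresh reset; structurally only the target state passed to Bob's policy changes.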
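
For the experiment-setup row, here is a minimal PyTorch sketch of the quoted initialization and optimizer settings. Reading the two RMSProp constants as the smoothing coefficient (0.97) and epsilon (1e-6) is an assumption, as are the placeholder network shape and learning rate; the paper defers the remaining hyperparameter values to its Table 1.

```python
# Hedged sketch of the stated setup: parameters drawn from N(0, 0.2) and
# RMSProp with the two quoted constants; network shape and lr are placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(        # placeholder network, not the paper's model
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, 8),
)

# All parameters randomly initialized from N(0, 0.2).
with torch.no_grad():
    for p in policy.parameters():
        p.normal_(mean=0.0, std=0.2)

# RMSProp with the quoted constants; the learning rate here is illustrative.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-3, alpha=0.97, eps=1e-6)
```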