Competitive experience replay

Authors: Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong

ICLR 2019

Reproducibility assessment. Each item below lists the variable, the result, and the supporting LLM response.
Research Type (Experimental): "We evaluate our approach on the tasks of reaching various goal locations in an ant maze and manipulating objects with a robotic arm. Each task provides only binary rewards indicating whether or not the goal is achieved. Our method asymmetrically augments these sparse rewards for a pair of agents each learning the same task, creating a competitive game designed to drive exploration. Extensive experiments demonstrate that this method leads to faster convergence and improved task performance."
Researcher Affiliation (Industry): Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong; Salesforce Research, Palo Alto, 94301; lhao499@gmail.com, {atrott, rsocher, cxiong}@salesforce.com
Pseudocode (Yes): Appendix A (Algorithm): "We summarize the algorithm for HER with CER in Algorithm 1."
Open Source Code (No): The paper references the code for HER (Andrychowicz et al., 2017) and discusses implementation details based on it, but it neither states that the code for its own method is open-sourced nor provides a link to it.
Open Datasets (Yes): "We evaluate the change in performance when ind-CER is added to HER on the challenging multi-goal sparse reward environments introduced in Plappert et al. (2018). (Note: we would prefer to examine int-CER but are prevented by technical issues related to the environment.)"
Dataset Splits (No): The paper reports training duration in epochs and evaluates success rates, but does not specify explicit train/validation/test splits with percentages or sample counts, a convention more typical of static supervised-learning datasets than of the reinforcement-learning environments used here.
Hardware Specification (No): The paper does not describe the hardware used to run its experiments, such as GPU or CPU models or cloud computing specifications.
Software Dependencies (No): The paper mentions software platforms such as OpenAI Gym and MuJoCo, but it does not give version numbers for these or for other dependencies (libraries, frameworks) that would be needed for reproduction.
Experiment Setup (Yes): "We summarize the hyperparameters in Table 1."

Table 1: Hyperparameter values used in experiments.

Hyperparameter            U Ant Maze  S Ant Maze  Fetch Control  Hand Control
Buffer size               1E5         1E6         1E6            1E6
Batch size                128         128         256            256
Max steps of episode      50          100         50             100
Reset epochs              2           2           5              5
Max reset epochs          10          10          20             30
Total epochs              50          100         100/50         200
Actor learning rate       0.0004      0.0004      0.001          0.001
Critic learning rate      0.0004      0.0004      0.001          0.001
Action L2 regularization  0.01        0.01        1.00           1.00
Polyak                    0.95        0.95        0.95           0.95
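The core re-labeling step of CER (the competitive augmentation paired with HER in Algorithm 1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the Euclidean proximity test, and the `threshold` parameter are assumptions. The idea it captures is that two minibatches are sampled, one per agent; agent A is penalized for reaching states that agent B also reaches, and agent B receives the matching bonus.

```python
import numpy as np

def cer_relabel(states_a, rewards_a, states_b, rewards_b, threshold=0.1):
    """Sketch of Competitive Experience Replay reward re-labeling.

    For each pair of sampled states (one from agent A's minibatch, one
    from agent B's), A loses one unit of reward and B gains one unit
    whenever the two states are closer than `threshold` (an assumed
    proximity test for illustration).
    """
    # Pairwise Euclidean distances between the two minibatches,
    # shape (|A|, |B|), via broadcasting.
    d = np.linalg.norm(states_a[:, None, :] - states_b[None, :, :], axis=-1)
    close = d < threshold
    # A is penalized per B-state it collides with; B is rewarded symmetrically.
    new_rewards_a = rewards_a - close.sum(axis=1)
    new_rewards_b = rewards_b + close.sum(axis=0)
    return new_rewards_a, new_rewards_b
```

With sparse binary task rewards, this zero-sum term is what turns the two learners' shared task into the competitive exploration game described in the abstract.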
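For convenience, the Table 1 values can be collected into a plain configuration mapping. The dictionary keys below are hypothetical names chosen for this sketch, and the ambiguous "100/50" total-epochs entry for Fetch Control is kept verbatim rather than resolved.

```python
# Hyperparameters transcribed from Table 1 (key names are assumptions).
HYPERPARAMS = {
    "ant_maze_u": {
        "buffer_size": 1e5, "batch_size": 128, "max_episode_steps": 50,
        "reset_epochs": 2, "max_reset_epochs": 10, "total_epochs": 50,
        "actor_lr": 4e-4, "critic_lr": 4e-4, "action_l2": 0.01, "polyak": 0.95,
    },
    "ant_maze_s": {
        "buffer_size": 1e6, "batch_size": 128, "max_episode_steps": 100,
        "reset_epochs": 2, "max_reset_epochs": 10, "total_epochs": 100,
        "actor_lr": 4e-4, "critic_lr": 4e-4, "action_l2": 0.01, "polyak": 0.95,
    },
    "fetch_control": {
        "buffer_size": 1e6, "batch_size": 256, "max_episode_steps": 50,
        "reset_epochs": 5, "max_reset_epochs": 20, "total_epochs": "100/50",
        "actor_lr": 1e-3, "critic_lr": 1e-3, "action_l2": 1.00, "polyak": 0.95,
    },
    "hand_control": {
        "buffer_size": 1e6, "batch_size": 256, "max_episode_steps": 100,
        "reset_epochs": 5, "max_reset_epochs": 30, "total_epochs": 200,
        "actor_lr": 1e-3, "critic_lr": 1e-3, "action_l2": 1.00, "polyak": 0.95,
    },
}
```

Note that the Polyak averaging coefficient (0.95) is shared across all four environments, while batch size, learning rates, and the action L2 penalty split cleanly between the ant-maze and manipulation tasks.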