Competitive Experience Replay
Authors: Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on the tasks of reaching various goal locations in an ant maze and manipulating objects with a robotic arm. Each task provides only binary rewards indicating whether or not the goal is achieved. Our method asymmetrically augments these sparse rewards for a pair of agents each learning the same task, creating a competitive game designed to drive exploration. Extensive experiments demonstrate that this method leads to faster convergence and improved task performance. (A hedged sketch of this competitive relabeling appears below the table.) |
| Researcher Affiliation | Industry | Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong Salesforce Research Palo Alto, 94301 lhao499@gmail.com, {atrott, rsocher, cxiong}@salesforce.com |
| Pseudocode | Yes | Appendix A (Algorithm): We summarize the algorithm for HER with CER in Algorithm 1. (A skeleton of HER-style relabeling is sketched below the table.) |
| Open Source Code | No | The paper references the code for HER (Andrychowicz et al., 2017) and discusses implementation details based on it, but does not provide an explicit statement or link for the open-sourcing of its own method's code. |
| Open Datasets | Yes | We evaluate the change in performance when ind-CER is added to HER on the challenging multi-goal sparse reward environments introduced in Plappert et al. (2018). (Note: we would prefer to examine int-CER but are prevented by technical issues related to the environment.) |
| Dataset Splits | No | The paper describes training duration in epochs and evaluates success rates, but it does not specify explicit train/validation/test splits with percentages or sample counts, as is standard for static supervised-learning datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions software platforms like OpenAI Gym and MuJoCo, but it does not provide specific version numbers for these or other software dependencies, such as libraries or frameworks, which would be necessary for reproduction. |
| Experiment Setup | Yes | We summarize the hyperparameters in Table 1. Table 1 (hyperparameter values used in experiments; columns: U Ant Maze, S Ant Maze, Fetch Control, Hand Control): Buffer size 1E5, 1E6, 1E6, 1E6; Batch size 128, 128, 256, 256; Max steps of episode 50, 100, 50, 100; Reset epochs 2, 2, 5, 5; Max reset epochs 10, 10, 20, 30; Total epochs 50, 100, 100/50, 200; Actor learning rate 0.0004, 0.0004, 0.001, 0.001; Critic learning rate 0.0004, 0.0004, 0.001, 0.001; Action L2 regularization 0.01, 0.01, 1.00, 1.00; Polyak 0.95, 0.95, 0.95, 0.95. (The Fetch Control column is transcribed into a config sketch below.) |
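The method summary quoted in the Research Type row describes a pair of agents whose sparse binary rewards are asymmetrically augmented to form a competitive game. A minimal sketch of one way to implement such relabeling is below; the pairing rule (nearest neighbor within a distance threshold `eps`) and all names are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of competitive reward relabeling for a pair of
# agents A and B, assuming nearest-neighbor matching within `eps`.
import numpy as np

def competitive_relabel(states_a, rewards_a, states_b, rewards_b, eps=0.1):
    """Asymmetrically augment sparse rewards for two agents on the same task.

    states_a, states_b: (n, d) and (m, d) arrays of states sampled from
    each agent's replay buffer; rewards_a, rewards_b: matching 1-D arrays
    of binary task rewards.
    """
    rewards_a = rewards_a.astype(np.float64).copy()
    rewards_b = rewards_b.astype(np.float64).copy()
    # Pairwise Euclidean distances between the two mini-batches of states.
    dists = np.linalg.norm(states_a[:, None, :] - states_b[None, :, :], axis=-1)
    for i in range(len(states_a)):
        j = int(np.argmin(dists[i]))   # B's state closest to A's state i
        if dists[i, j] < eps:          # the two agents visited nearby states
            rewards_a[i] -= 1.0        # penalize A for staying where B has been
            rewards_b[j] += 1.0        # reward B for covering A's states
    return rewards_a, rewards_b
```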
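The Pseudocode row points to the paper's Algorithm 1, which combines CER with hindsight experience replay (HER; Andrychowicz et al., 2017). A hedged skeleton of the HER half is below, using the standard "future" goal-sampling strategy; the transition layout and the `reward_fn` interface are assumptions for illustration.

```python
# Sketch of HER-style "future" goal relabeling; the transition format and
# reward_fn signature are assumed for illustration.
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=np.random):
    """Augment an episode with transitions relabeled to later achieved goals.

    episode: list of (s, a, r, s_next, goal, achieved_next) tuples;
    reward_fn(achieved, goal) returns the binary sparse task reward.
    """
    relabeled = []
    for t, (s, a, r, s_next, goal, achieved_next) in enumerate(episode):
        relabeled.append((s, a, r, s_next, goal))
        # Sample k goals actually achieved later in the same episode.
        future = rng.randint(t, len(episode), size=k)
        for f in future:
            new_goal = episode[f][5]                    # achieved goal at step f
            new_r = reward_fn(achieved_next, new_goal)  # binary sparse reward
            relabeled.append((s, a, new_r, s_next, new_goal))
    return relabeled
```

Per the quoted Algorithm 1 summary, the competitive relabeling would presumably then be applied to mini-batches sampled from both agents' buffers before each policy update.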
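For convenience, the Fetch Control column of the Table 1 values quoted above can be transcribed into a plain config dict; the key names are editorial shorthand, not taken from the paper.

```python
# Table 1 hyperparameters for the Fetch Control tasks; key names are
# editorial shorthand, values transcribed from the paper's Table 1.
FETCH_CONTROL_HPARAMS = {
    "buffer_size": int(1e6),
    "batch_size": 256,
    "max_episode_steps": 50,
    "reset_epochs": 5,
    "max_reset_epochs": 20,
    "total_epochs": "100/50",  # reported as 100/50 in Table 1
    "actor_lr": 1e-3,
    "critic_lr": 1e-3,
    "action_l2": 1.0,
    "polyak": 0.95,
}
```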