Experience Replay Optimization
Authors: Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, Xia Hu
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing the promise of learning an experience replay policy to improve the performance of off-policy reinforcement learning algorithms. |
| Researcher Affiliation | Academia | Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou and Xia Hu, Department of Computer Science and Engineering, Texas A&M University. {daochen.zha, khlai037, zkxiong, xiahu}@tamu.edu |
| Pseudocode | Yes | Algorithm 1 (ERO-enhanced DDPG) and Algorithm 2 (Update Replay Policy) are provided. A hedged sketch of the replay-policy update appears after this table. |
| Open Source Code | No | The paper states, 'Our implementations are based on OpenAI DDPG baseline,' and footnote 4 provides a GitHub link (https://github.com/openai/baselines). However, this refers to a third-party baseline they used, not their own open-sourced code for the methodology described in this paper. |
| Open Datasets | Yes | Our experiments are conducted on the following continuous control tasks from OpenAI Gym: HalfCheetah-v2, InvertedDoublePendulum-v2, Hopper-v2, InvertedPendulum-v2, HumanoidStandup-v2, Reacher-v2, Humanoid-v2, Pendulum-v0 [Todorov et al., 2012; Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. It only mentions the environments used and that 'Each task is run for 5 times with 2 x 10^6 timesteps using different random seeds'. |
| Hardware Specification | Yes | Our experiments are performed on a server with 24 Intel(R) Xeon(R) E5-2650 v4 @ 2.2 GHz processors and 4 GeForce GTX 1080 Ti 12 GB GPUs. |
| Software Dependencies | No | The paper mentions using the 'OpenAI DDPG baseline' and the 'Adam optimizer' but does not specify exact version numbers for these or other software components. |
| Experiment Setup | Yes | Specifically, τ = 0.001 is used for soft target updates, learning rates of 10^-4 and 10^-3 are adopted for the actor and critic respectively, Ornstein-Uhlenbeck noise with θ = 0.15 and σ = 0.2 is used for exploration, the mini-batch size is 64, the replay buffer size is 10^6, the number of rollout steps is 100, and the number of training steps is 50. For our ERO... The number of replay updating steps is set to 1 with mini-batch size 64. The Adam optimizer is used with a learning rate of 10^-4. (These values are collected into a configuration sketch after this table.) |
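
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected in one place. The sketch below only restates the reported values; the key names and dictionary layout are our own and do not come from the authors' code.

```python
# DDPG + ERO hyperparameters as reported in the paper's experiment setup.
# Key names are illustrative; they are not taken from the authors' code.
ERO_DDPG_CONFIG = {
    "tau": 1e-3,                     # soft target update rate
    "actor_lr": 1e-4,                # actor learning rate
    "critic_lr": 1e-3,               # critic learning rate
    "ou_noise": {"theta": 0.15, "sigma": 0.2},  # Ornstein-Uhlenbeck exploration noise
    "batch_size": 64,                # mini-batch size for agent updates
    "replay_buffer_size": int(1e6),  # replay buffer capacity
    "rollout_steps": 100,            # environment steps per cycle
    "training_steps": 50,            # gradient steps per cycle
    # ERO replay-policy settings
    "replay_update_steps": 1,        # replay-policy updates per cycle
    "replay_batch_size": 64,         # mini-batch size for the replay policy
    "replay_optimizer": "Adam",
    "replay_lr": 1e-4,               # replay-policy learning rate
}
```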
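
The Pseudocode row refers to Algorithm 2 (Update Replay Policy), whose details are not reproduced in this table. The sketch below shows one plausible shape of such an update: a small network scores each transition's feature vector, a Bernoulli keep/drop mask is sampled from those scores, and the score network takes a REINFORCE-style step weighted by a scalar replay reward. All class and function names, the network architecture, and the feature dimensionality are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of an ERO-style replay-policy update; names and architecture are assumptions.
import torch
import torch.nn as nn


class ReplayPolicy(nn.Module):
    """Maps per-transition features to a keep-probability in (0, 1)."""

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)   # shape: (num_transitions,)


def update_replay_policy(policy, optimizer, features, mask, replay_reward):
    """One REINFORCE-style step: scale the log-likelihood of the sampled
    Bernoulli mask by the scalar replay reward and ascend it."""
    lam = policy(features)                                    # keep-probabilities
    log_prob = (mask * torch.log(lam + 1e-8)
                + (1.0 - mask) * torch.log(1.0 - lam + 1e-8)).sum()
    loss = -replay_reward * log_prob                          # negate for gradient descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with 64 transitions and 8-dimensional features (sizes are illustrative).
    policy = ReplayPolicy(feature_dim=8)
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)  # 10^-4, as in the paper
    feats = torch.randn(64, 8)
    with torch.no_grad():
        mask = torch.bernoulli(policy(feats))                  # sampled keep/drop decisions
    update_replay_policy(policy, optimizer, feats, mask, replay_reward=0.5)
```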