Learning to Sample with Local and Global Contexts in Experience Replay Buffer
Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmark tasks for both continuous and discrete control and show that it can significantly improve the performance of various off-policy RL methods. Our experimental results show that NERS consistently (and often significantly for complex tasks having high-dimensional state and action spaces) outperforms both the existing rule-based (Schaul et al., 2016) and learning-based (Zha et al., 2019) sampling methods for experience replay. |
| Researcher Affiliation | Collaboration | 1 Samsung Advanced Institute of Technology 2 University of California, Berkeley 3 Korea Advanced Institute of Science and Technology 4 AITRICS |
| Pseudocode | Yes | Algorithm 1 Training NERS: batch size m and sample size n (an illustrative sketch of such a candidate-then-batch sampling step appears after this table) |
| Open Source Code | Yes | Code is available at https://github.com/youngmin0oh/NERS |
| Open Datasets | Yes | on the following standard continuous control environments (e.g., Ant-v3, Walker2D-v3, and Hopper-v3) from the MuJoCo physics engine (Todorov et al., 2012) and classical and Box2D continuous control tasks (i.e., Pendulum, LunarLanderContinuous-v2, and BipedalWalker-v3) from OpenAI Gym (Brockman et al., 2016). We also consider a subset of the Atari games (Bellemare et al., 2013) to validate the effect of our experience sampler on the discrete control tasks (see Table 2). |
| Dataset Splits | No | The paper discusses training steps and evaluation rollouts during training, but does not specify explicit train/validation/test dataset splits for the environments or data used. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam Kingma & Ba (2014)' and 'OpenAI baselines', but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or other libraries used for implementation. |
| Experiment Setup | Yes | Table B.1: Hyper-parameters (shared): batch size 128 (continuous control environments) and 32 (discrete control environments); buffer size 10^6; target smoothing coefficient τ = 5 × 10^−3 for soft updates; initial prioritized experience replay exponents (α, β) = (0.5, 0.4); discount factor for the agent reward γ = 0.99; 5 × 10^3 initial random actions (continuous control environments); optimizer Adam (Kingma & Ba, 2014); nonlinearity ReLU (restated as a hedged config sketch after this table) |
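
The paper's Algorithm 1 is only named in the extracted response above, so the following is a minimal, hypothetical sketch of what a learned replay sampler with sample size n and batch size m could look like: n candidate transitions are drawn from the buffer, scored by a small network, and a training batch of size m is then sampled in proportion to those scores. The `ScoreNetwork` architecture and the `buffer.features`/`buffer.get` helpers are assumptions for illustration only; the actual NERS sampler additionally uses local and global contexts of the sampled transitions, as the paper's title indicates.

```python
import numpy as np
import torch
import torch.nn as nn

class ScoreNetwork(nn.Module):
    """Toy per-transition score network (illustrative, not the paper's architecture)."""
    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (n, feature_dim) -> unnormalized scores of shape (n,)
        return self.net(features).squeeze(-1)

def sample_batch(buffer, score_net: ScoreNetwork, n: int, m: int):
    """Draw n candidates uniformly, score them, then sample a batch of size m."""
    candidate_idx = np.random.choice(len(buffer), size=n, replace=False)
    # `buffer.features` is a hypothetical helper returning per-transition features
    # (e.g., TD error, reward, timestep) for the given indices.
    feats = torch.as_tensor(buffer.features(candidate_idx), dtype=torch.float32)
    with torch.no_grad():
        probs = torch.softmax(score_net(feats), dim=0).numpy()
    batch_idx = np.random.choice(candidate_idx, size=m, replace=False, p=probs)
    # `buffer.get` is likewise a hypothetical accessor for the chosen transitions.
    return buffer.get(batch_idx), batch_idx, probs
```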
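
The shared hyperparameters quoted from Table B.1 can also be read as a single configuration. The dictionary below is a hedged restatement of those quoted values; the key names are illustrative and do not correspond to the authors' released code.

```python
# Shared hyperparameters quoted from Table B.1 of the paper; key names are
# illustrative and do not correspond to the authors' released code.
shared_hparams = {
    "batch_size_continuous": 128,           # continuous control environments
    "batch_size_discrete": 32,              # discrete control environments
    "buffer_size": 10**6,
    "target_smoothing_tau": 5e-3,           # soft-update coefficient
    "per_alpha_init": 0.5,                  # initial PER priority exponent
    "per_beta_init": 0.4,                   # initial PER importance-sampling exponent
    "discount_gamma": 0.99,
    "initial_random_actions_continuous": 5 * 10**3,
    "optimizer": "Adam",                    # Kingma & Ba (2014)
    "nonlinearity": "ReLU",
}
```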