Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmark tasks for both continuous and discrete control and show that it can significantly improve the performance of various off-policy RL methods. Our experimental results show that NERS consistently (and often significantly for complex tasks with high-dimensional state and action spaces) outperforms both the existing rule-based (Schaul et al., 2016) and learning-based (Zha et al., 2019) sampling methods for experience replay.
Researcher Affiliation | Collaboration | 1 Samsung Advanced Institute of Technology; 2 University of California, Berkeley; 3 Korea Advanced Institute of Science and Technology; 4 AITRICS
Pseudocode | Yes | Algorithm 1 Training NERS: batch size m and sample size n (a hedged sketch of such a learned-sampler step appears after this table)
Open Source Code | Yes | Code is available at https://github.com/youngmin0oh/NERS
Open Datasets | Yes | on the following standard continuous control environments (e.g., Ant-v3, Walker2D-v3, and Hopper-v3) from the MuJoCo physics engine (Todorov et al., 2012) and classical and Box2D continuous control tasks (i.e., Pendulum, LunarLanderContinuous-v2, and BipedalWalker-v3) from OpenAI Gym (Brockman et al., 2016). We also consider a subset of the Atari games (Bellemare et al., 2013) to validate the effect of our experience sampler on the discrete control tasks (see Table 2). (An environment-instantiation sketch appears after this table.)
Dataset Splits | No | The paper discusses training steps and evaluation rollouts during training, but does not specify explicit train/validation/test dataset splits for the environments or data used.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions 'Optimizer Adam Kingma & Ba (2014)' and 'OpenAI baselines', but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or other libraries used for the implementation.
Experiment Setup | Yes | Table B.1: Hyper-parameters [lists many specific hyper-parameters, including the following shared settings]: batch size (continuous control environments) 128; batch size (discrete control environments) 32; buffer size 10^6; target smoothing coefficient (τ) for soft update 5 × 10^−3; initial prioritized experience replay buffer exponents (α, β) = (0.5, 0.4); discount factor for the agent reward (γ) 0.99; number of initial random actions (continuous control environments) 5 × 10^3; optimizer Adam (Kingma & Ba, 2014); nonlinearity ReLU. (These shared settings are transcribed as a configuration sketch after this table.)
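The Pseudocode row above cites Algorithm 1, which trains NERS jointly with an off-policy agent using a sample size n and batch size m. The snippet below is a minimal, hypothetical sketch of one learned-sampler sampling step in PyTorch: `ScoreNet`, its feature layout, and `sample_minibatch` are illustrative assumptions, not the authors' released implementation (which also uses global, set-level context and a return-based signal to update the sampler).

```python
# Hypothetical sketch: scoring n candidate transitions with a small network and
# drawing a minibatch of size m with probability proportional to the scores.
# `ScoreNet`, the feature layout, and `sample_minibatch` are illustrative
# assumptions, not the authors' released NERS implementation.
import torch
import torch.nn as nn


class ScoreNet(nn.Module):
    """Scores candidate transitions from simple per-transition (local) features."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (n, feature_dim), e.g., TD-error, reward, timestep per transition
        return self.net(features).squeeze(-1)  # unnormalized scores, shape (n,)


def sample_minibatch(score_net: ScoreNet, candidate_features: torch.Tensor, m: int):
    """Draw m of the n candidates with probability proportional to softmax scores."""
    with torch.no_grad():
        probs = torch.softmax(score_net(candidate_features), dim=0)
    idx = torch.multinomial(probs, num_samples=m, replacement=False)
    return idx, probs[idx]


if __name__ == "__main__":
    n, m, feature_dim = 64, 8, 3             # sample size n, batch size m (illustrative)
    features = torch.randn(n, feature_dim)   # stand-in local features of n candidates
    idx, p = sample_minibatch(ScoreNet(feature_dim), features, m)
    print(idx.shape, p.shape)                # torch.Size([8]) torch.Size([8])
```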
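The Open Datasets row lists the continuous control benchmarks. The following is a hedged sketch of instantiating them with OpenAI Gym; environment IDs follow Gym's naming convention, the MuJoCo tasks require a working mujoco-py installation, and the exact Pendulum version is not stated above, so it is omitted.

```python
# Hedged sketch: instantiating the named benchmark environments via OpenAI Gym.
import gym

env_ids = [
    "Ant-v3", "Walker2d-v3", "Hopper-v3",            # MuJoCo locomotion tasks
    "LunarLanderContinuous-v2", "BipedalWalker-v3",  # classic/Box2D control tasks
]

for env_id in env_ids:
    env = gym.make(env_id)
    obs = env.reset()
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```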
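The Experiment Setup row quotes the shared hyper-parameters from Table B.1. Below they are transcribed as a plain Python configuration dictionary; the key names are illustrative and not taken from the released code.

```python
# Shared hyper-parameters quoted from Table B.1, transcribed as a config dict.
# Key names are illustrative assumptions; values come from the row above.
shared_hparams = {
    "batch_size_continuous": 128,
    "batch_size_discrete": 32,
    "buffer_size": int(1e6),
    "target_smoothing_tau": 5e-3,              # soft-update coefficient
    "per_alpha": 0.5,                          # initial PER exponents (alpha, beta)
    "per_beta": 0.4,
    "discount_gamma": 0.99,
    "initial_random_actions_continuous": int(5e3),
    "optimizer": "Adam",                       # Kingma & Ba (2014)
    "nonlinearity": "ReLU",
}
```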