Learning to Sample with Local and Global Contexts in Experience Replay Buffer
Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmark tasks for both continuous and discrete control and show that it can significantly improve the performance of various off-policy RL methods. Our experimental results show that NERS consistently (and often significantly for complex tasks having high-dimensional state and action spaces) outperforms both the existing rule-based (Schaul et al., 2016) and learning-based (Zha et al., 2019) sampling methods for experience replay. |
| Researcher Affiliation | Collaboration | 1 Samsung Advanced Institute of Technology 2 University of California, Berkeley 3 Korea Advanced Institute of Science and Technology 4 AITRICS |
| Pseudocode | Yes | Algorithm 1 Training NERS: batch size m and sample size n (an illustrative sketch of such a candidate-then-batch sampling step appears after this table) |
| Open Source Code | Yes | Code is available at https://github.com/youngmin0oh/NERS |
| Open Datasets | Yes | on the following standard continuous control environments (e.g., Ant-v3, Walker2D-v3, and Hopper-v3) from the MuJoCo physics engine (Todorov et al., 2012) and classical and Box2D continuous control tasks (i.e., Pendulum, LunarLanderContinuous-v2, and BipedalWalker-v3) from OpenAI Gym (Brockman et al., 2016). We also consider a subset of the Atari games (Bellemare et al., 2013) to validate the effect of our experience sampler on the discrete control tasks (see Table 2). |
| Dataset Splits | No | The paper discusses training steps and evaluation rollouts during training, but does not specify explicit train/validation/test dataset splits for the environments or data used. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam Kingma & Ba (2014)' and 'OpenAI baselines', but does not provide specific version numbers for software dependencies such as Python, PyTorch/TensorFlow, or other libraries used for implementation. |
| Experiment Setup | Yes | Table B.1: Hyper-parameters (shared): batch size 128 (continuous control environments) and 32 (discrete control environments); buffer size 10^6; target smoothing coefficient τ = 5 × 10^−3 for soft updates; initial prioritized experience replay exponents (α, β) = (0.5, 0.4); discount factor for the agent reward γ = 0.99; 5 × 10^3 initial random actions (continuous control environments); optimizer Adam (Kingma & Ba, 2014); nonlinearity ReLU (restated as a hedged config sketch after this table) |
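
The paper's Algorithm 1 is only named in the extracted response above, so the following is a minimal, hypothetical sketch of what a learned replay sampler with sample size n and batch size m could look like: n candidate transitions are drawn from the buffer, scored by a small network, and a training batch of size m is then sampled in proportion to those scores. The `ScoreNetwork` architecture and the `buffer.features`/`buffer.get` helpers are assumptions for illustration only; the actual NERS sampler additionally uses local and global contexts of the sampled transitions, as the paper's title indicates.

```python
import numpy as np
import torch
import torch.nn as nn

class ScoreNetwork(nn.Module):
    """Toy per-transition score network (illustrative, not the paper's architecture)."""
    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (n, feature_dim) -> unnormalized scores of shape (n,)
        return self.net(features).squeeze(-1)

def sample_batch(buffer, score_net: ScoreNetwork, n: int, m: int):
    """Draw n candidates uniformly, score them, then sample a batch of size m."""
    candidate_idx = np.random.choice(len(buffer), size=n, replace=False)
    # `buffer.features` is a hypothetical helper returning per-transition features
    # (e.g., TD error, reward, timestep) for the given indices.
    feats = torch.as_tensor(buffer.features(candidate_idx), dtype=torch.float32)
    with torch.no_grad():
        probs = torch.softmax(score_net(feats), dim=0).numpy()
    batch_idx = np.random.choice(candidate_idx, size=m, replace=False, p=probs)
    # `buffer.get` is likewise a hypothetical accessor for the chosen transitions.
    return buffer.get(batch_idx), batch_idx, probs
```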
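
The shared hyperparameters quoted from Table B.1 can also be read as a single configuration. The dictionary below is a hedged restatement of those quoted values; the key names are illustrative and do not correspond to the authors' released code.

```python
# Shared hyperparameters quoted from Table B.1 of the paper; key names are
# illustrative and do not correspond to the authors' released code.
shared_hparams = {
    "batch_size_continuous": 128,           # continuous control environments
    "batch_size_discrete": 32,              # discrete control environments
    "buffer_size": 10**6,
    "target_smoothing_tau": 5e-3,           # soft-update coefficient
    "per_alpha_init": 0.5,                  # initial PER priority exponent
    "per_beta_init": 0.4,                   # initial PER importance-sampling exponent
    "discount_gamma": 0.99,
    "initial_random_actions_continuous": 5 * 10**3,
    "optimizer": "Adam",                    # Kingma & Ba (2014)
    "nonlinearity": "ReLU",
}
```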