Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
Authors: Xu-Hui Liu, Zhenghai Xue, Jingcheng Pang, Shengyi Jiang, Feng Xu, Yang Yu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to evaluate ReMERN and ReMERT. ... Both methods outperform previous prioritized sampling algorithms in challenging RL benchmarks, including MuJoCo, Atari and Meta-World. |
| Researcher Affiliation | Academia | Xu-Hui Liu, Zhenghai Xue, Jing-Cheng Pang, Shengyi Jiang, Feng Xu, Yang Yu. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. liuxh@lamda.nju.edu.cn, xuezh@smail.nju.edu.cn, {pangjc, jiangsy, xufeng}@lamda.nju.edu.cn, yuy@nju.edu.cn |
| Pseudocode | Yes | The pseudo code for ReMERN is presented in Appendix C. ... Its pseudo code is presented in Appendix C. |
| Open Source Code | Yes | Codes are available at https://github.com/AIDefender/ReMERN-ReMERT. |
| Open Datasets | Yes | We first compare the performance of ReMERN and ReMERT to prior sampling methods in continuous control benchmarks including Meta-World [24], MuJoCo [25] and DeepMind Control Suite (DMC) [26]. We also evaluate our methods in Arcade Learning Environments with discrete action spaces. |
| Dataset Splits | No | The paper mentions training steps and seeds but does not explicitly describe dataset splits for training, validation, and testing. |
| Hardware Specification | Yes | All experiments are run on NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | Yes | All experiments are run on Ubuntu 20.04.2 LTS, CUDA 11.1, PyTorch 1.9.0. |
| Experiment Setup | Yes | All models are trained for 3M steps. Each task uses 4 seeds. ... The learning rate is set to 3e-4. We use Adam optimizer... The batch size is 256. The replay buffer size is 1M. The training starts after collecting 2000 transitions. The update ratio is 1. For PER, the α is set to 0.6. For DQN, we use the same architecture and hyperparameters as [5] (Nature DQN). (These settings are collected in the sketches after this table.) |
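
For readers reconstructing the setup, the hyperparameters reported in the Experiment Setup row can be gathered into a single configuration object. This is a minimal sketch; the dictionary keys below are illustrative names chosen here, not identifiers from the authors' codebase.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are illustrative, not taken from the authors' code.
EXPERIMENT_CONFIG = {
    "total_train_steps": 3_000_000,   # all models trained for 3M steps
    "seeds_per_task": 4,              # each task uses 4 random seeds
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "batch_size": 256,
    "replay_buffer_size": 1_000_000,  # 1M transitions
    "learning_starts": 2_000,         # training begins after 2000 transitions
    "update_ratio": 1,                # gradient updates per environment step
    "per_alpha": 0.6,                 # priority exponent for the PER baseline
}
```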
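Since the only PER hyperparameter reported is α = 0.6, a short sketch of what that exponent controls may help. This follows the standard proportional prioritization rule of Schaul et al. (P(i) ∝ p_i^α), not the paper's ReMERN/ReMERT weighting; the function name and the small numerical example are assumptions made here for illustration.

```python
import numpy as np

def per_sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized sampling: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |TD error_i| + eps. alpha = 0.6 matches the PER baseline
    setting reported in the paper; alpha = 0 recovers uniform sampling."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

# Example: transitions with larger TD errors are sampled more often.
probs = per_sampling_probabilities(np.array([0.1, 1.0, 2.5]))
```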