Generalizable Episodic Memory for Deep Reinforcement Learning

Authors: Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further show the general applicability, we evaluate our method on Atari games with discrete action space, which also shows a significant improvement over baseline algorithms. (Section 5, Experiments) Our experimental evaluation aims to answer the following questions: (1) How well does GEM perform on the continuous state and action space? (2) How well does GEM perform on discrete domains? (3) How effective is each part of GEM?
Researcher Affiliation | Academia | The Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Peking University, Beijing, China; University of Illinois at Urbana-Champaign, IL, USA.
Pseudocode | Yes | Algorithm 1 (Generalizable Episodic Memory) and Algorithm 2 (Update Memory)
Open Source Code | Yes | Our implementation of GEM is available at https://github.com/MouseHu/GEM.
Open Datasets | Yes | We conduct experiments on the suite of MuJoCo tasks (Todorov et al., 2012), with OpenAI Gym interface (Brockman et al., 2016). We evaluate all the above algorithms on 6 Atari games (Bellemare et al., 2013).
Dataset Splits | No | No explicit description of training/validation/test dataset splits (e.g., percentages, sample counts, or specific split files) was found for model validation purposes.
Hardware Specification | No | No specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments were provided.
Software Dependencies | No | The paper mentions software such as OpenAI Gym and various RL algorithms but does not provide version numbers for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | The memory update frequency u is set to 100 with a smoothing coefficient τ = 0.6. The rest of the hyperparameters are mostly kept the same as in TD3 to ensure a fair comparison. The detailed hyperparameters used are listed in Appendix C.
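The reported setup can be summarized as a configuration sketch. Note the caveats: only the memory update frequency (u = 100) and the smoothing coefficient (τ = 0.6) are stated explicitly above; the key names (`memory_update_freq`, `tau_memory`) are hypothetical, and the remaining entries are standard TD3 defaults filled in as assumptions, since the paper only says the rest is "mostly kept the same as in TD3" and defers details to its Appendix C.

```python
# Hypothetical hyperparameter configuration for reproducing the GEM setup.
# Only memory_update_freq (u = 100) and tau_memory (τ = 0.6) are reported
# in the excerpt; the TD3-style values below are assumed defaults, not
# confirmed by the paper.
gem_config = {
    # GEM-specific values (reported)
    "memory_update_freq": 100,  # refresh episodic memory every u = 100 steps
    "tau_memory": 0.6,          # smoothing coefficient τ for memory updates
    # TD3-style values (assumed defaults)
    "actor_lr": 3e-4,
    "critic_lr": 3e-4,
    "discount": 0.99,
    "tau_target": 0.005,        # soft target-network update rate
    "policy_noise": 0.2,
    "noise_clip": 0.5,
    "policy_delay": 2,
}

def memory_update_due(step: int, freq: int = gem_config["memory_update_freq"]) -> bool:
    """Return True on environment steps where the episodic memory is refreshed."""
    return step > 0 and step % freq == 0
```

With u = 100, the memory would be refreshed at steps 100, 200, 300, and so on, while all intermediate steps leave it untouched.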