Generalizable Episodic Memory for Deep Reinforcement Learning
Authors: Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further show the general applicability, we evaluate our method on Atari games with discrete action space, which also shows a significant improvement over baseline algorithms. From Section 5 (Experiments): Our experimental evaluation aims to answer the following questions: (1) How well does GEM perform on the continuous state and action space? (2) How well does GEM perform on discrete domains? (3) How effective is each part of GEM? |
| Researcher Affiliation | Academia | The Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Peking University, Beijing, China; University of Illinois at Urbana-Champaign, IL, USA. |
| Pseudocode | Yes | Algorithm 1 Generalizable Episodic Memory and Algorithm 2 Update Memory |
| Open Source Code | Yes | Our implementation of GEM is available at https://github.com/MouseHu/GEM. |
| Open Datasets | Yes | We conduct experiments on the suite of MuJoCo tasks (Todorov et al., 2012), with OpenAI Gym interface (Brockman et al., 2016). We evaluate all the above algorithms on 6 Atari games (Bellemare et al., 2013). |
| Dataset Splits | No | No explicit description of training/test/validation dataset splits (e.g., percentages, sample counts, or specific split files) was found for model validation purposes. |
| Hardware Specification | No | No specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym and various RL algorithms but does not provide specific version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The memory update frequency u is set to 100 with a smoothing coefficient τ = 0.6. The rest of the hyperparameters are mostly kept the same as in TD3 to ensure a fair comparison. The detailed hyperparameters used are listed in Appendix C. |
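The smoothing coefficient τ reported above behaves like the exponential (Polyak-style) averaging common in TD3-based algorithms, where a running value is blended with a fresh one at each update. A minimal sketch of that smoothing rule, purely illustrative and not the authors' memory-update implementation:

```python
def soft_update(current, fresh, tau=0.6):
    """Exponential smoothing: new <- tau * fresh + (1 - tau) * current.

    With tau = 0.6 (the paper's setting), each update weights the
    fresh value at 60% and the running value at 40%.
    """
    return [tau * f + (1 - tau) * c for c, f in zip(current, fresh)]

# Example: smoothing a running value of 0 toward fresh values 1 and 2.
print(soft_update([0.0, 0.0], [1.0, 2.0]))  # → [0.6, 1.2]
```

Larger τ tracks new values faster at the cost of more variance; the paper combines this with a memory update frequency u = 100 (see its Appendix C for the full hyperparameter table).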