Generalizable Episodic Memory for Deep Reinforcement Learning
Authors: Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further show the general applicability, we evaluate our method on Atari games with discrete action space, which also shows a significant improvement over baseline algorithms. From Section 5 (Experiments): Our experimental evaluation aims to answer the following questions: (1) How well does GEM perform on the continuous state and action space? (2) How well does GEM perform on discrete domains? (3) How effective is each part of GEM? |
| Researcher Affiliation | Academia | The Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Peking University, Beijing, China; University of Illinois at Urbana-Champaign, IL, USA. |
| Pseudocode | Yes | Algorithm 1 Generalizable Episodic Memory and Algorithm 2 Update Memory |
| Open Source Code | Yes | Our implementation of GEM is available at https://github.com/MouseHu/GEM. |
| Open Datasets | Yes | We conduct experiments on the suite of MuJoCo tasks (Todorov et al., 2012), with OpenAI Gym interface (Brockman et al., 2016). We evaluate all the above algorithms on 6 Atari games (Bellemare et al., 2013). |
| Dataset Splits | No | No explicit description of training/test/validation dataset splits (e.g., percentages, sample counts, or specific split files) was found for model validation purposes. |
| Hardware Specification | No | No specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym and various RL algorithms but does not provide specific version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The memory update frequency u is set to 100 with a smoothing coefficient τ = 0.6. The rest of the hyperparameters are mostly kept the same as in TD3 to ensure a fair comparison. The detailed hyperparameters used are listed in Appendix C. |
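The smoothing coefficient τ reported above behaves like the exponential (Polyak-style) averaging common in TD3-based algorithms, where a running value is blended with a fresh one at each update. A minimal sketch of that smoothing rule, purely illustrative and not the authors' memory-update implementation:

```python
def soft_update(current, fresh, tau=0.6):
    """Exponential smoothing: new <- tau * fresh + (1 - tau) * current.

    With tau = 0.6 (the paper's setting), each update weights the
    fresh value at 60% and the running value at 40%.
    """
    return [tau * f + (1 - tau) * c for c, f in zip(current, fresh)]

# Example: smoothing a running value of 0 toward fresh values 1 and 2.
print(soft_update([0.0, 0.0], [1.0, 2.0]))  # → [0.6, 1.2]
```

Larger τ tracks new values faster at the cost of more variance; the paper combines this with a memory update frequency u = 100 (see its Appendix C for the full hyperparameter table).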