Episodic Reinforcement Learning with Associative Memory
Authors: Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, Chongjie Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on the navigation domain and Atari games show our framework achieves significantly higher sample efficiency than state-of-the-art episodic reinforcement learning models. |
| Researcher Affiliation | Academia | Guangxiang Zhu¹, Zichuan Lin², Guangwen Yang², Chongjie Zhang¹. ¹Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; ²Department of Computer Science and Technology, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Value Propagation in Associative Memory; Algorithm 2: ERLAM: Episodic Reinforcement Learning with Associative Memory (a hedged sketch of the value-propagation step appears after this table) |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology. |
| Open Datasets | Yes | We use a video game Monster Kong from Pygame Learning Environment (PLE)(Tasfi, 2016) to set up the navigation experiments. ... To further evaluate the sample efficiency of ERLAM on a diverse set of games, we conduct experiments on the benchmark suite of Atari games from the Arcade Learning Environment (ALE) (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions training agents for 10 million frames and evaluating for 0.5 million frames at the end of each epoch, but it does not specify distinct training, validation, and test dataset splits in the traditional sense (e.g., percentages or counts for a pre-divided static dataset). The evaluation phase is for testing, not specifically for hyperparameter validation. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions general aspects of the network architecture and training. |
| Software Dependencies | No | The paper mentions using the RMSProp algorithm but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We follow the same setting for network architecture and all hyper-parameters as DQN (Mnih et al., 2015). The raw images are resized to an 84 × 84 grayscale image s_t, and 4 consecutive frames are stacked into one state. The three convolutional layers are Conv(32,8,4), Conv(64,4,2), and Conv(64,3,1). We used the RMSProp algorithm (Tieleman & Hinton, 2012) with learning rate α = 0.00025 for gradient descent training. The discount factor γ is set to 0.99 for all games. We use annealing ϵ-greedy policies from 1.0 to 0.1 in the training stage while fixing ϵ = 0.05 during evaluation. For hyper-parameters of associative memory, we set the value of λ as 0.1 and the associate frequency K as 10 in the navigation domain, Monster Kong. In Atari games, we use the same settings for all games: the value of λ is 0.3, and the associate frequency K is 50. The memory size is set to 1 million. We use a random projection technique and project the states into vectors with dimension d = 4. (A hedged sketch of this setup appears after the table.) |
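
The pseudocode row references Algorithm 1 (value propagation in associative memory). Below is a minimal Python sketch of that idea, assuming the memory stores transitions keyed by projected state embeddings and that values are propagated with reverse-order backups of the form Q_M(s, a) ← max(Q_M(s, a), r + γ max_a' Q_M(s', a')). The class and method names (`AssociativeMemory`, `add_trajectory`, `propagate`, `lookup`) are illustrative, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict


class AssociativeMemory:
    """Sketch of an episodic associative memory with reverse-order value propagation."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        # q[key][action] -> best return seen or propagated so far.
        self.q = defaultdict(dict)
        # Stored transitions (key, action, reward, next_key, done), in time order.
        self.transitions = []

    def add_trajectory(self, trajectory):
        """trajectory: list of (key, action, reward, next_key, done) tuples."""
        self.transitions.extend(trajectory)

    def propagate(self, sweeps=1):
        """Sweep transitions in reverse time order so values flow backwards
        along trajectories; extra sweeps let values cross trajectories that
        share states."""
        for _ in range(sweeps):
            for key, action, reward, next_key, done in reversed(self.transitions):
                bootstrap = 0.0 if done else self.gamma * max(
                    self.q[next_key].values(), default=0.0)
                target = reward + bootstrap
                # Max-style backup: keep the largest return found so far.
                if target > self.q[key].get(action, -np.inf):
                    self.q[key][action] = target

    def lookup(self, key, action):
        """Return the memory value Q_M(key, action), or None if unseen."""
        return self.q[key].get(action, None)
```

In the full ERLAM algorithm (Algorithm 2 in the paper), the memory is re-associated every K environment steps and its values are combined with the parametric Q-network using the weight λ listed in the experiment-setup row; that coupling is omitted from this sketch.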
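
The experiment-setup row can likewise be sketched in code. The PyTorch snippet below reflects the quoted settings: convolutional layers Conv(32,8,4), Conv(64,4,2), Conv(64,3,1) on stacked 84 × 84 frames, RMSProp with α = 0.00025, γ = 0.99, linear ϵ annealing from 1.0 to 0.1, and a random projection of states to d = 4. The 512-unit fully connected layer, the annealing horizon, the key rounding, and all names (`QNetwork`, `memory_key`, `epsilon`) are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """DQN-style Q-network for 4 stacked 84x84 grayscale frames."""

    def __init__(self, num_actions, in_channels=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 84x84 input -> 7x7 feature map
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        # Scale raw uint8 pixels to [0, 1] before the convolutions.
        return self.head(self.features(x / 255.0))


q_net = QNetwork(num_actions=18)
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=0.00025)
gamma = 0.99

# Fixed Gaussian random projection: flattened 84x84x4 state -> d=4 key
# for indexing the associative memory (dimension taken from the quote above).
projection = torch.randn(84 * 84 * 4, 4) / (84 * 84 * 4) ** 0.5


def memory_key(state):
    """state: tensor of shape (4, 84, 84); returns a hashable memory key.
    Rounding the projected vector for hashability is an assumption of this sketch."""
    key = state.float().flatten() @ projection
    return tuple(round(float(v), 2) for v in key)


def epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Linear epsilon annealing from 1.0 to 0.1 (schedule length assumed)."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)
```

During evaluation the quoted setting fixes ϵ = 0.05 rather than using the annealed training schedule.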