Episodic Reinforcement Learning with Associative Memory

Authors: Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, Chongjie Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results on the navigation domain and Atari games show our framework achieves significantly higher sample efficiency than state-of-the-art episodic reinforcement learning models.
Researcher Affiliation | Academia | Guangxiang Zhu (1), Zichuan Lin (2), Guangwen Yang (2), Chongjie Zhang (1); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) Department of Computer Science and Technology, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: Value Propagation in Associative Memory; Algorithm 2: ERLAM: Episodic Reinforcement Learning with Associative Memory (see the value-propagation sketch below the table)
Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology.
Open Datasets | Yes | We use a video game Monster Kong from Pygame Learning Environment (PLE) (Tasfi, 2016) to set up the navigation experiments. ... To further evaluate the sample efficiency of ERLAM on a diverse set of games, we conduct experiments on the benchmark suite of Atari games from the Arcade Learning Environment (ALE) (Bellemare et al., 2013). (see the environment-setup sketch below the table)
Dataset Splits | No | The paper mentions training agents for 10 million frames and evaluating for 0.5 million frames at the end of each epoch, but it does not specify distinct training, validation, and test dataset splits in the traditional sense (e.g., percentages or counts for a pre-divided static dataset). The evaluation phase is for testing, not specifically for hyperparameter validation.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments; it only mentions general aspects of the network architecture and training.
Software Dependencies | No | The paper mentions using the RMSProp algorithm but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | We follow the same setting for network architecture and all hyper-parameters as DQN (Mnih et al., 2015). The raw images are resized to an 84 × 84 grayscale image s_t, and 4 consecutive frames are stacked into one state. The 3 convolutional layers can be indicated as Conv(32,8,4), Conv(64,4,2), and Conv(64,3,1). We used the RMSProp algorithm (Tieleman & Hinton, 2012) with learning rate α = 0.00025 for gradient descent training. The discount factor γ is set to 0.99 for all games. We use annealing ϵ-greedy policies from 1.0 to 0.1 in the training stage while fixing ϵ = 0.05 during evaluation. For hyper-parameters of associative memory, we set the value of λ as 0.1 and associate frequency K as 10 in the navigation domain, Monster Kong. In Atari games, we use the same settings for all games. The value of λ is 0.3, and the associate frequency K is 50. The memory size is set as 1 million. We use random projection technique and project the states into vectors with the dimension of d = 4.
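To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the described DQN-style architecture (Conv(32,8,4), Conv(64,4,2), Conv(64,3,1)), the RMSProp optimizer with learning rate 0.00025, and a fixed Gaussian random projection down to d = 4 for keying the associative memory. The padding, the 512-unit fully connected layer, the pixel scaling, and the choice of projection matrix are assumptions carried over from the standard DQN setup, not details confirmed by the paper.

import numpy as np
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """DQN-style Q-network: input is a stack of 4 frames of 84 x 84 pixels."""
    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # Conv(32,8,4)
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # Conv(64,4,2)
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # Conv(64,3,1)
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 512-unit hidden layer assumed from DQN
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        return self.head(self.features(x / 255.0))  # pixel scaling to [0, 1] assumed

q_net = QNetwork(num_actions=18)
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=0.00025)  # alpha = 0.00025 from the paper

# Fixed Gaussian random projection of the flattened 4 x 84 x 84 state to d = 4,
# used to build hashable keys for the associative memory (projection choice assumed).
rng = np.random.default_rng(seed=0)
projection = rng.standard_normal((4, 4 * 84 * 84)).astype(np.float32)

def state_key(state):
    """Map a (4, 84, 84) uint8 state to a small, hashable memory key."""
    return tuple(np.round(projection @ state.ravel().astype(np.float32), 2))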
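The paper's Algorithm 1 (value propagation in associative memory) is only referenced above, not reproduced. The sketch below shows one plausible reading of that step, assuming the memory maps a projected state key and an action to the highest propagated return seen so far, and that a backward sweep over a stored trajectory applies Q(s, a) <- max(Q(s, a), r + gamma * max_a' Q(s', a')). The data layout and function names are illustrative, not the authors' implementation.

from collections import defaultdict

GAMMA = 0.99  # discount factor reported in the experiment setup

def propagate_values(memory, trajectory):
    """One backward sweep of value propagation over a stored episode.

    memory:     dict mapping state_key -> {action: propagated value}
    trajectory: list of (state_key, action, reward, next_state_key, done)

    Because identical (projected) states across episodes share the same key,
    values propagated here can also flow into other stored trajectories.
    """
    for state_key, action, reward, next_key, done in reversed(trajectory):
        bootstrap = 0.0 if done else max(memory[next_key].values(), default=0.0)
        target = reward + GAMMA * bootstrap
        memory[state_key][action] = max(memory[state_key].get(action, float("-inf")), target)

# Usage: a memory keyed by projected states, refreshed periodically
# (the paper's associate frequency: K = 10 for Monster Kong, K = 50 for Atari).
memory = defaultdict(dict)
episode = [("s0", 0, 0.0, "s1", False), ("s1", 1, 1.0, "s2", True)]
propagate_values(memory, episode)
print(memory["s0"][0])  # 0.99 = 0.0 + GAMMA * 1.0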
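For the Open Datasets entry, both benchmark suites named in the paper are openly available packages; a minimal setup sketch is given below. The specific Gym environment ID, frame rate, and API version are assumptions for illustration, not settings taken from the paper.

# Monster Kong from the PyGame Learning Environment (Tasfi, 2016).
from ple import PLE
from ple.games.monsterkong import MonsterKong

game = MonsterKong()
env = PLE(game, fps=30, display_screen=False)
env.init()
actions = env.getActionSet()
reward = env.act(actions[0])  # step the game with one action
frame = env.getScreenRGB()    # raw frame, to be converted to grayscale and resized to 84 x 84

# Atari games from the Arcade Learning Environment (Bellemare et al., 2013),
# accessed here through the classic Gym wrapper (environment ID assumed).
import gym
atari_env = gym.make("BreakoutNoFrameskip-v4")
obs = atari_env.reset()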