Neural Episodic Control

Authors: Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | in Section 4 we report experimental results in the Atari Learning Environment
Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Alexander Pritzel <apritzel@google.com>.
Pseudocode | Yes | Algorithm 1 Neural Episodic Control (a minimal sketch of the lookup and write steps it describes follows this table)
Open Source Code | No | The paper mentions 'Videos and complementary graphical material can be found at https://sites.google.com/view/necicml' but does not link to or announce a release of source code for the described method.
Open Datasets | Yes | As a problem domain we chose the Atari Learning Environment (ALE; Bellemare et al., 2013). We tested our method on the 57 Atari games used by Schaul et al. (2015a)
Dataset Splits | Yes | In order to tune the remaining hyperparameters (SGD learning rate, fast-update learning rate α in Equation 4, dimensionality of the embeddings, Q^(N) in Equation 3, and ε-greedy exploration rate) we ran a hyperparameter sweep on six games: Beam Rider, Breakout, Pong, Q*Bert, Seaquest and Space Invaders. We picked the hyperparameter values that performed best on the median for this subset of games (a common cross-validation procedure described by Bellemare et al. (2013), and adhered to by Mnih et al. (2015)). (A sketch of this median-based selection rule follows the table.)
Hardware Specification | No | The paper does not provide hardware details such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions the 'RMSProp algorithm (Tieleman & Hinton, 2012)' and the same 'preprocessing steps as (Mnih et al., 2015)' but does not give version numbers for any software dependencies or libraries.
Experiment Setup | Yes | All algorithms were trained using discount rate γ = 0.99, except MFEC that uses γ = 1. ... In terms of hyperparameters for NEC, we chose the same convolutional architecture as DQN, and store up to 5 × 10^5 memories per action. We used the RMSProp algorithm ... We apply the same preprocessing steps as (Mnih et al., 2015), including repeating each action four times. For the N-step Q estimates we picked a horizon of N = 100. Our replay buffer stores only the last 10^5 states ... We do one replay update for every 16 observed frames with a minibatch of size 32. We set the number of nearest neighbours p = 50 in all our experiments. For the kernel function ... We set δ = 10^-3. (These settings are collected in the sketch at the end of this section.)
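Since the paper's Algorithm 1 is not reproduced on this page, here is a rough Python sketch of the per-action differentiable neural dictionary (DND) lookup and write it refers to. The capacity, neighbour count p, and δ come from the quoted setup; the kernel is assumed to be the inverse-distance form k(h, h_i) = 1/(||h − h_i||^2 + δ) elided in the quote, and the class name, α value, and brute-force NumPy neighbour search are our simplifications (the paper uses approximate nearest-neighbour search and learned embeddings).

```python
import numpy as np

class DND:
    """Sketch of one per-action differentiable neural dictionary (DND).

    Keys are state embeddings, values are N-step Q estimates. Brute-force
    NumPy neighbour search stands in for the approximate nearest-neighbour
    index used in the paper, and dropping the oldest entry stands in for the
    paper's least-recently-used eviction.
    """

    def __init__(self, capacity=500_000, p=50, delta=1e-3, alpha=0.1):
        self.capacity = capacity  # up to 5 x 10^5 memories per action
        self.p = p                # number of nearest neighbours
        self.delta = delta        # kernel constant, delta = 1e-3
        self.alpha = alpha        # fast-update rate (tuned in the paper; value assumed here)
        self.keys, self.values = [], []

    def lookup(self, h):
        """Q(s, a): kernel-weighted average of the values of the p nearest keys."""
        if not self.keys:
            return 0.0
        dists = np.sum((np.stack(self.keys) - h) ** 2, axis=1)
        idx = np.argsort(dists)[: self.p]
        k = 1.0 / (dists[idx] + self.delta)  # k(h, h_i) = 1 / (||h - h_i||^2 + delta)
        w = k / k.sum()
        return float(w @ np.asarray(self.values)[idx])

    def write(self, h, q):
        """Insert (h, q); if the key already exists, move its value toward q."""
        for i, key in enumerate(self.keys):
            if np.array_equal(key, h):
                self.values[i] += self.alpha * (q - self.values[i])
                return
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(h)
        self.values.append(q)
```

Acting then amounts to embedding the current observation, calling `lookup` on each action's DND, choosing ε-greedily, and writing the N-step return back once it becomes available.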
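The Dataset Splits row quotes the tuning procedure: sweep hyperparameters on six games and keep the setting with the best median performance. A minimal sketch of that selection rule, assuming a hypothetical `evaluate(setting, game)` helper that trains and scores one configuration on one game:

```python
import statistics

# The six tuning games named in the paper; `evaluate` is a hypothetical helper.
TUNING_GAMES = ["beam_rider", "breakout", "pong", "qbert", "seaquest", "space_invaders"]

def pick_hyperparameters(candidate_settings, evaluate):
    """Return the candidate setting with the highest median score over the tuning games."""
    def median_score(setting):
        return statistics.median(evaluate(setting, game) for game in TUNING_GAMES)
    return max(candidate_settings, key=median_score)
```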
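Finally, the Experiment Setup row lists the training constants. The snippet below collects them under names of our choosing and shows the standard N-step Q target they imply (sum of discounted rewards over N steps plus a discounted bootstrap value at step N); it is a sketch, not the paper's code.

```python
GAMMA = 0.99                   # discount rate (the MFEC baseline uses 1.0)
HORIZON_N = 100                # horizon of the N-step Q estimate
MEMORIES_PER_ACTION = 500_000  # DND capacity per action
REPLAY_CAPACITY = 100_000      # replay buffer keeps only the last 1e5 states
REPLAY_PERIOD = 16             # one replay update per 16 observed frames
BATCH_SIZE = 32                # minibatch size for replay updates
NEIGHBOURS_P = 50              # nearest neighbours used in each DND lookup
KERNEL_DELTA = 1e-3            # delta in the kernel denominator
ACTION_REPEAT = 4              # each action repeated four times (Mnih et al., 2015 preprocessing)

def n_step_q_target(rewards, bootstrap_q, gamma=GAMMA):
    """N-step target: sum_{j<N} gamma^j * r_{t+j} + gamma^N * bootstrap_q."""
    target = bootstrap_q
    for r in reversed(rewards):  # rewards r_t, ..., r_{t+N-1}
        target = r + gamma * target
    return target
```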