Neural Episodic Control
Authors: Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | in Section 4 we report experimental results in the Atari Learning Environment |
| Researcher Affiliation | Industry | 1Deepmind, London, UK. Correspondence to: Alexander Pritzel <apritzel@google.com>. |
| Pseudocode | Yes | Algorithm 1 Neural Episodic Control |
| Open Source Code | No | The paper mentions 'Videos and complementary graphical material can be found at https://sites.google.com/view/necicml' but does not provide a link to, or an explicit statement about, released source code for the described methodology. |
| Open Datasets | Yes | As a problem domain we chose the Atari Learning Environment (ALE; Bellemare et al., 2013). We tested our method on the 57 Atari games used by Schaul et al. (2015a) |
| Dataset Splits | Yes | In order to tune the remaining hyperparameters (SGD learning-rate, fast-update learning-rate α in Equation 4, dimensionality of the embeddings, Q^(N) in Equation 3, and ϵ-greedy exploration-rate) we ran a hyperparameter sweep on six games: Beam Rider, Breakout, Pong, Q*Bert, Seaquest and Space Invaders. We picked the hyperparameter values that performed best on the median for this subset of games (a common cross validation procedure described by Bellemare et al. (2013), and adhered to by Mnih et al. (2015)). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using 'RMSProp algorithm (Tieleman & Hinton, 2012)' and applying 'preprocessing steps as (Mnih et al., 2015)' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | All algorithms were trained using discount rate γ = 0.99, except MFEC that uses γ = 1. ... In terms of hyperparameters for NEC, we chose the same convolutional architecture as DQN, and store up to 5 × 10^5 memories per action. We used the RMSProp algorithm ... We apply the same preprocessing steps as (Mnih et al., 2015), including repeating each action four times. For the N-step Q estimates we picked a horizon of N = 100. Our replay buffer stores only the last 10^5 states ... We do one replay update for every 16 observed frames with a minibatch of size 32. We set the number of nearest neighbours p = 50 in all our experiments. For the kernel function ... We set δ = 10^-3. |
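
The Experiment Setup row quotes the nearest-neighbour and kernel hyperparameters (p = 50, δ = 10^-3) used in the paper's differentiable neural dictionary (DND) lookup. Below is a minimal sketch of that kernel-weighted lookup, assuming the paper's inverse-distance kernel k(h, h_i) = 1 / (||h − h_i||² + δ); the array shapes and the function name `dnd_lookup` are illustrative assumptions, not the authors' code.

```python
import numpy as np

DELTA = 1e-3        # kernel constant delta, as quoted above
P_NEIGHBOURS = 50   # number of nearest neighbours p, as quoted above


def dnd_lookup(h, keys, values, p=P_NEIGHBOURS, delta=DELTA):
    """Estimate Q(s, a) from one action's memory of (key, value) pairs.

    h:      embedding of the current state, shape (d,)
    keys:   stored embeddings h_i, shape (n, d)
    values: stored N-step Q estimates Q_i, shape (n,)
    """
    # Squared Euclidean distance from h to every stored key.
    dists = np.sum((keys - h) ** 2, axis=1)
    # Keep only the p closest memories (brute force here; the paper uses an
    # approximate kd-tree based nearest-neighbour search for speed).
    idx = np.argsort(dists)[: min(p, len(dists))]
    # Inverse-distance kernel: k(h, h_i) = 1 / (||h - h_i||^2 + delta).
    k = 1.0 / (dists[idx] + delta)
    w = k / k.sum()  # normalised weights w_i
    # Q(s, a) = sum_i w_i * Q_i
    return float(np.dot(w, values[idx]))
```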
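
The Dataset Splits row describes the tuning procedure: sweep hyperparameters on six games and keep the setting with the best median performance across them. A toy illustration of that selection step, assuming a placeholder `evaluate` callable and that per-game scores have already been put on a comparable (e.g. normalised) scale; the names here are assumptions, not the authors' pipeline.

```python
import statistics

# The six tuning games quoted in the Dataset Splits row.
TUNING_GAMES = ["beam_rider", "breakout", "pong", "qbert", "seaquest", "space_invaders"]


def select_hyperparameters(candidates, evaluate, games=TUNING_GAMES):
    """Return the candidate whose median score over the tuning games is highest.

    candidates: iterable of hyperparameter dicts
    evaluate:   callable (hyperparams, game) -> comparable scalar score
    """
    best, best_median = None, float("-inf")
    for hp in candidates:
        scores = [evaluate(hp, game) for game in games]
        median_score = statistics.median(scores)
        if median_score > best_median:
            best, best_median = hp, median_score
    return best
```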