Learning values across many orders of magnitude
Authors: Hado P. van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We ran the Double DQN algorithm [van Hasselt et al., 2016] in three versions: without changes, without clipping both rewards and temporal difference errors, and without clipping but additionally using Pop-Art. |
| Researcher Affiliation | Industry | Google DeepMind |
| Pseudocode | Yes | Algorithm 1 SGD on squared loss with Pop-Art |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE) [Bellemare et al., 2013] |
| Dataset Splits | No | The paper mentions '200M frames' for training and evaluation 'on 100 episodes', and for binary regression '5000 samples' with 'Every 1000 samples, we present the binary representation of 2^16 - 1'. However, it does not provide explicit training/validation/test splits with percentages or absolute counts. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts, or cloud instance specifications) used for running experiments are provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided. |
| Experiment Setup | Yes | We roughly tuned the main step size and the step size for the normalization to 10^-4. |
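The pseudocode the paper provides (Algorithm 1, SGD on squared loss with Pop-Art) can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation: the class name `PopArtRegressor`, the linear output layer on fixed features, and the variance floor are assumptions, with both step sizes set near the paper's roughly tuned 10^-4.

```python
import numpy as np

class PopArtRegressor:
    """Sketch of SGD on squared loss with Pop-Art for a linear output
    layer g(h) = W @ h + b on fixed features h. Targets are adaptively
    normalized (ART) while the unnormalized outputs are preserved
    exactly when the statistics change (POP)."""

    def __init__(self, n_features, alpha=1e-4, beta=1e-4):
        self.W = np.zeros(n_features)
        self.b = 0.0
        self.alpha = alpha   # SGD step size (~1e-4 in the paper)
        self.beta = beta     # normalization step size (~1e-4 in the paper)
        self.mu = 0.0        # running first moment of targets
        self.nu = 1.0        # running second moment of targets
        self.sigma = 1.0     # scale = sqrt(nu - mu^2)

    def update(self, h, y):
        # ART: adaptively rescale targets via running moments
        mu_new = (1 - self.beta) * self.mu + self.beta * y
        nu_new = (1 - self.beta) * self.nu + self.beta * y * y
        sigma_new = np.sqrt(max(nu_new - mu_new ** 2, 1e-8))  # assumed floor
        # POP: preserve outputs precisely under the new normalization
        self.W *= self.sigma / sigma_new
        self.b = (self.sigma * self.b + self.mu - mu_new) / sigma_new
        self.mu, self.nu, self.sigma = mu_new, nu_new, sigma_new
        # SGD step on the normalized squared loss
        delta = (self.W @ h + self.b) - (y - self.mu) / self.sigma
        self.W -= self.alpha * delta * h
        self.b -= self.alpha * delta

    def predict(self, h):
        # Unnormalize the network output to recover the target scale
        return self.sigma * (self.W @ h + self.b) + self.mu
```

The key property is that the weight and bias rescaling in the POP step keeps `predict` unchanged at the moment the statistics are updated, so gradients are always taken on a normalized target regardless of the target's magnitude.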