Learning values across many orders of magnitude

Authors: Hado P. van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We ran the Double DQN algorithm [van Hasselt et al., 2016] in three versions: without changes, without clipping both rewards and temporal difference errors, and without clipping but additionally using Pop-Art. |
| Researcher Affiliation | Industry | Google DeepMind |
| Pseudocode | Yes | Algorithm 1: SGD on squared loss with Pop-Art |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE) [Bellemare et al., 2013] |
| Dataset Splits | No | The paper mentions '200M frames' for training, evaluation 'on 100 episodes', and, for binary regression, '5000 samples' and 'Every 1000 samples, we present the binary representation of 2^16 - 1'. However, it does not provide explicit training/validation/test splits with percentages or absolute counts. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts, or cloud instance specifications) used for running experiments are provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided. |
| Experiment Setup | Yes | We roughly tuned the main step size and the step size for the normalization to 10^-4. |
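The Pseudocode row above refers to the paper's Algorithm 1, SGD on a squared loss with Pop-Art (normalize targets adaptively, but rescale the output layer so predictions are preserved when the statistics change). The following is a minimal illustrative sketch of that idea on a toy linear-regression problem; the model, target function, and step sizes are assumptions for this demo, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 1e-3       # step size for SGD on the normalized loss (assumed)
beta = 1e-3        # step size for the normalization statistics (assumed)
mu, nu = 0.0, 1.0  # running first and second moments of the targets
w, b = np.zeros(1), 0.0  # parameters of the normalized output

for _ in range(50_000):
    x = rng.uniform(-1.0, 1.0, size=1)
    y = 1000.0 * x[0]  # targets span a large magnitude

    # 1) Update the running normalization statistics.
    old_mu, old_sigma = mu, np.sqrt(max(nu - mu**2, 1e-8))
    mu = (1 - beta) * mu + beta * y
    nu = (1 - beta) * nu + beta * y**2
    sigma = np.sqrt(max(nu - mu**2, 1e-8))

    # 2) Preserve outputs: rescale w and b so that
    #    sigma * (w @ x + b) + mu is unchanged by the statistics update.
    w = w * old_sigma / sigma
    b = (old_sigma * b + old_mu - mu) / sigma

    # 3) Plain SGD on the squared loss against the normalized target.
    y_norm = (y - mu) / sigma
    delta = float(w @ x + b) - y_norm
    w -= alpha * delta * x
    b -= alpha * delta

def predict(x):
    """Unnormalized prediction sigma * (w @ x + b) + mu."""
    sigma = np.sqrt(max(nu - mu**2, 1e-8))
    return sigma * float(w @ np.atleast_1d(x) + b) + mu

print(predict(0.5))  # should be close to the true value 1000 * 0.5 = 500
```

Without the output-preserving rescale in step 2, every shift in the statistics would implicitly change all predictions, which is the problem Pop-Art is designed to avoid.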