Noisy Networks For Exploration
Authors: Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 57 Atari games show that NoisyNet-DQN and NoisyNet-Dueling achieve striking gains when compared to the baseline algorithms, without significant extra computational cost and with fewer hyperparameters to tune. |
| Researcher Affiliation | Industry | DeepMind {meirefortunato,mazar,piot,jmenick,mtthss,iosband,gravesa,vmnih,munos,dhcontact,pietquin,cblundell,legg}@google.com |
| Pseudocode | Yes | Algorithm 1: Noisy Net-DQN / Noisy Net-Dueling, Algorithm 2: Noisy Net-A3C |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology. |
| Open Datasets | Yes | We evaluated the performance of noisy network agents on 57 Atari games (Bellemare et al., 2015) |
| Dataset Splits | No | The paper describes evaluation procedures (e.g., 'evaluating the latest agent for 500K frames') and mentions training, but does not explicitly define distinct train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms and architectures (DQN, Dueling, A3C) but does not provide specific software versions or dependencies (e.g., Python version, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | The DQN and A3C agents were trained for 200M and 320M frames, respectively. In each case, we used the neural network architecture from the corresponding original papers for both the baseline and the Noisy Net variant. For the Noisy Net variants we used the same hyperparameters as in the respective original paper for the baseline. In the case of an unfactorised noisy network, the parameters µ and σ are initialised as follows. Each element µ_{i,j} is sampled from independent uniform distributions U[−√(3/p), +√(3/p)], where p is the number of inputs to the corresponding linear layer, and each element σ_{i,j} is simply set to 0.017 for all parameters. For factorised noisy networks... each element σ_{i,j} was initialised to a constant σ₀/√p. The hyperparameter σ₀ is set to 0.5. |
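
The initialisation quoted in the Experiment Setup row maps directly onto a noisy linear layer. Below is a minimal PyTorch sketch of a factorised noisy layer under the paper's factorised scheme (µ ~ U[−1/√p, +1/√p], σ = σ₀/√p with σ₀ = 0.5 and p the number of inputs); the class name `NoisyLinear` and the choice of PyTorch are our assumptions, not the authors' released code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Sketch of a factorised NoisyNet linear layer (after Fortunato et al., 2018).

    Computes y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b),
    where eps_w = f(eps_out) f(eps_in)^T and f(x) = sgn(x) * sqrt(|x|).
    """

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable means and noise scales.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise samples are buffers: resampled, never trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        # Factorised initialisation: mu ~ U[-1/sqrt(p), +1/sqrt(p)],
        # sigma = sigma0 / sqrt(p), with p = in_features.
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.bias_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Noise transform f(x) = sgn(x) * sqrt(|x|) from the factorised scheme.
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # One noise vector per input and one per output; their outer product
        # yields the p*q weight noise from only p + q Gaussian samples.
        eps_in = self._f(torch.randn(self.in_features))
        eps_out = self._f(torch.randn(self.out_features))
        self.weight_eps.copy_(torch.outer(eps_out, eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_mu + self.weight_sigma * self.weight_eps
        bias = self.bias_mu + self.bias_sigma * self.bias_eps
        return F.linear(x, weight, bias)
```

In a NoisyNet-DQN-style agent, `reset_noise()` would be called to resample ε before acting and before each optimisation step (cf. Algorithm 1), so exploration comes from the perturbed weights rather than from ε-greedy action selection.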