Noisy Networks For Exploration
Authors: Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 57 Atari games show that NoisyNet-DQN and NoisyNet-Dueling achieve striking gains when compared to the baseline algorithms, without significant extra computational cost and with fewer hyperparameters to tune. |
| Researcher Affiliation | Industry | DeepMind {meirefortunato,mazar,piot,jmenick,mtthss,iosband,gravesa,vmnih,munos,dhcontact,pietquin,cblundell,legg}@google.com |
| Pseudocode | Yes | Algorithm 1: Noisy Net-DQN / Noisy Net-Dueling, Algorithm 2: Noisy Net-A3C |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology. |
| Open Datasets | Yes | We evaluated the performance of noisy network agents on 57 Atari games (Bellemare et al., 2015) |
| Dataset Splits | No | The paper describes evaluation procedures (e.g., 'evaluating the latest agent for 500K frames') and mentions training, but does not explicitly define distinct train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms and architectures (DQN, Dueling, A3C) but does not provide specific software versions or dependencies (e.g., Python version, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | The DQN and A3C agents were trained for 200M and 320M frames, respectively. In each case, we used the neural network architecture from the corresponding original papers for both the baseline and the Noisy Net variant. For the Noisy Net variants we used the same hyperparameters as in the respective original paper for the baseline. In the case of an unfactorised noisy network, the parameters µ and σ are initialised as follows. Each element µ_{i,j} is sampled from independent uniform distributions U[−√(3/p), +√(3/p)], where p is the number of inputs to the corresponding linear layer, and each element σ_{i,j} is simply set to 0.017 for all parameters. For factorised noisy networks... each element σ_{i,j} was initialised to a constant σ₀/√p. The hyperparameter σ₀ is set to 0.5. |
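
The initialisation quoted in the Experiment Setup row maps directly onto a noisy linear layer. Below is a minimal PyTorch sketch of a factorised noisy layer under the paper's factorised scheme (µ ~ U[−1/√p, +1/√p], σ = σ₀/√p with σ₀ = 0.5 and p the number of inputs); the class name `NoisyLinear` and the choice of PyTorch are our assumptions, not the authors' released code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Sketch of a factorised NoisyNet linear layer (after Fortunato et al., 2018).

    Computes y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b),
    where eps_w = f(eps_out) f(eps_in)^T and f(x) = sgn(x) * sqrt(|x|).
    """

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable means and noise scales.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise samples are buffers: resampled, never trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        # Factorised initialisation: mu ~ U[-1/sqrt(p), +1/sqrt(p)],
        # sigma = sigma0 / sqrt(p), with p = in_features.
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu.data.uniform_(-bound, bound)
        self.bias_mu.data.uniform_(-bound, bound)
        self.weight_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.bias_sigma.data.fill_(sigma0 / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Noise transform f(x) = sgn(x) * sqrt(|x|) from the factorised scheme.
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # One noise vector per input and one per output; their outer product
        # yields the p*q weight noise from only p + q Gaussian samples.
        eps_in = self._f(torch.randn(self.in_features))
        eps_out = self._f(torch.randn(self.out_features))
        self.weight_eps.copy_(torch.outer(eps_out, eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_mu + self.weight_sigma * self.weight_eps
        bias = self.bias_mu + self.bias_sigma * self.bias_eps
        return F.linear(x, weight, bias)
```

In a NoisyNet-DQN-style agent, `reset_noise()` would be called to resample ε before acting and before each optimisation step (cf. Algorithm 1), so exploration comes from the perturbed weights rather than from ε-greedy action selection.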