Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective
Authors: Florin Gogianu, Tudor Berariu, Mihaela C Rosca, Claudia Clopath, Lucian Busoniu, Razvan Pascanu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning. |
| Researcher Affiliation | Collaboration | 1 Bitdefender, Bucharest, Romania; 2 Department of Automation, Technical University of Cluj-Napoca, Romania; 3 Imperial College London, Department of Bioengineering, London, UK; 4 DeepMind, London, UK; 5 Centre for Artificial Intelligence, University College London, London, UK. |
| Pseudocode | Yes | The algorithm is sketched below, with $u$ and $v$ being the left and right singular vectors: $v \leftarrow W^\top u^{(t-1)};\ \alpha \leftarrow \lVert v \rVert;\ v^{(t)} \leftarrow \alpha^{-1}v;\ u \leftarrow W v^{(t)};\ \rho \leftarrow \lVert u \rVert;\ u^{(t)} \leftarrow \rho^{-1}u$ |
| Open Source Code | Yes | We additionally provide code to ensure reproducibility.1 https://github.com/floringogianu/snrl |
| Open Datasets | Yes | We evaluate SN on the Arcade Learning Environment (ALE) (Bellemare et al., 2013). This collection of Atari games is varied and complex enough to ensure the generality of our claims and observations. ... all the experiments in this section, if not mentioned otherwise, use the MinAtar environment (Young & Tian, 2019) for evaluation |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, exact percentages, or sample counts. It refers to an evaluation protocol, but the specifics of data partitioning are not detailed in the provided text. |
| Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions software like 'Dopamine (Castro et al., 2018)' and 'Atari-Py library' and optimizers like 'Adam' and 'RMSProp', but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | No | The paper states: 'We use the optimiser settings and other hyper-parameters found in Dopamine (Castro et al., 2018) as detailed in appendix C.2.' and 'These hyper-parameters are the result of careful tuning reported in the literature and are detailed in appendix C.' While hyperparameters are mentioned as being available, their specific values or detailed configurations are not presented within the main text provided. |
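The power-iteration update quoted in the Pseudocode row ($v \leftarrow W^\top u$, normalise, $u \leftarrow Wv$, normalise, then $\sigma \approx u^\top W v$) can be illustrated with a short NumPy sketch. The function names here are ours, not from the paper's released code, and the sketch estimates the spectral norm of a plain weight matrix rather than reproducing the full normalisation layer:

```python
import numpy as np

def power_iteration_step(W, u):
    """One step of the quoted update:
    v <- W^T u, normalise; u <- W v, normalise."""
    v = W.T @ u
    v = v / np.linalg.norm(v)
    u = W @ v
    u = u / np.linalg.norm(u)
    return u, v

def spectral_norm(W, n_steps=30, seed=0):
    """Estimate the largest singular value of W via power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    u = u / np.linalg.norm(u)
    for _ in range(n_steps):
        u, v = power_iteration_step(W, u)
    # sigma is approximated by the Rayleigh-like quotient u^T W v
    return float(u @ W @ v)

# Sanity check against a matrix with known singular values {3, 1}:
W = np.array([[3.0, 0.0], [0.0, 1.0]])
sigma = spectral_norm(W)  # converges towards 3.0
```

Spectral normalisation then divides the weights by this estimate ($W / \sigma$) so the layer's Lipschitz constant is bounded by one; in practice (e.g. in Miyato-style implementations) a single power-iteration step per forward pass is reused across training updates rather than iterating to convergence.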