Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective
Authors: Florin Gogianu, Tudor Berariu, Mihaela C Rosca, Claudia Clopath, Lucian Busoniu, Razvan Pascanu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning. |
| Researcher Affiliation | Collaboration | 1 Bitdefender, Bucharest, Romania; 2 Department of Automation, Technical University of Cluj-Napoca, Romania; 3 Imperial College London, Department of Bioengineering, London, UK; 4 DeepMind, London, UK; 5 Centre for Artificial Intelligence, University College London, London, UK. |
| Pseudocode | Yes | The algorithm is sketched below, with $u$ and $v$ being the left and right singular vectors: $v \leftarrow W^\top u^{(t-1)};\ \alpha \leftarrow \lVert v \rVert;\ v^{(t)} \leftarrow \alpha^{-1}v;\ u \leftarrow W v^{(t)};\ \rho \leftarrow \lVert u \rVert;\ u^{(t)} \leftarrow \rho^{-1}u$ |
| Open Source Code | Yes | We additionally provide code to ensure reproducibility.1 https://github.com/floringogianu/snrl |
| Open Datasets | Yes | We evaluate SN on the Arcade Learning Environment (ALE) (Bellemare et al., 2013). This collection of Atari games is varied and complex enough to ensure the generality of our claims and observations. ... all the experiments in this section, if not mentioned otherwise, use the MinAtar environment (Young & Tian, 2019) for evaluation |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, exact percentages, or sample counts. It refers to an evaluation protocol, but the specifics of data partitioning are not detailed in the provided text. |
| Hardware Specification | No | The paper does not contain any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions software like 'Dopamine (Castro et al., 2018)' and 'Atari-Py library' and optimizers like 'Adam' and 'RMSProp', but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | No | The paper states: 'We use the optimiser settings and other hyper-parameters found in Dopamine (Castro et al., 2018) as detailed in appendix C.2.' and 'These hyper-parameters are the result of careful tuning reported in the literature and are detailed in appendix C.' While hyperparameters are mentioned as being available, their specific values or detailed configurations are not presented within the main text provided. |
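The power-iteration update quoted in the Pseudocode row ($v \leftarrow W^\top u$, normalise, $u \leftarrow Wv$, normalise, then $\sigma \approx u^\top W v$) can be illustrated with a short NumPy sketch. The function names here are ours, not from the paper's released code, and the sketch estimates the spectral norm of a plain weight matrix rather than reproducing the full normalisation layer:

```python
import numpy as np

def power_iteration_step(W, u):
    """One step of the quoted update:
    v <- W^T u, normalise; u <- W v, normalise."""
    v = W.T @ u
    v = v / np.linalg.norm(v)
    u = W @ v
    u = u / np.linalg.norm(u)
    return u, v

def spectral_norm(W, n_steps=30, seed=0):
    """Estimate the largest singular value of W via power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    u = u / np.linalg.norm(u)
    for _ in range(n_steps):
        u, v = power_iteration_step(W, u)
    # sigma is approximated by the Rayleigh-like quotient u^T W v
    return float(u @ W @ v)

# Sanity check against a matrix with known singular values {3, 1}:
W = np.array([[3.0, 0.0], [0.0, 1.0]])
sigma = spectral_norm(W)  # converges towards 3.0
```

Spectral normalisation then divides the weights by this estimate ($W / \sigma$) so the layer's Lipschitz constant is bounded by one; in practice (e.g. in Miyato-style implementations) a single power-iteration step per forward pass is reused across training updates rather than iterating to convergence.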