Towards Deeper Deep Reinforcement Learning with Spectral Normalization

Authors: Nils Bjorck, Carla P. Gomes, Kilian Q. Weinberger

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify that naïvely adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements, suggesting that more easy gains may be had by focusing on model architectures in addition to algorithmic innovations. (A hedged sketch of applying SN to a critic network follows the table.)
Researcher Affiliation | Academia | Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger, Cornell University
Pseudocode | No | The paper describes algorithms and their components but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper mentions an 'excellent open-source codebase' in reference to the DRQ baseline, but does not provide a statement or link for the authors' own implementation of the spectral normalization method or their modifications.
Open Datasets | Yes | Specifically we evaluate on the DeepMind control suite [65], which has been used in [26, 27, 35, 39]. We use the 15 tasks considered in Kostrikov et al. [35] and evaluate after 500,000 samples, which has become a common benchmark [35, 39, 41]. (A sketch of the environment interaction loop follows the table.)
Dataset Splits | No | The paper evaluates performance after a fixed number of samples/steps in a continuous control environment. It describes the training process and evaluation points (e.g., 500,000 samples) but does not specify traditional dataset splits (e.g., train/validation/test percentages or counts) for a static dataset.
Hardware Specification | Yes | We use Tesla V100 GPUs and measure time with CUDA events and measure memory with PyTorch native tools. (A sketch of this measurement recipe follows the table.)
Software Dependencies | No | The paper mentions 'PyTorch native tools' but does not specify version numbers for PyTorch or any other software libraries or dependencies used for the experiments.
Experiment Setup | Yes | We will primarily focus on the image augmentation based SAC agent of Kostrikov et al. [35], known as DRQ, and adopt their hyperparameters (listed in Appendix B). This agent reaches state-of-the-art performance without any nonstandard bells-and-whistles, and has an excellent open-source codebase. Unless specifically mentioned, all figures and tables refer to this agent. (A sketch of DRQ-style random-shift augmentation follows the table.)
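
The paper's core intervention is spectral normalization on the critic. Below is a minimal PyTorch sketch of that idea using `torch.nn.utils.spectral_norm`; the layer widths and the choice to normalize only the hidden layers are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    """Q-network whose hidden layers are wrapped with spectral
    normalization, bounding each weight matrix's largest singular
    value and thereby smoothing gradients taken through the critic."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            # spectral_norm re-estimates the top singular value by
            # power iteration on each forward pass and divides the
            # weight by it.
            spectral_norm(nn.Linear(obs_dim + act_dim, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # output head left unnormalized (an assumption)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

q = Critic(obs_dim=50, act_dim=6)
print(q(torch.randn(8, 50), torch.randn(8, 6)).shape)  # torch.Size([8, 1])
```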
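
The evaluation environments come from the DeepMind Control Suite, exposed through the `dm_control` package. A minimal interaction-loop sketch, assuming a local `dm_control` install and using `cheetah run` as an illustrative task (a random policy stands in for the agent):

```python
from dm_control import suite
import numpy as np

# Load one control-suite task; "cheetah run" is an illustrative choice.
env = suite.load(domain_name="cheetah", task_name="run")

spec = env.action_spec()
time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    # Random actions purely to demonstrate the interaction loop.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0
print(f"episode return: {total_reward:.1f}")
```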
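
The timing and memory measurements the authors describe correspond to standard `torch.cuda` APIs: CUDA events for GPU timing and peak-memory statistics for allocation. A minimal sketch; the linear-layer workload is a placeholder, not the paper's model:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder workload
x = torch.randn(256, 1024, device="cuda")

# CUDA events time GPU work accurately; host-side timers would not
# account for asynchronous kernel launches.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.reset_peak_memory_stats()
start.record()
y = model(x)
end.record()
torch.cuda.synchronize()  # wait until the recorded events have completed

print(f"forward time: {start.elapsed_time(end):.3f} ms")
print(f"peak memory:  {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```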
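
DRQ's central ingredient is random-shift image augmentation. A minimal sketch assuming the commonly described pad-then-random-crop formulation with a 4-pixel pad; this is not taken from the authors' codebase, and production DRQ implementations typically vectorize the shift (e.g., via `grid_sample`) rather than looping:

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Random-shift augmentation in the style of DRQ: replicate-pad each
    image, then crop back to the original size at a random offset,
    shifting contents by up to `pad` pixels in each direction."""
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.randn(8, 9, 84, 84)  # e.g. 3 stacked 84x84 RGB frames
print(random_shift(batch).shape)   # torch.Size([8, 9, 84, 84])
```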