Towards Deeper Deep Reinforcement Learning with Spectral Normalization
Authors: Nils Bjorck, Carla P. Gomes, Kilian Q. Weinberger
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify that naïvely adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements, suggesting that more easy gains may be had by focusing on model architectures in addition to algorithmic innovations. (A sketch of spectral normalization on a critic network appears below the table.) |
| Researcher Affiliation | Academia | Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger (Cornell University) |
| Pseudocode | No | The paper describes algorithms and their components but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper mentions an 'excellent open-source codebase' in reference to the DRQ baseline, but does not provide a specific statement or link for the authors' own implementation of the spectral normalization method or their modifications. |
| Open Datasets | Yes | Specifically we evaluate on the DeepMind Control Suite [65], which has been used in [26, 27, 35, 39]. We use the 15 tasks considered in Kostrikov et al. [35] and evaluate after 500,000 samples, which has become a common benchmark [35, 39, 41]. (A loading sketch for the suite appears below the table.) |
| Dataset Splits | No | The paper evaluates performance after a fixed number of samples/steps in a continuous control environment. It describes the training process and evaluation points (e.g., 500,000 samples) but does not specify traditional dataset splits (e.g., train/validation/test percentages or counts) for a static dataset. |
| Hardware Specification | Yes | We use Tesla V100 GPUs, measure time with CUDA events, and measure memory with PyTorch native tools. (See the timing and memory sketch below the table.) |
| Software Dependencies | No | The paper mentions 'PyTorch native tools' but does not specify version numbers for PyTorch or any other software libraries or dependencies used for the experiments. |
| Experiment Setup | Yes | We will primarily focus on the image augmentation based SAC agent of Kostrikov et al. [35], known as DRQ, and adopt their hyperparameters (listed in Appendix B). This agent reaches state-of-the-art performance without any nonstandard bells-and-whistles, and has an excellent open-source codebase. Unless specifically mentioned, all figures and tables refer to this agent. (A sketch of the DRQ random-shift augmentation appears below the table.) |
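For the Research Type row: the paper's central intervention is spectral normalization applied to the critic. Below is a minimal sketch, assuming a PyTorch SAC-style critic; the layer widths and the choice to normalize only the hidden layers are illustrative assumptions, not the paper's exact architecture. `torch.nn.utils.spectral_norm` is PyTorch's built-in power-iteration implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    """Q-network whose hidden layers are spectrally normalized.

    Spectral normalization rescales each weight matrix by an estimate of
    its largest singular value, bounding the layer's Lipschitz constant
    and (per the paper) stabilizing gradients taken through the critic.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim + act_dim, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # output layer left unnormalized (sketch assumption)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```

The `spectral_norm` wrapper updates its singular-value estimate via one power-iteration step per forward pass, so no extra training code is needed.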
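For the Open Datasets row: the DeepMind Control Suite is loaded through the open-source `dm_control` package. A minimal rollout sketch follows; the `cheetah/run` domain-task pair is one of the suite's standard tasks and is used here purely as an example.

```python
import numpy as np
from dm_control import suite

# Load one continuous-control task from the suite.
env = suite.load(domain_name="cheetah", task_name="run")
spec = env.action_spec()

time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    # Random actions, uniform over the bounded action space.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward
print(f"episode return: {episode_return:.1f}")
```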
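For the Hardware Specification row: timing GPU work with CUDA events and reading peak memory with PyTorch's native counters looks roughly like the sketch below. The model and input are placeholders; a CUDA-capable device is assumed.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder workload
x = torch.randn(256, 1024, device="cuda")

# CUDA events time GPU work accurately, unlike wall-clock timers,
# because kernel launches are asynchronous.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.reset_peak_memory_stats()
start.record()
y = model(x)
end.record()
torch.cuda.synchronize()  # wait for the recorded events to complete

print(f"forward pass: {start.elapsed_time(end):.3f} ms")
print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```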
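For the Experiment Setup row: DRQ's image augmentation is a random pixel shift, i.e. replicate-pad each frame and randomly crop back to the original size. The sketch below mirrors the published description rather than the authors' exact code; batched 84×84 frames and a ±4-pixel shift are assumptions.

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """DRQ-style augmentation: replicate-pad, then randomly crop back.

    imgs: float tensor of shape (B, C, H, W), e.g. stacked 84x84 frames.
    """
    b, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(b):
        # Independent random offset per image in the batch.
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out
```

In DRQ, augmented observations enter both the Q-target and Q-estimate computations, whose results are averaged; the sketch covers only the augmentation itself.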