Towards Deeper Deep Reinforcement Learning with Spectral Normalization

Authors: Nils Bjorck, Carla P. Gomes, Kilian Q. Weinberger

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify that naïvely adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements, suggesting that more easy gains may be had by focusing on model architectures in addition to algorithmic innovations. (A hedged sketch of applying SN to a critic network follows the table.)
Researcher Affiliation | Academia | Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger, Cornell University
Pseudocode | No | The paper describes algorithms and their components but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper mentions an 'excellent open-source codebase' in reference to the DRQ baseline, but does not provide a statement or link for the authors' own implementation of the spectral normalization method or their modifications.
Open Datasets | Yes | Specifically we evaluate on the DeepMind control suite [65], which has been used in [26, 27, 35, 39]. We use the 15 tasks considered in Kostrikov et al. [35] and evaluate after 500,000 samples, which has become a common benchmark [35, 39, 41]. (A sketch of the environment interaction loop follows the table.)
Dataset Splits | No | The paper evaluates performance after a fixed number of samples/steps in a continuous control environment. It describes the training process and evaluation points (e.g., 500,000 samples) but does not specify traditional dataset splits (e.g., train/validation/test percentages or counts) for a static dataset.
Hardware Specification | Yes | We use Tesla V100 GPUs and measure time with CUDA events and measure memory with PyTorch native tools. (A sketch of this measurement recipe follows the table.)
Software Dependencies | No | The paper mentions 'PyTorch native tools' but does not specify version numbers for PyTorch or any other software libraries or dependencies used for the experiments.
Experiment Setup | Yes | We will primarily focus on the image augmentation based SAC agent of Kostrikov et al. [35], known as DRQ, and adopt their hyperparameters (listed in Appendix B). This agent reaches state-of-the-art performance without any nonstandard bells-and-whistles, and has an excellent open-source codebase. Unless specifically mentioned, all figures and tables refer to this agent. (A sketch of DRQ-style random-shift augmentation follows the table.)
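
The paper's core intervention is spectral normalization on the critic. Below is a minimal PyTorch sketch of that idea using `torch.nn.utils.spectral_norm`; the layer widths and the choice to normalize only the hidden layers are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Critic(nn.Module):
    """Q-network whose hidden layers are wrapped with spectral
    normalization, bounding each weight matrix's largest singular
    value and thereby smoothing gradients taken through the critic."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            # spectral_norm re-estimates the top singular value by
            # power iteration on each forward pass and divides the
            # weight by it.
            spectral_norm(nn.Linear(obs_dim + act_dim, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # output head left unnormalized (an assumption)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

q = Critic(obs_dim=50, act_dim=6)
print(q(torch.randn(8, 50), torch.randn(8, 6)).shape)  # torch.Size([8, 1])
```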
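
The evaluation environments come from the DeepMind Control Suite, exposed through the `dm_control` package. A minimal interaction-loop sketch, assuming a local `dm_control` install and using `cheetah run` as an illustrative task (a random policy stands in for the agent):

```python
from dm_control import suite
import numpy as np

# Load one control-suite task; "cheetah run" is an illustrative choice.
env = suite.load(domain_name="cheetah", task_name="run")

spec = env.action_spec()
time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    # Random actions purely to demonstrate the interaction loop.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0
print(f"episode return: {total_reward:.1f}")
```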
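
The timing and memory measurements the authors describe correspond to standard `torch.cuda` APIs: CUDA events for GPU timing and peak-memory statistics for allocation. A minimal sketch; the linear-layer workload is a placeholder, not the paper's model:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder workload
x = torch.randn(256, 1024, device="cuda")

# CUDA events time GPU work accurately; host-side timers would not
# account for asynchronous kernel launches.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.reset_peak_memory_stats()
start.record()
y = model(x)
end.record()
torch.cuda.synchronize()  # wait until the recorded events have completed

print(f"forward time: {start.elapsed_time(end):.3f} ms")
print(f"peak memory:  {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```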
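
DRQ's central ingredient is random-shift image augmentation. A minimal sketch assuming the commonly described pad-then-random-crop formulation with a 4-pixel pad; this is not taken from the authors' codebase, and production DRQ implementations typically vectorize the shift (e.g., via `grid_sample`) rather than looping:

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Random-shift augmentation in the style of DRQ: replicate-pad each
    image, then crop back to the original size at a random offset,
    shifting contents by up to `pad` pixels in each direction."""
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.randn(8, 9, 84, 84)  # e.g. 3 stacked 84x84 RGB frames
print(random_shift(batch).shape)   # torch.Size([8, 9, 84, 84])
```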