Is High Variance Unavoidable in RL? A Case Study in Continuous Control

Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a specifically popular setup with high variance: continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that poor outlier runs which completely fail to learn are an important source of variance, but that weight initialization and initial exploration are not at fault. We show that one cause for these outliers is unstable network parametrization which leads to saturating nonlinearities. We investigate several fixes to this issue and find that simply normalizing penultimate features is surprisingly effective. For sparse tasks, we also find that partially disabling clipped double Q-learning decreases variance. By combining fixes we significantly decrease variances, lowering the average standard deviation across 21 tasks by a factor > 3 for a state-of-the-art agent."
Researcher Affiliation | Academia | Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger (Cornell University)
Pseudocode | No | The paper describes algorithms (DDPG and DrQ-v2) and their modifications in prose, but does not include any pseudocode or algorithm blocks.
Open Source Code | No | "Our experiments are based upon the open-source DrQ-v2 implementation of Yarats et al. (2021b)." This states that the experiments build on an existing open-source implementation, not that the authors' own code for their modifications is open-source or available.
Open Datasets | Yes | "We consider the standard continuous control benchmark DeepMind Control (dm-control) (Tassa et al., 2020)."
Dataset Splits | Yes | "For each run, we train the agent for one million frames, or equivalently 1,000 episodes, and evaluate over ten episodes."
Hardware Specification | Yes | "We run our experiments on Nvidia Tesla V100 GPUs and Intel Xeon CPUs."
Software Dependencies | Yes | "The GPUs use CUDA 11.1 and CUDNN 8.0.0.5. We use PyTorch 1.9.0 and Python 3.8.10."
Experiment Setup | Yes | "We use the default hyperparameters that Yarats et al. (2021b) use on the medium benchmark (listed in Appendix A) throughout the paper." "Table 5: Hyperparameters used throughout the paper. These follow Yarats et al. (2021b) for the medium tasks."
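The first fix quoted in the abstract, "normalizing penultimate features", can be sketched as a simple L2 normalization applied to the activations of the layer just before the final linear output. This is a minimal numpy illustration of the operation itself; the exact placement and norm used in the authors' agent are assumptions, since the paper's code is not included here.

```python
import numpy as np

def normalize_penultimate(features, eps=1e-8):
    """Scale each feature vector to unit L2 norm.

    Bounding the penultimate features this way keeps the inputs to the
    final layer in a fixed range, which is one way to prevent the kind
    of parameter blow-up that saturates downstream nonlinearities.
    """
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    # eps guards against division by zero for all-zero feature vectors
    return features / np.maximum(norms, eps)

# A (3, 4) feature vector is rescaled to (0.6, 0.8), which has unit norm.
out = normalize_penultimate(np.array([[3.0, 4.0]]))
```

In a real actor-critic network this would be applied to the hidden activations feeding the final Q-value (or action) head, after the nonlinearity and before the last linear layer.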
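The second fix, "partially disabling clipped double Q-learning", can be illustrated with the TD target computation. In standard clipped double Q-learning (as in TD3), the target uses the minimum of two critics. One plausible way to partially disable the clip, shown below as a hypothetical interpolation rather than the paper's exact mechanism, is to blend the clipped (min) estimate with an unclipped (mean) estimate.

```python
import numpy as np

def td_target(reward, discount, q1_next, q2_next, done, clip_weight=0.5):
    """TD target that interpolates between clipped and unclipped double Q.

    clip_weight=1.0 recovers standard clipped double Q-learning (min of
    the two critics); clip_weight=0.0 disables the clip entirely (mean of
    the two critics). Intermediate values partially disable the clip.
    This interpolation is an illustrative assumption, not the paper's code.
    """
    clipped = np.minimum(q1_next, q2_next)      # pessimistic estimate
    unclipped = 0.5 * (q1_next + q2_next)       # unbiased-ish estimate
    q_next = clip_weight * clipped + (1.0 - clip_weight) * unclipped
    # done masks out the bootstrap term at terminal transitions
    return reward + discount * (1.0 - done) * q_next

# With q1=2, q2=4: full clipping bootstraps from 2, no clipping from 3.
full_clip = td_target(1.0, 0.99, 2.0, 4.0, 0.0, clip_weight=1.0)  # 2.98
no_clip = td_target(1.0, 0.99, 2.0, 4.0, 0.0, clip_weight=0.0)    # 3.97
```

The motivation in the abstract is that on sparse-reward tasks the pessimism of the min operator can suppress learning entirely in unlucky runs, so relaxing it reduces the frequency of outlier failures.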