Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

Authors: Johan Björck, Xiangyu Chen, Christopher De Sa, Carla P Gomes, Kilian Weinberger

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally verify that with these methods it is possible to train SAC agents in low precision with comparable performance to full-precision agents, thus demonstrating the feasibility of low-precision RL. We further benchmark compute time and memory consumption and find dramatic improvements in both aspects. We perform experiments on RL from both environment state representations and pixel observations, and also perform ablation experiments and simulate various numerical formats with qtorch (Zhang et al., 2019)." (A qtorch simulation sketch follows the table.)
Researcher Affiliation | Academia | "Department of Computer Science, Cornell University, USA. Correspondence to: Johan Bjorck <njb225@cornell.edu>."
Pseudocode | Yes | "Algorithm 1: hAdam." (An illustrative low-precision optimizer sketch follows the table.)
Open Source Code | Yes | "Finally, we release our code to encourage future work on RL in low precision."
Open Datasets | Yes | "For environments, we consider the PlaNet benchmark, popularized by Hafner et al. (2019) and used in e.g. Kostrikov et al. (2020); Laskin et al. (2020b;a). It consists of six continuous control tasks from the DeepMind Control Suite (Tassa et al., 2020): finger spin, cartpole swingup, reacher easy, cheetah run, walker walk, and ball in cup catch." (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper does not specify a distinct validation split for hyperparameter tuning. It states: "We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B)."
Hardware Specification | Yes | "The time is averaged over 500 gradient updates (and occasional momentum updates for the target network) with a warm-started Tesla V100 GPU." (A timing-methodology sketch follows the table.)
Software Dependencies | No | The paper mentions PyTorch and qtorch but does not give version numbers for these or any other software components.
Experiment Setup | Yes | "We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B)."
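The "Research Type" row notes that numerical formats were simulated with qtorch (Zhang et al., 2019). Below is a minimal sketch of that kind of simulation using QPyTorch's float_quantize utility; the 5-exponent/10-mantissa layout and nearest rounding are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch (assumed configuration): rounding a tensor to a simulated
# half-precision format with QPyTorch's float_quantize. The exponent/mantissa
# split and rounding mode here are illustrative, not the paper's exact setup.
import torch
from qtorch.quant import float_quantize

x = torch.randn(4, 8)

# IEEE fp16 layout: 5 exponent bits, 10 mantissa bits, round to nearest.
x_low = float_quantize(x, exp=5, man=10, rounding="nearest")

print("max rounding error:", (x - x_low).abs().max().item())
```

Other formats can be simulated by changing the bit allocation, e.g. exp=8, man=7 for a bfloat16-like layout.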
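The "Pseudocode" row refers to Algorithm 1 (hAdam). The paper's algorithm is not reproduced here; as a loosely related illustration, the sketch below shows a generic Adam-style step that keeps parameters in fp16 and uses Kahan compensated summation to reduce rounding error in the weight update. All names and details are hypothetical and should not be read as the authors' hAdam.

```python
# Illustrative sketch only, NOT the paper's hAdam: an Adam-style update for a
# half-precision parameter tensor that carries rounding error across steps
# with Kahan compensated summation. `kahan_adam_step` is a hypothetical helper.
import torch

def kahan_adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    m, v, comp = state["m"], state["v"], state["comp"]   # fp32, fp32, fp16
    state["t"] += 1
    t = state["t"]
    g = grad.float()
    m.mul_(betas[0]).add_(g, alpha=1 - betas[0])          # first moment
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])   # second moment
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    update = (-lr * m_hat / (v_hat.sqrt() + eps)).half()

    # Kahan summation: fold in the low-order bits lost on the previous step,
    # then record the bits lost now so they can be re-applied next time.
    y = update + comp
    new_p = p + y
    comp.copy_(y - (new_p - p))
    p.copy_(new_p)

# Example state for a single fp16 parameter tensor `p`:
# state = {"m": torch.zeros_like(p, dtype=torch.float32),
#          "v": torch.zeros_like(p, dtype=torch.float32),
#          "comp": torch.zeros_like(p), "t": 0}
```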
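The "Open Datasets" row lists six DeepMind Control Suite tasks from the PlaNet benchmark. A minimal sketch of loading one of them with the publicly available dm_control package follows; the wrappers the authors use (action repeat, pixel rendering, frame stacking) are omitted, and the seed is an arbitrary placeholder.

```python
# Minimal sketch: loading the PlaNet-benchmark tasks from the DeepMind
# Control Suite via dm_control. Paper-specific wrappers (action repeat,
# pixel observations, frame stacking) are omitted here.
from dm_control import suite

TASKS = [
    ("finger", "spin"),
    ("cartpole", "swingup"),
    ("reacher", "easy"),
    ("cheetah", "run"),
    ("walker", "walk"),
    ("ball_in_cup", "catch"),
]

env = suite.load(domain_name="cheetah", task_name="run", task_kwargs={"random": 0})
time_step = env.reset()
print(env.action_spec(), list(time_step.observation.keys()))
```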
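The "Hardware Specification" row describes averaging compute time over 500 gradient updates on a warm-started Tesla V100. The sketch below shows a generic way to measure that kind of per-update time in PyTorch, with warm-up iterations and explicit device synchronization; `agent.update` and the warm-up count are hypothetical stand-ins, not taken from the paper.

```python
# Generic sketch of benchmarking GPU compute time in PyTorch: warm up first,
# then time a fixed number of gradient updates with explicit synchronization.
# `agent.update(batch)` is a hypothetical stand-in for one SAC gradient step.
import time
import torch

def benchmark_updates(agent, batch, n_warmup=50, n_timed=500):
    for _ in range(n_warmup):          # warm start: kernels compiled, caches hot
        agent.update(batch)
    torch.cuda.synchronize()           # ensure warm-up work has finished
    start = time.perf_counter()
    for _ in range(n_timed):
        agent.update(batch)
    torch.cuda.synchronize()           # wait for all queued GPU work
    return (time.perf_counter() - start) / n_timed  # seconds per update
```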