Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

Authors: Johan Björck, Xiangyu Chen, Christopher De Sa, Carla P Gomes, Kilian Weinberger

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally verify that with these methods it is possible to train SAC agents in low precision with comparable performance to full-precision agents, thus demonstrating the feasibility of low-precision RL. We further benchmark compute time and memory consumption and find dramatic improvements in both aspects. We perform experiments on RL from both environment state representations and pixel observations, and also perform ablation experiments and simulate various numerical formats with qtorch (Zhang et al., 2019)." (A qtorch simulation sketch follows the table.)
Researcher Affiliation | Academia | "Department of Computer Science, Cornell University, USA. Correspondence to: Johan Bjorck <njb225@cornell.edu>."
Pseudocode | Yes | "Algorithm 1: hAdam." (An illustrative low-precision optimizer sketch follows the table.)
Open Source Code | Yes | "Finally, we release our code to encourage future work on RL in low precision."
Open Datasets | Yes | "For environments, we consider the PlaNet benchmark, popularized by Hafner et al. (2019) and used in e.g. Kostrikov et al. (2020); Laskin et al. (2020b;a). It consists of six continuous control tasks from the DeepMind Control Suite (Tassa et al., 2020): finger spin, cartpole swingup, reacher easy, cheetah run, walker walk, and ball in cup catch." (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper does not specify a distinct validation split for hyperparameter tuning. It states: "We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B)."
Hardware Specification | Yes | "The time is averaged over 500 gradient updates (and occasional momentum updates for the target network) with a warm-started Tesla V100 GPU." (A timing-methodology sketch follows the table.)
Software Dependencies | No | The paper mentions PyTorch and qtorch but does not give version numbers for these or any other software components.
Experiment Setup | Yes | "We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B)."
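The "Research Type" row notes that numerical formats were simulated with qtorch (Zhang et al., 2019). Below is a minimal sketch of that kind of simulation using QPyTorch's float_quantize utility; the 5-exponent/10-mantissa layout and nearest rounding are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch (assumed configuration): rounding a tensor to a simulated
# half-precision format with QPyTorch's float_quantize. The exponent/mantissa
# split and rounding mode here are illustrative, not the paper's exact setup.
import torch
from qtorch.quant import float_quantize

x = torch.randn(4, 8)

# IEEE fp16 layout: 5 exponent bits, 10 mantissa bits, round to nearest.
x_low = float_quantize(x, exp=5, man=10, rounding="nearest")

print("max rounding error:", (x - x_low).abs().max().item())
```

Other formats can be simulated by changing the bit allocation, e.g. exp=8, man=7 for a bfloat16-like layout.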
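The "Pseudocode" row refers to Algorithm 1 (hAdam). The paper's algorithm is not reproduced here; as a loosely related illustration, the sketch below shows a generic Adam-style step that keeps parameters in fp16 and uses Kahan compensated summation to reduce rounding error in the weight update. All names and details are hypothetical and should not be read as the authors' hAdam.

```python
# Illustrative sketch only, NOT the paper's hAdam: an Adam-style update for a
# half-precision parameter tensor that carries rounding error across steps
# with Kahan compensated summation. `kahan_adam_step` is a hypothetical helper.
import torch

def kahan_adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    m, v, comp = state["m"], state["v"], state["comp"]   # fp32, fp32, fp16
    state["t"] += 1
    t = state["t"]
    g = grad.float()
    m.mul_(betas[0]).add_(g, alpha=1 - betas[0])          # first moment
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])   # second moment
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    update = (-lr * m_hat / (v_hat.sqrt() + eps)).half()

    # Kahan summation: fold in the low-order bits lost on the previous step,
    # then record the bits lost now so they can be re-applied next time.
    y = update + comp
    new_p = p + y
    comp.copy_(y - (new_p - p))
    p.copy_(new_p)

# Example state for a single fp16 parameter tensor `p`:
# state = {"m": torch.zeros_like(p, dtype=torch.float32),
#          "v": torch.zeros_like(p, dtype=torch.float32),
#          "comp": torch.zeros_like(p), "t": 0}
```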
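The "Open Datasets" row lists six DeepMind Control Suite tasks from the PlaNet benchmark. A minimal sketch of loading one of them with the publicly available dm_control package follows; the wrappers the authors use (action repeat, pixel rendering, frame stacking) are omitted, and the seed is an arbitrary placeholder.

```python
# Minimal sketch: loading the PlaNet-benchmark tasks from the DeepMind
# Control Suite via dm_control. Paper-specific wrappers (action repeat,
# pixel observations, frame stacking) are omitted here.
from dm_control import suite

TASKS = [
    ("finger", "spin"),
    ("cartpole", "swingup"),
    ("reacher", "easy"),
    ("cheetah", "run"),
    ("walker", "walk"),
    ("ball_in_cup", "catch"),
]

env = suite.load(domain_name="cheetah", task_name="run", task_kwargs={"random": 0})
time_step = env.reset()
print(env.action_spec(), list(time_step.observation.keys()))
```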
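The "Hardware Specification" row describes averaging compute time over 500 gradient updates on a warm-started Tesla V100. The sketch below shows a generic way to measure that kind of per-update time in PyTorch, with warm-up iterations and explicit device synchronization; `agent.update` and the warm-up count are hypothetical stand-ins, not taken from the paper.

```python
# Generic sketch of benchmarking GPU compute time in PyTorch: warm up first,
# then time a fixed number of gradient updates with explicit synchronization.
# `agent.update(batch)` is a hypothetical stand-in for one SAC gradient step.
import time
import torch

def benchmark_updates(agent, batch, n_warmup=50, n_timed=500):
    for _ in range(n_warmup):          # warm start: kernels compiled, caches hot
        agent.update(batch)
    torch.cuda.synchronize()           # ensure warm-up work has finished
    start = time.perf_counter()
    for _ in range(n_timed):
        agent.update(batch)
    torch.cuda.synchronize()           # wait for all queued GPU work
    return (time.perf_counter() - start) / n_timed  # seconds per update
```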