Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision
Authors: Johan Björck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Weinberger
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally verify that with these methods it is possible to train SAC agents in low precision with comparable performance to full-precision agents, thus demonstrating the feasibility of low-precision RL. We further benchmark compute time and memory consumption and find dramatic improvements in both aspects. We perform experiments on RL from both environment state representations and pixel observations, and also perform ablation experiments and simulate various numerical formats with qtorch (Zhang et al., 2019). |
| Researcher Affiliation | Academia | 1Department of Computer Science, Cornell University, USA. Correspondence to: Johan Bjorck <njb225@cornell.edu>. |
| Pseudocode | Yes | Algorithm 1 hAdam. |
| Open Source Code | Yes | Finally, we release our code to encourage future work on RL in low precision. |
| Open Datasets | Yes | For environments, we consider the planet benchmark, popularized by (Hafner et al., 2019) and used in e.g. Kostrikov et al. (2020); Laskin et al. (2020b;a). It consists of six continuous control tasks from the deep mind control suite (Tassa et al., 2020): finger spin, cartpole swingup, reacher easy, cheetah run, walker walk, and ball in cup catch. |
| Dataset Splits | No | The paper does not specify the use of a distinct validation dataset split for hyperparameter tuning. It states, 'We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B).' |
| Hardware Specification | Yes | The time is averaged over 500 gradient updates (and occasional momentum updates for the target network) with a warm started Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'qtorch' but does not provide specific version numbers for these or any other software components. |
| Experiment Setup | Yes | We do not tune hyperparameters, but instead use hyperparameters from Yarats & Kostrikov (2020) (listed in Appendix B). |
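The quoted passages above refer to simulating reduced-precision numerical formats (via qtorch) when training SAC in half precision. As a rough illustration only (not the authors' code, and using plain NumPy rather than qtorch), the sketch below shows the core numerical hazard such simulations expose: a gradient-sized update that is smaller than the float16 spacing around a weight is silently rounded away, while the same update survives in full precision.

```python
import numpy as np

def to_half(x):
    """Simulate half-precision storage by a round trip through float16."""
    return np.float16(x).astype(np.float64)

# float16 spacing near 1.0 is 2**-10 ~= 9.8e-4, so a 1e-4 update
# to a weight near 1.0 is below the representable step size.
w = 1.0
g = 1e-4

w_full = w + g                               # full precision: update applied
w_half = to_half(to_half(w) + to_half(g))    # half precision: update lost

print(w_full)  # 1.0001
print(w_half)  # 1.0 -- the small update rounds away entirely
```

This kind of lost-update behavior is why naive half-precision training can stall, and why the paper benchmarks dedicated remedies (such as its hAdam optimizer variant) rather than simply casting everything to float16.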