Learning Routines for Effective Off-Policy Reinforcement Learning
Authors: Edoardo Cetin, Oya Celiktutan
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that utilizing our proposed routine framework improves the performance of two different off-policy reinforcement learning algorithms tested on the environments from the DeepMind Control Suite (Tassa et al., 2018). Moreover, using our framework, agents need to reason only after experiencing the outcome of each routine rather than each action. Therefore, they are able to query their policy far less frequently by learning to perform longer routines from states that do not require a fine level of control (see the acting-loop sketch after the table). Practically, this enables computationally efficient deployment, faster data collection, and easier real-time inference (Dulac-Arnold et al., 2019). |
| Researcher Affiliation | Academia | Centre for Robotics Research, Department of Engineering, King's College London. Correspondence to: Edoardo Cetin <edoardo.cetin@kcl.ac.uk>. |
| Pseudocode | Yes | We provide pseudocode in Section A of the Appendix. |
| Open Source Code | Yes | For access to our open-source implementations, please visit sites.google.com/view/routines-rl/. |
| Open Datasets | Yes | In this section, we provide an evaluation of the proposed routine framework utilizing the DeepMind Control Suite (Tassa et al., 2018); see the environment-loading sketch after the table. |
| Dataset Splits | No | The paper describes training and evaluation epochs, but does not explicitly mention a 'validation set' or specific numerical splits (e.g., percentages or sample counts) for training, validation, and testing. |
| Hardware Specification | No | The paper discusses computational efficiency but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like TD3 and SAC, and the Deep Mind Control Suite, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For these experiments, we fix the maximum routine length to L = 4. We provide all other hyper-parameters used by our algorithms in Section C of the Appendix. |
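
To make the routine mechanism quoted above concrete, here is a minimal sketch of the acting loop it implies: the agent queries its policy once per routine and then executes the whole returned action sequence before reasoning again. The `routine_policy` stand-in, the gym-style `env.step` interface, and the action dimensionality are illustrative assumptions, not the paper's implementation; only the maximum routine length L = 4 comes from the stated experiment setup.

```python
import numpy as np

L_MAX = 4  # maximum routine length, as fixed in the paper's experiments


def routine_policy(state):
    """Hypothetical stand-in for the learned routine policy: returns a
    sequence of 1..L_MAX low-level actions to execute open-loop."""
    length = np.random.randint(1, L_MAX + 1)
    return [np.random.uniform(-1.0, 1.0, size=6) for _ in range(length)]


def run_episode(env, max_steps=1000):
    """Execute routines until termination, counting how rarely the
    policy is queried relative to the number of environment steps."""
    state = env.reset()
    policy_queries, env_steps, done = 0, 0, False
    while not done and env_steps < max_steps:
        routine = routine_policy(state)  # one query covers several steps
        policy_queries += 1
        for action in routine:
            state, reward, done, info = env.step(action)
            env_steps += 1
            if done or env_steps >= max_steps:
                break
    return policy_queries, env_steps
```

Because each query can cover up to L_MAX environment steps, the number of policy evaluations can drop by up to a factor of four relative to per-step control, which is the source of the deployment and inference savings the quoted response mentions.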
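For reference, the DeepMind Control Suite used for evaluation is openly available via the `dm_control` package. Below is a minimal sketch of loading and stepping one of its tasks; the `cheetah`/`run` choice is an example rather than necessarily one of the paper's evaluated tasks, and the uniform random actions are a placeholder for a trained agent.

```python
import numpy as np
from dm_control import suite

# Load a benchmark task; the domain/task names here are illustrative.
env = suite.load(domain_name="cheetah", task_name="run")
action_spec = env.action_spec()

time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    # Uniform random actions stand in for a trained (routine) policy.
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward or 0.0  # reward is None on reset
print(f"episode return: {episode_return:.2f}")
```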