Learning Routines for Effective Off-Policy Reinforcement Learning

Authors: Edoardo Cetin, Oya Celiktutan

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that utilizing our proposed routine framework improves the performance of two different off-policy reinforcement learning algorithms tested on environments from the DeepMind Control Suite (Tassa et al., 2018). Moreover, using our framework, agents need to reason only after experiencing the outcome of each routine rather than after each action. They can therefore query their policy far less frequently by learning to perform longer routines from states that do not require a fine level of control. Practically, this allows for computationally efficient deployment, faster data collection, and easier real-time inference (Dulac-Arnold et al., 2019). (A minimal sketch of this routine-querying loop follows the table.)
Researcher Affiliation | Academia | Centre for Robotics Research, Department of Engineering, King's College London. Correspondence to: Edoardo Cetin <edoardo.cetin@kcl.ac.uk>.
Pseudocode | Yes | We provide pseudocode in Section A of the Appendix.
Open Source Code | Yes | For access to our open-source implementations, please visit sites.google.com/view/routines-rl/.
Open Datasets | Yes | In this section, we provide an evaluation of the proposed routine framework utilizing the DeepMind Control Suite (Tassa et al., 2018).
Dataset Splits | No | The paper describes training and evaluation epochs, but does not explicitly mention a validation set or specific numerical splits (e.g., percentages or sample counts) for training, validation, and testing.
Hardware Specification | No | The paper discusses computational efficiency but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms such as TD3 and SAC and the DeepMind Control Suite, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For these experiments, we fix the maximum routine length to L = 4. We provide all other hyper-parameters used by our algorithms in Section C of the Appendix. (A hedged example of stepping a DeepMind Control Suite task with this maximum routine length follows the table.)
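
The Research Type row above describes agents that query their policy once per routine rather than once per environment step. The following is a minimal sketch of that querying pattern, assuming a gym-style environment and a hypothetical routine_policy function that returns up to L_MAX actions; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

L_MAX = 4          # maximum routine length used in the paper's experiments
ACTION_DIM = 6     # assumed action dimensionality, task dependent

def routine_policy(state):
    """Hypothetical stand-in for a learned routine policy.

    Returns a routine: a sequence of 1 to L_MAX primitive actions. In the
    paper the routine and its length are produced by the learned policy;
    here both are random, purely for illustration.
    """
    length = np.random.randint(1, L_MAX + 1)
    return [np.random.uniform(-1.0, 1.0, size=ACTION_DIM) for _ in range(length)]

def collect_episode(env, max_steps=1000):
    """Roll out one episode of a gym-style env, querying the policy once per
    routine instead of once per environment step."""
    state = env.reset()
    policy_queries, env_steps, episode_return = 0, 0, 0.0
    while env_steps < max_steps:
        routine = routine_policy(state)
        policy_queries += 1
        for action in routine:              # execute the whole routine open-loop
            state, reward, done, _ = env.step(action)
            episode_return += reward
            env_steps += 1
            if done or env_steps >= max_steps:
                return episode_return, policy_queries, env_steps
    return episode_return, policy_queries, env_steps
```

Because each query can yield up to four actions, the policy is consulted at most once per routine, which is where the deployment and data-collection savings mentioned in the quote come from.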
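
The Open Datasets and Experiment Setup rows refer to evaluation on the DeepMind Control Suite. Below is a minimal sketch of loading one of its tasks with the dm_control package and stepping it; the suite.load call and the TimeStep interface are the library's real API, but the choice of task and the random actions standing in for the learned policy are illustrative assumptions, not the paper's setup.

```python
from dm_control import suite
import numpy as np

# Load one of the benchmark tasks (the task choice here is illustrative).
env = suite.load(domain_name="cheetah", task_name="run")
spec = env.action_spec()

time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    # Random actions stand in for the primitive actions a routine policy would produce.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0

print(f"Episode return: {total_reward:.1f}")
```

In practice, the random action here would be replaced by the primitive actions of a routine produced by the policy sketched above, executed one after another before the policy is queried again.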