Learning Routines for Effective Off-Policy Reinforcement Learning

Authors: Edoardo Cetin, Oya Celiktutan

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that utilizing our proposed routine framework improves the performance of two different off-policy reinforcement learning algorithms tested on environments from the DeepMind Control Suite (Tassa et al., 2018). Moreover, using our framework, agents need to reason only after experiencing the outcome of each routine rather than after each action. They can therefore query their policy far less frequently by learning to perform longer routines from states that do not require a fine level of control. Practically, this allows for computationally efficient deployment, faster data collection, and easier real-time inference (Dulac-Arnold et al., 2019). (A minimal sketch of this routine-querying loop follows the table.)
Researcher Affiliation | Academia | Centre for Robotics Research, Department of Engineering, King's College London. Correspondence to: Edoardo Cetin <edoardo.cetin@kcl.ac.uk>.
Pseudocode | Yes | We provide pseudocode in Section A of the Appendix.
Open Source Code | Yes | For access to our open-source implementations, please visit sites.google.com/view/routines-rl/.
Open Datasets | Yes | In this section, we provide an evaluation of the proposed routine framework utilizing the DeepMind Control Suite (Tassa et al., 2018).
Dataset Splits | No | The paper describes training and evaluation epochs, but does not explicitly mention a validation set or specific numerical splits (e.g., percentages or sample counts) for training, validation, and testing.
Hardware Specification | No | The paper discusses computational efficiency but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms such as TD3 and SAC and the DeepMind Control Suite, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For these experiments, we fix the maximum routine length to L = 4. We provide all other hyper-parameters used by our algorithms in Section C of the Appendix. (A hedged example of stepping a DeepMind Control Suite task with this maximum routine length follows the table.)
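
The Research Type row above describes agents that query their policy once per routine rather than once per environment step. The following is a minimal sketch of that querying pattern, assuming a gym-style environment and a hypothetical routine_policy function that returns up to L_MAX actions; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

L_MAX = 4          # maximum routine length used in the paper's experiments
ACTION_DIM = 6     # assumed action dimensionality, task dependent

def routine_policy(state):
    """Hypothetical stand-in for a learned routine policy.

    Returns a routine: a sequence of 1 to L_MAX primitive actions. In the
    paper the routine and its length are produced by the learned policy;
    here both are random, purely for illustration.
    """
    length = np.random.randint(1, L_MAX + 1)
    return [np.random.uniform(-1.0, 1.0, size=ACTION_DIM) for _ in range(length)]

def collect_episode(env, max_steps=1000):
    """Roll out one episode of a gym-style env, querying the policy once per
    routine instead of once per environment step."""
    state = env.reset()
    policy_queries, env_steps, episode_return = 0, 0, 0.0
    while env_steps < max_steps:
        routine = routine_policy(state)
        policy_queries += 1
        for action in routine:              # execute the whole routine open-loop
            state, reward, done, _ = env.step(action)
            episode_return += reward
            env_steps += 1
            if done or env_steps >= max_steps:
                return episode_return, policy_queries, env_steps
    return episode_return, policy_queries, env_steps
```

Because each query can yield up to four actions, the policy is consulted at most once per routine, which is where the deployment and data-collection savings mentioned in the quote come from.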
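
The Open Datasets and Experiment Setup rows refer to evaluation on the DeepMind Control Suite. Below is a minimal sketch of loading one of its tasks with the dm_control package and stepping it; the suite.load call and the TimeStep interface are the library's real API, but the choice of task and the random actions standing in for the learned policy are illustrative assumptions, not the paper's setup.

```python
from dm_control import suite
import numpy as np

# Load one of the benchmark tasks (the task choice here is illustrative).
env = suite.load(domain_name="cheetah", task_name="run")
spec = env.action_spec()

time_step = env.reset()
total_reward = 0.0
while not time_step.last():
    # Random actions stand in for the primitive actions a routine policy would produce.
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0

print(f"Episode return: {total_reward:.1f}")
```

In practice, the random action here would be replaced by the primitive actions of a routine produced by the policy sketched above, executed one after another before the policy is queried again.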