TempoRL: Learning When to Act
Authors: André Biedenkapp, Raghu Rajan, Frank Hutter, Marius Lindauer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated TempoRL with tabular as well as deep Q-functions. We first give results for the tabular case. All code, the appendix and experiment data including trained policies are available at github.com/automl/TempoRL. |
| Researcher Affiliation | Collaboration | André Biedenkapp¹, Raghu Rajan¹, Frank Hutter¹ ², Marius Lindauer³; ¹Department of Computer Science, University of Freiburg, Germany; ²BCAI, Renningen, Germany; ³Information Processing Institute (tnt), Leibniz University Hannover, Germany. |
| Pseudocode | Yes | For pseudo-code and more details we refer to Appendix B. |
| Open Source Code | Yes | The appendix, code and experiment results are available at github.com/automl/TempoRL. |
| Open Datasets | Yes | Setup: We chose to first evaluate on OpenAI Gym's (Brockman et al., 2016) Pendulum-v0 as it is an adversarial setting... We used the MountainCar-v0 and LunarLander-v2 environments... We trained all agents on the games BeamRider, Freeway, MsPacman, Pong and Qbert. (Environment instantiation is sketched after this table.) |
| Dataset Splits | No | The paper describes evaluating agents at regular intervals during training but does not specify explicit training/validation/test dataset splits with percentages or counts, as it operates on reinforcement learning environments rather than static datasets. |
| Hardware Specification | Yes | For details on the used hardware see Appendix C. |
| Software Dependencies | Yes | implemented with PyTorch (Paszke et al., 2019) in version 1.4.0. |
| Experiment Setup | Yes | We trained all agents for a total of 10^6 training steps using a constant ε-greedy exploration schedule with ε set to 0.1. We evaluated all agents every 200 training steps. We used Adam with a learning rate of 10^-3 and default parameters as given in PyTorch v1.4.0. All agents used a replay buffer of size 10^6 and a discount factor γ of 0.99. (A configuration sketch follows this table.) |
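
To make the "Open Datasets" row concrete, the snippet below instantiates the environments named in the paper with OpenAI Gym. This is a minimal sketch, not the authors' code: the gym 0.x registry IDs, the `NoFrameskip-v4` variants for the Atari games, and the dependency notes are assumptions about the standard Gym distribution rather than details quoted from the paper.

```python
import gym  # OpenAI Gym, 0.x API (Brockman et al., 2016)

# Classic-control / Box2D environments named in the paper.
# LunarLander-v2 additionally needs the `gym[box2d]` extra (assumption).
classic_envs = ["Pendulum-v0", "MountainCar-v0", "LunarLander-v2"]

# Atari games named in the paper; the NoFrameskip-v4 IDs are the usual
# ALE registry names and require `gym[atari]` plus ROMs (assumption).
atari_envs = [
    "BeamRiderNoFrameskip-v4",
    "FreewayNoFrameskip-v4",
    "MsPacmanNoFrameskip-v4",
    "PongNoFrameskip-v4",
    "QbertNoFrameskip-v4",
]

for env_id in classic_envs:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```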
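
The "Experiment Setup" row pins down the main training hyperparameters. The sketch below collects them in PyTorch as a rough reconstruction of a DQN-style configuration; the network architecture, the `QNetwork` and `epsilon_greedy` helpers, and the plain `deque` replay buffer are illustrative assumptions, not code from the TempoRL repository.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Values quoted in the "Experiment Setup" row; everything else below
# (network size, helper functions) is an illustrative assumption.
LEARNING_RATE = 1e-3          # Adam, otherwise PyTorch v1.4.0 defaults
REPLAY_BUFFER_SIZE = 10**6
GAMMA = 0.99                  # discount factor
EPSILON = 0.1                 # constant epsilon-greedy schedule
TOTAL_TRAINING_STEPS = 10**6
EVAL_INTERVAL = 200           # evaluate every 200 training steps


class QNetwork(nn.Module):
    """Small fully connected Q-network (architecture is an assumption)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def epsilon_greedy(q_net, state, n_actions, epsilon=EPSILON):
    """Random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax().item())


# Example shapes roughly matching LunarLander-v2 (assumption).
q_net = QNetwork(obs_dim=8, n_actions=4)
optimizer = torch.optim.Adam(q_net.parameters(), lr=LEARNING_RATE)
replay_buffer = deque(maxlen=REPLAY_BUFFER_SIZE)  # simple FIFO buffer
```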