TempoRL: Learning When to Act

Authors: André Biedenkapp, Raghu Rajan, Frank Hutter, Marius Lindauer

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated TEMPORL with tabular as well as deep Q-functions. We first give results for the tabular case. All code, the appendix and experiment data including trained policies are available at github.com/automl/TempoRL.
Researcher Affiliation | Collaboration | André Biedenkapp (1), Raghu Rajan (1), Frank Hutter (1, 2), Marius Lindauer (3); (1) Department of Computer Science, University of Freiburg, Germany; (2) BCAI, Renningen, Germany; (3) Information Processing Institute (tnt), Leibniz University Hannover, Germany.
Pseudocode | Yes | For pseudo-code and more details we refer to Appendix B.
Open Source Code | Yes | The appendix, code and experiment results are available at github.com/automl/TempoRL.
Open Datasets | Yes | Setup: We chose to first evaluate on OpenAI gym's (Brockman et al., 2016) Pendulum-v0 as it is an adversarial setting... We used the MountainCar-v0 and LunarLander-v2 environments... We trained all agents on the games BEAMRIDER, FREEWAY, MSPACMAN, PONG and QBERT.
Dataset Splits | No | The paper describes evaluating agents at regular intervals during training but does not specify explicit training/validation/test splits with percentages or counts, as the experiments are run in reinforcement learning environments rather than on static datasets.
Hardware Specification | Yes | For details on the used hardware see Appendix C.
Software Dependencies | Yes | Implemented with PyTorch (Paszke et al., 2019) in version 1.4.0.
Experiment Setup | Yes | We trained all agents for a total of 10^6 training steps using a constant ε-greedy exploration schedule with ε set to 0.1. We evaluated all agents every 200 training steps. We used the Adam optimizer with a learning rate of 10^-3 and default parameters as given in PyTorch v1.4.0. All agents used a replay buffer of size 10^6 and a discount factor γ of 0.99.
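
The environments quoted under Open Datasets are standard OpenAI Gym environments and can be instantiated directly. The sketch below assumes a Gym version contemporary with PyTorch 1.4.0 (the classic pre-0.26 reset/step API) and guesses the NoFrameskip-v4 Atari IDs, since the excerpt lists only the game names; it is an illustration, not code from the TempoRL repository.

```python
# Sketch of instantiating the environments named under "Open Datasets".
# Assumes classic OpenAI Gym (pre-0.26 API). LunarLander-v2 additionally
# needs the Box2D extra, and the Atari games need the Atari ROM extras.
import gym

classic_control = [gym.make(env_id) for env_id in
                   ("Pendulum-v0", "MountainCar-v0", "LunarLander-v2")]

# Atari games from the excerpt: BEAMRIDER, FREEWAY, MSPACMAN, PONG, QBERT.
# The exact env-ID variants are an assumption; only the game names are given.
atari = [gym.make(env_id) for env_id in
         ("BeamRiderNoFrameskip-v4", "FreewayNoFrameskip-v4",
          "MsPacmanNoFrameskip-v4", "PongNoFrameskip-v4",
          "QbertNoFrameskip-v4")]

env = classic_control[1]          # MountainCar-v0
obs = env.reset()                 # pre-0.26 Gym: reset() returns only the observation
obs, reward, done, info = env.step(env.action_space.sample())
```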
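The hyperparameters quoted under Experiment Setup translate directly into PyTorch 1.4-era code. The following is a minimal sketch of how those reported values would be wired up; the Q-network architecture and the TempoRL training loop itself are not described in this excerpt, so `QNetwork` and `select_action` below are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch of the settings reported under "Experiment Setup".
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

# Values quoted from the paper excerpt above.
TOTAL_TRAINING_STEPS = 10**6   # total training steps
EPSILON = 0.1                  # constant epsilon-greedy exploration
EVAL_EVERY = 200               # evaluation interval (training steps)
LEARNING_RATE = 1e-3           # Adam learning rate, PyTorch defaults otherwise
REPLAY_BUFFER_SIZE = 10**6     # replay buffer capacity
GAMMA = 0.99                   # discount factor

class QNetwork(nn.Module):
    """Hypothetical Q-network; the actual architecture is in the paper/appendix."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

q_net = QNetwork(obs_dim=8, n_actions=4)   # e.g. LunarLander-v2 shapes
optimizer = optim.Adam(q_net.parameters(), lr=LEARNING_RATE)
replay_buffer = deque(maxlen=REPLAY_BUFFER_SIZE)

def select_action(state: torch.Tensor, n_actions: int) -> int:
    """Constant epsilon-greedy action selection with epsilon = 0.1."""
    if random.random() < EPSILON:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```

The discount factor `GAMMA` and the evaluation interval `EVAL_EVERY` would be used inside the target computation and the outer training loop, respectively, which are omitted here because the excerpt does not describe them.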