TempoRL: Learning When to Act
Authors: André Biedenkapp, Raghu Rajan, Frank Hutter, Marius Lindauer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated TempoRL with tabular as well as deep Q-functions. We first give results for the tabular case. All code, the appendix and experiment data including trained policies are available at github.com/automl/TempoRL. |
| Researcher Affiliation | Collaboration | André Biedenkapp¹, Raghu Rajan¹, Frank Hutter¹ ², Marius Lindauer³; ¹Department of Computer Science, University of Freiburg, Germany; ²BCAI, Renningen, Germany; ³Information Processing Institute (tnt), Leibniz University Hannover, Germany. |
| Pseudocode | Yes | For pseudo-code and more details we refer to Appendix B. |
| Open Source Code | Yes | The appendix, code and experiment results are available at github.com/automl/TempoRL. |
| Open Datasets | Yes | Setup: We chose to first evaluate on OpenAI Gym's (Brockman et al., 2016) Pendulum-v0 as it is an adversarial setting... We used the MountainCar-v0 and LunarLander-v2 environments... We trained all agents on the games BeamRider, Freeway, MsPacman, Pong and Qbert. (Environment instantiation is sketched after this table.) |
| Dataset Splits | No | The paper describes evaluating agents at regular intervals during training but does not specify explicit training/validation/test dataset splits with percentages or counts, as it operates on reinforcement learning environments rather than static datasets. |
| Hardware Specification | Yes | For details on the used hardware see Appendix C. |
| Software Dependencies | Yes | implemented with PyTorch (Paszke et al., 2019) in version 1.4.0. |
| Experiment Setup | Yes | We trained all agents for a total of 10^6 training steps using a constant ε-greedy exploration schedule with ε set to 0.1. We evaluated all agents every 200 training steps. We used Adam with a learning rate of 10^-3 and default parameters as given in PyTorch v1.4.0. All agents used a replay buffer of size 10^6 and a discount factor γ of 0.99. (A configuration sketch follows this table.) |
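
To make the "Open Datasets" row concrete, the snippet below instantiates the environments named in the paper with OpenAI Gym. This is a minimal sketch, not the authors' code: the gym 0.x registry IDs, the `NoFrameskip-v4` variants for the Atari games, and the dependency notes are assumptions about the standard Gym distribution rather than details quoted from the paper.

```python
import gym  # OpenAI Gym, 0.x API (Brockman et al., 2016)

# Classic-control / Box2D environments named in the paper.
# LunarLander-v2 additionally needs the `gym[box2d]` extra (assumption).
classic_envs = ["Pendulum-v0", "MountainCar-v0", "LunarLander-v2"]

# Atari games named in the paper; the NoFrameskip-v4 IDs are the usual
# ALE registry names and require `gym[atari]` plus ROMs (assumption).
atari_envs = [
    "BeamRiderNoFrameskip-v4",
    "FreewayNoFrameskip-v4",
    "MsPacmanNoFrameskip-v4",
    "PongNoFrameskip-v4",
    "QbertNoFrameskip-v4",
]

for env_id in classic_envs:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
    env.close()
```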
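
The "Experiment Setup" row pins down the main training hyperparameters. The sketch below collects them in PyTorch as a rough reconstruction of a DQN-style configuration; the network architecture, the `QNetwork` and `epsilon_greedy` helpers, and the plain `deque` replay buffer are illustrative assumptions, not code from the TempoRL repository.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Values quoted in the "Experiment Setup" row; everything else below
# (network size, helper functions) is an illustrative assumption.
LEARNING_RATE = 1e-3          # Adam, otherwise PyTorch v1.4.0 defaults
REPLAY_BUFFER_SIZE = 10**6
GAMMA = 0.99                  # discount factor
EPSILON = 0.1                 # constant epsilon-greedy schedule
TOTAL_TRAINING_STEPS = 10**6
EVAL_INTERVAL = 200           # evaluate every 200 training steps


class QNetwork(nn.Module):
    """Small fully connected Q-network (architecture is an assumption)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def epsilon_greedy(q_net, state, n_actions, epsilon=EPSILON):
    """Random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax().item())


# Example shapes roughly matching LunarLander-v2 (assumption).
q_net = QNetwork(obs_dim=8, n_actions=4)
optimizer = torch.optim.Adam(q_net.parameters(), lr=LEARNING_RATE)
replay_buffer = deque(maxlen=REPLAY_BUFFER_SIZE)  # simple FIFO buffer
```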