Self-Paced Context Evaluation for Contextual Reinforcement Learning
Authors: Theresa Eimer, André Biedenkapp, Frank Hutter, Marius Lindauer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we empirically evaluate SPACE on two different environments. The code for all experiments is available at https://github.com/automl/SPaCE. We first describe the experimental setup before comparing SPACE against a round robin (RR) training scheme and SPDRL (Klink et al., 2020) as a state-of-the-art self-paced RL baseline. Finally we evaluate the influence of SPACE's own hyperparameters and limitations. |
| Researcher Affiliation | Collaboration | ¹Information Processing Institute (tnt), Leibniz University Hannover, Germany; ²Department of Computer Science, University of Freiburg, Germany; ³Bosch Center for Artificial Intelligence, Renningen, Germany. |
| Pseudocode | Yes | Algorithm 1 summarizes the idea of SPACE. |
| Open Source Code | Yes | The code for all experiments is available at https://github.com/automl/SPaCE. |
| Open Datasets | Yes | We evaluated SPACE in settings that readily allow for context information to encode different instances, namely the Ant locomotion environment (Coumans & Bai, 2020), the gym-maze environment (Chan, 2019) and the Ball Catching and contextual Point Mass environments as used by Klink et al. (2020). |
| Dataset Splits | No | The paper mentions splitting data into "training and test sets" but does not explicitly state a separate "validation" set or specific percentages for such a split. |
| Hardware Specification | No | For hardware specifications and hyperparameters, please see Appendix B. (Appendix B is not provided in the given text.) |
| Software Dependencies | No | The paper mentions environments like "gym-maze" and "Pybullet" and agents like "PPO" and "TRPO" but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | SPACE comes with two hyperparameters, the performance threshold for curriculum interactions η and the instance increment κ. These hyperparameters interact with each other to make SPACE comparatively stable across different hyperparameter values (as seen in Figure 1). ... Our study shows very little performance differences for different values of κ and η. ... Table 1: Mean reward ± standard deviation for different hyperparameter values on Point Mass after 10^6 steps. |
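
The Experiment Setup row names two hyperparameters, a threshold η and an instance increment κ, without showing how they interact. The sketch below is one plausible reading and not the authors' Algorithm 1: it assumes the training set grows by κ instances whenever the mean value estimate changes by at most a relative fraction η, and that instances are ranked by how much their value estimate changed. The function names `update_curriculum_size` and `select_instances`, the relative-change rule, and the ranking criterion are all illustrative assumptions.

```python
from typing import List, Sequence


def update_curriculum_size(
    current_size: int,
    prev_values: Sequence[float],
    curr_values: Sequence[float],
    eta: float,
    kappa: int,
    num_instances: int,
) -> int:
    """Return the new number of training instances.

    Assumed rule (hypothetical): if the mean value estimate changed by at
    most a relative fraction `eta` since the last check, add `kappa`
    instances, capped at the total number of instances available.
    """
    prev_mean = sum(prev_values) / len(prev_values)
    curr_mean = sum(curr_values) / len(curr_values)
    # Guard against division by zero when the previous mean is (near) zero.
    denom = max(abs(prev_mean), 1e-8)
    relative_change = abs(curr_mean - prev_mean) / denom
    if relative_change <= eta:
        return min(current_size + kappa, num_instances)
    return current_size


def select_instances(value_deltas: Sequence[float], size: int) -> List[int]:
    """Return indices of the `size` instances with the largest value change.

    Assumed criterion (hypothetical): a larger change in the value estimate
    marks an instance as more useful to train on next.
    """
    ranked = sorted(
        range(len(value_deltas)),
        key=lambda i: value_deltas[i],
        reverse=True,
    )
    return ranked[:size]
```

Under these assumptions, a small η delays growth until learning on the current instances has nearly plateaued, while a large κ adds many instances at once. That is one way the two parameters could trade off against each other, in line with the reported stability across values.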