Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Self-Paced Context Evaluation for Contextual Reinforcement Learning

Authors: Theresa Eimer, André Biedenkapp, Frank Hutter, Marius Lindauer

ICML 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section we empirically evaluate SPACE on two different environments. The code for all experiments is available at https://github.com/automl/SPaCE. We first describe the experimental setup before comparing SPACE against a round robin (RR) training scheme and SPDRL (Klink et al., 2020) as a state-of-the-art self-paced RL baseline. Finally we evaluate the influence of SPACE's own hyperparameters and limitations.
Researcher Affiliation	Collaboration	1Information Processing Institute (tnt), Leibniz University Hannover, Germany 2Department of Computer Science, University of Freiburg, Germany 3Bosch Center for Artificial Intelligence, Renningen, Germany.
Pseudocode	Yes	Algorithm 1 summarizes the idea of SPACE.
Open Source Code	Yes	The code for all experiments is available at https://github.com/automl/SPaCE.
Open Datasets	Yes	We evaluated SPACE in settings that readily allow for context information to encode different instances, namely the Ant locomotion environment (Coumans & Bai, 2020), the gym-maze environment (Chan, 2019) and the Ball Catching and contextual Point Mass environments as used by Klink et al. (2020).
Dataset Splits	No	The paper mentions splitting data into "training and test sets" but does not explicitly state a separate "validation" set or specific percentages for such a split.
Hardware Specification	No	For hardware specifications and hyperparameters, please see Appendix B. (Appendix B is not provided in the given text.)
Software Dependencies	No	The paper mentions environments like "gym-maze" and "Pybullet" and agents like "PPO" and "TRPO" but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	SPACE comes with two hyperparameters, the performance threshold for curriculum interactions η and the instance increment κ. These hyperparameters interact with each other to make SPACE comparatively stable across different hyperparameter values (as seen in Figure 1). ... Our study shows very little performance differences for different values of κ and η. ... Table 1: Mean reward ± standard deviation for different hyperparameter values on Point Mass after 10^6 steps.