Boosted Curriculum Reinforcement Learning

Authors: Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide detailed empirical evidence of the benefits of BCRL in problems requiring curricula for accurate action-value estimation and targeted exploration.
Researcher Affiliation | Academia | Pascal Klink, Carlo D'Eramo, Jan Peters (Department of Computer Science, TU Darmstadt, Germany); Joni Pajarinen (Department of Electrical Engineering and Automation, Aalto University, Finland)
Pseudocode | Yes | Algorithm 1: Boosted Curriculum Reinforcement Learning (BCRL); a hedged sketch of the boosting scheme appears after the table.
Open Source Code | Yes | Code for reproducing results available under: https://github.com/psclklnk/boosted_crl.
Open Datasets | No | The paper uses custom-generated datasets for its experiments (e.g., "The dataset for learning the Q-function in each task is generated by executing 1000 trajectories with a random policy." and "we collect a dataset for each task running 500 episodes"), but does not provide specific access information (link, DOI, formal citation) to make these generated datasets publicly available or open.
Dataset Splits | No | The paper describes data collection for learning Q-functions (e.g., "1000 trajectories with a random policy" or "500 episodes using ε-greedy exploration") but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) of these datasets. A sketch of this data-collection protocol is given after the table.
Hardware Specification | Yes | The computations were executed on a desktop computer with a Geforce RTX 2080, 64 GB memory and an AMD Ryzen 9 3900X processor.
Software Dependencies | No | The paper mentions using the MushroomRL library (D'Eramo et al., 2021) and extra-trees, but it does not specify exact version numbers for these software components, which is required for reproducibility.
Experiment Setup | Yes | The number of trees in the extra-trees implementation was set to 50, the minimum number of samples required to split an internal node to 5, and the minimum number of samples required to be a leaf node to 2. The Q-networks employed in DQN consist of two hidden layers of 128 neurons and ReLU activations. ... we conducted a grid search testing values in [25, 50, 100, 200, 500, 1000, 2000, 4000]. ... We used Adam for optimization of the network with a learning rate of 10^-4 and (β1, β2) = (0.9, 0.999). ... we used a fixed value of ε = 0.1 in the target task. For default DQN, we annealed ε from 1 to 0.1 over the first 1000 environment steps. (These settings are collected into a configuration sketch after the table.)
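
For readers reproducing the method without access to the paper's Algorithm 1, the following is a minimal sketch of the boosting idea behind BCRL: the action-value function of each curriculum task is represented as the sum of the approximators learned on all preceding tasks plus a new residual approximator fitted on the current task. The names `bcrl` and `fit_residual` and the task-list interface are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch of the boosting structure of BCRL (Algorithm 1).
# All names (curriculum_tasks, fit_residual) are illustrative placeholders,
# not the authors' code. Key idea: the Q-function of task i is the sum of
# the residual approximators learned on tasks 1..i of the curriculum.

def bcrl(curriculum_tasks, fit_residual):
    """Return a boosted Q-function built task-by-task along the curriculum."""
    residuals = []  # one weak learner per curriculum task

    def q(state, action):
        # Boosted estimate: sum of all residual approximators learned so far.
        return sum(r(state, action) for r in residuals)

    for task in curriculum_tasks:
        # Fit a new residual so that q + residual approximates the
        # action-value function of the current curriculum task.
        residuals.append(fit_residual(task, current_q=q))

    return q
```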
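As a concrete illustration of the data-collection protocol quoted in the "Dataset Splits" row (e.g., 1000 trajectories with a random policy per task), the sketch below assumes a Gym-style environment with the classic 4-tuple `step` interface; the environment and transition-tuple layout are assumptions, since the paper's environments are custom.

```python
def collect_random_dataset(env, n_trajectories=1000):
    """Collect (s, a, r, s', done) transitions with a uniform random policy.

    Assumes a Gym-style environment: env.reset() -> state,
    env.step(action) -> (next_state, reward, done, info).
    """
    dataset = []
    for _ in range(n_trajectories):
        state, done = env.reset(), False
        while not done:
            action = env.action_space.sample()  # random policy
            next_state, reward, done, _ = env.step(action)
            dataset.append((state, action, reward, next_state, done))
            state = next_state
    return dataset
```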
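The hyperparameters quoted in the "Experiment Setup" row can be collected into a runnable configuration sketch using scikit-learn and PyTorch, shown below. The state dimension and number of actions are placeholders, and any argument not quoted in the paper is left at its library default; this is not the authors' configuration file.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import ExtraTreesRegressor

# Extra-trees settings quoted above; all other arguments are library defaults.
extra_trees = ExtraTreesRegressor(
    n_estimators=50,       # number of trees
    min_samples_split=5,   # minimum samples to split an internal node
    min_samples_leaf=2,    # minimum samples required at a leaf node
)

# DQN Q-network: two hidden layers of 128 neurons with ReLU activations.
# state_dim and n_actions are placeholders that depend on the task.
state_dim, n_actions = 4, 2
q_network = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_actions),
)

# Adam optimizer with the quoted learning rate and betas.
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-4, betas=(0.9, 0.999))
```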