Boosted Curriculum Reinforcement Learning

Authors: Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide detailed empirical evidence of the benefits of BCRL in problems requiring curricula for accurate action-value estimation and targeted exploration.
Researcher Affiliation | Academia | Pascal Klink, Carlo D'Eramo, Jan Peters (Department of Computer Science, TU Darmstadt, Germany); Joni Pajarinen (Department of Electrical Engineering and Automation, Aalto University, Finland)
Pseudocode | Yes | Algorithm 1: Boosted Curriculum Reinforcement Learning (BCRL); a hedged sketch of the boosting scheme appears after the table.
Open Source Code | Yes | Code for reproducing results available under: https://github.com/psclklnk/boosted_crl.
Open Datasets | No | The paper uses custom-generated datasets for its experiments (e.g., "The dataset for learning the Q-function in each task is generated by executing 1000 trajectories with a random policy." and "we collect a dataset for each task running 500 episodes"), but does not provide specific access information (link, DOI, formal citation) to make these generated datasets publicly available or open.
Dataset Splits | No | The paper describes data collection for learning Q-functions (e.g., "1000 trajectories with a random policy" or "500 episodes using ε-greedy exploration") but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) of these datasets. A sketch of this data-collection protocol is given after the table.
Hardware Specification | Yes | The computations were executed on a desktop computer with a Geforce RTX 2080, 64 GB memory and an AMD Ryzen 9 3900X processor.
Software Dependencies | No | The paper mentions using the MushroomRL library (D'Eramo et al., 2021) and extra-trees, but it does not specify exact version numbers for these software components, which is required for reproducibility.
Experiment Setup | Yes | The number of trees in the extra-trees implementation was set to 50, the minimum number of samples required to split an internal node to 5, and the minimum number of samples required to be a leaf node to 2. The Q-networks employed in DQN consist of two hidden layers of 128 neurons and ReLU activations. ... we conducted a grid search testing values in [25, 50, 100, 200, 500, 1000, 2000, 4000]. ... We used Adam for optimization of the network with a learning rate of 10^-4 and (β1, β2) = (0.9, 0.999). ... we used a fixed value of ε = 0.1 in the target task. For default DQN, we annealed ε from 1 to 0.1 over the first 1000 environment steps. (These settings are collected into a configuration sketch after the table.)
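
For readers reproducing the method without access to the paper's Algorithm 1, the following is a minimal sketch of the boosting idea behind BCRL: the action-value function of each curriculum task is represented as the sum of the approximators learned on all preceding tasks plus a new residual approximator fitted on the current task. The names `bcrl` and `fit_residual` and the task-list interface are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch of the boosting structure of BCRL (Algorithm 1).
# All names (curriculum_tasks, fit_residual) are illustrative placeholders,
# not the authors' code. Key idea: the Q-function of task i is the sum of
# the residual approximators learned on tasks 1..i of the curriculum.

def bcrl(curriculum_tasks, fit_residual):
    """Return a boosted Q-function built task-by-task along the curriculum."""
    residuals = []  # one weak learner per curriculum task

    def q(state, action):
        # Boosted estimate: sum of all residual approximators learned so far.
        return sum(r(state, action) for r in residuals)

    for task in curriculum_tasks:
        # Fit a new residual so that q + residual approximates the
        # action-value function of the current curriculum task.
        residuals.append(fit_residual(task, current_q=q))

    return q
```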
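As a concrete illustration of the data-collection protocol quoted in the "Dataset Splits" row (e.g., 1000 trajectories with a random policy per task), the sketch below assumes a Gym-style environment with the classic 4-tuple `step` interface; the environment and transition-tuple layout are assumptions, since the paper's environments are custom.

```python
def collect_random_dataset(env, n_trajectories=1000):
    """Collect (s, a, r, s', done) transitions with a uniform random policy.

    Assumes a Gym-style environment: env.reset() -> state,
    env.step(action) -> (next_state, reward, done, info).
    """
    dataset = []
    for _ in range(n_trajectories):
        state, done = env.reset(), False
        while not done:
            action = env.action_space.sample()  # random policy
            next_state, reward, done, _ = env.step(action)
            dataset.append((state, action, reward, next_state, done))
            state = next_state
    return dataset
```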
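The hyperparameters quoted in the "Experiment Setup" row can be collected into a runnable configuration sketch using scikit-learn and PyTorch, shown below. The state dimension and number of actions are placeholders, and any argument not quoted in the paper is left at its library default; this is not the authors' configuration file.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import ExtraTreesRegressor

# Extra-trees settings quoted above; all other arguments are library defaults.
extra_trees = ExtraTreesRegressor(
    n_estimators=50,       # number of trees
    min_samples_split=5,   # minimum samples to split an internal node
    min_samples_leaf=2,    # minimum samples required at a leaf node
)

# DQN Q-network: two hidden layers of 128 neurons with ReLU activations.
# state_dim and n_actions are placeholders that depend on the task.
state_dim, n_actions = 4, 2
q_network = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_actions),
)

# Adam optimizer with the quoted learning rate and betas.
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-4, betas=(0.9, 0.999))
```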