Boosted Curriculum Reinforcement Learning
Authors: Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide detailed empirical evidence of the benefits of BCRL in problems requiring curricula for accurate action-value estimation and targeted exploration. |
| Researcher Affiliation | Academia | Pascal Klink, Carlo D'Eramo, Jan Peters (Department of Computer Science, TU Darmstadt, Germany); Joni Pajarinen (Department of Electrical Engineering and Automation, Aalto University, Finland) |
| Pseudocode | Yes | Algorithm 1: Boosted Curriculum Reinforcement Learning (BCRL); a hedged sketch of the underlying boosting idea follows the table. |
| Open Source Code | Yes | Code for reproducing results available under: https://github.com/psclklnk/boosted_crl. |
| Open Datasets | No | The paper uses custom-generated datasets for its experiments (e.g., "The dataset for learning the Q-function in each task is generated by executing 1000 trajectories with a random policy." and "we collect a dataset for each task running 500 episodes"), but does not provide specific access information (link, DOI, formal citation) to make these generated datasets publicly available or open. |
| Dataset Splits | No | The paper describes data collection for learning Q-functions (e.g., "1000 trajectories with a random policy" or "500 episodes using ε-greedy exploration") but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) of these datasets. |
| Hardware Specification | Yes | The computations were executed on a desktop computer with a Geforce RTX 2080, 64 GB memory and an AMD Ryzen 9 3900X processor. |
| Software Dependencies | No | The paper mentions using the "MushroomRL library (D'Eramo et al., 2021)" and "extra-trees", but it does not specify exact version numbers for these software components, which is required for reproducibility. |
| Experiment Setup | Yes | The number of trees in the extra-trees implementation was set to 50, the minimum number of samples required to split an internal node to 5, and the minimum number of samples required to be a leaf node to 2. The Q-networks employed in DQN consist of two hidden layers of 128 neurons and ReLU activations. ... we conducted a grid search testing values in [25, 50, 100, 200, 500, 1000, 2000, 4000]. ...We used Adam for optimization of the network with a learning rate of 10⁻⁴ and (β1, β2) = (0.9, 0.999). ...we used a fixed value of ε = 0.1 in the target task. For default DQN, we annealed ε from 1 to 0.1 over the first 1000 environment steps. (A hedged configuration sketch follows the table.) |
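
The Pseudocode row refers to Algorithm 1 (BCRL). As a rough, non-authoritative illustration of the boosting idea behind it, the sketch below represents the value function of the current curriculum task as a sum of residuals fitted on the preceding tasks; `tasks`, `fit_residual`, and the function names are hypothetical placeholders, not the authors' implementation or API.

```python
# Minimal sketch of the boosting idea in BCRL (not the paper's code):
# the Q-estimate for the current curriculum task is the sum of residuals
# fitted on all tasks seen so far.

def boosted_q(residuals, state, action):
    """Q-estimate as the sum of per-task residuals (the boosted ensemble)."""
    return sum(rho(state, action) for rho in residuals)

def bcrl(tasks, fit_residual):
    """For each curriculum task, fit one new residual on top of the ensemble."""
    residuals = []
    for task in tasks:
        # The new residual is trained so that the boosted ensemble approximates
        # the value function of `task`; `fit_residual` is a hypothetical helper.
        rho = fit_residual(task, lambda s, a: boosted_q(residuals, s, a))
        residuals.append(rho)
    return residuals
```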
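
The Experiment Setup row lists the reported hyperparameters. Below is a minimal, hedged sketch of that configuration, assuming scikit-learn for the extra-trees regressor and PyTorch for the DQN Q-network; the paper itself builds on the MushroomRL library, so the exact APIs differ, and `state_dim` / `n_actions` are hypothetical placeholders.

```python
# Hedged configuration sketch (assumed scikit-learn + PyTorch, not MushroomRL).
import torch
import torch.nn as nn
from sklearn.ensemble import ExtraTreesRegressor

# Extra-trees settings quoted in the table.
extra_trees = ExtraTreesRegressor(
    n_estimators=50,        # number of trees
    min_samples_split=5,    # minimum samples to split an internal node
    min_samples_leaf=2,     # minimum samples per leaf
)

def make_q_network(state_dim: int, n_actions: int) -> nn.Module:
    """DQN Q-network: two hidden layers of 128 units with ReLU activations."""
    return nn.Sequential(
        nn.Linear(state_dim, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, n_actions),
    )

q_net = make_q_network(state_dim=4, n_actions=2)  # example dimensions only
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4, betas=(0.9, 0.999))

EPSILON_TARGET_TASK = 0.1  # fixed ε-greedy value in the target task

def annealed_epsilon(step: int) -> float:
    """Default-DQN baseline: anneal ε linearly from 1.0 to 0.1 over 1000 steps."""
    return max(0.1, 1.0 - 0.9 * step / 1000)
```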