Automatic Curriculum Graph Generation for Reinforcement Learning Agents

Authors: Maxwell Svetlik, Matteo Leonetti, Jivko Sinapov, Rishi Shah, Nick Walker, Peter Stone

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments in both discrete and continuous domains show that our method produces curricula that improve the agent's learning performance when compared to the baseline condition of learning on the target task from scratch."
Researcher Affiliation | Academia | Maxwell Svetlik, University of Texas at Austin, Austin, USA; Matteo Leonetti, University of Leeds, Leeds, UK; Jivko Sinapov, University of Texas at Austin, Austin, USA; Rishi Shah, University of Texas at Austin, Austin, USA; Nick Walker, University of Texas at Austin, Austin, USA; Peter Stone, University of Texas at Austin, Austin, USA
Pseudocode | Yes | Algorithm 1: Curriculum learning; Algorithm 2: Intra-group transfer; Algorithm 3: Inter-group transfer; Algorithm 4: Curriculum Graph Generation
Open Source Code | Yes | "All source code used in our evaluation is available at https://github.com/LARG/curriculum_learning_aaai2017"
Open Datasets | No | The paper describes the use of the 'Gridworld', 'Block Dude', and 'Ms. Pac-Man' domains and generating task variants within them. However, it does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset of the generated tasks themselves.
Dataset Splits | No | The paper does not specify any training/validation/test dataset splits. It mentions learning until certain convergence criteria are met.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments (e.g., specific GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions software such as BURLAP (MacGlashan 2016) and algorithms such as Q-learning and SARSA, but it does not provide specific version numbers for any software components or libraries, which are required for a reproducible description.
Experiment Setup | Yes | "Each episode also ends after 200 steps. ... Each task in a curriculum was learned until the expected reward per episode stopped changing for 10 consecutive games. ... The game ends after the exit is reached, the agent gets trapped (...), or 200 actions have been taken. ... a source task was learned until the agent obtained at least a quarter of the maximum possible reward for that particular task."
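
The experiment-setup quotes above amount to two per-task stopping rules: learn until the expected reward per episode stops changing for 10 consecutive games, or (for some source tasks) until an episode earns at least a quarter of the task's maximum possible reward. The Python sketch below illustrates one reading of those rules only; the learner interface (run_episode), the change tolerance, and all names are assumptions for illustration, and the paper's released code (built on BURLAP, in Java) remains the authoritative implementation.

    # Illustrative sketch of the stopping criteria quoted above.
    # `learner.run_episode(task, max_steps)` is a hypothetical interface,
    # not an API from the paper or from BURLAP.

    def learn_until_reward_stabilizes(learner, task, patience=10,
                                      max_steps=200, tol=1e-6):
        # Run episodes until the running average reward per episode stops
        # changing (within `tol`, an assumed tolerance) for `patience`
        # consecutive games, mirroring the "stopped changing for 10
        # consecutive games" criterion.
        total, episodes, prev_avg, stable = 0.0, 0, None, 0
        while stable < patience:
            total += learner.run_episode(task, max_steps=max_steps)
            episodes += 1
            avg = total / episodes
            if prev_avg is not None and abs(avg - prev_avg) < tol:
                stable += 1
            else:
                stable = 0
            prev_avg = avg

    def learn_until_reward_threshold(learner, task, max_reward,
                                     frac=0.25, max_steps=200):
        # Alternative source-task criterion: stop once a single episode
        # collects at least `frac` (a quarter) of the task's maximum
        # possible reward.
        while learner.run_episode(task, max_steps=max_steps) < frac * max_reward:
            pass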