Information-theoretic Task Selection for Meta-Reinforcement Learning

Authors: Ricardo Luna Gutierrez, Matteo Leonetti

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We reproduce different meta-RL experiments from the literature and show that ITTS improves the final performance in all of them. [...] Section 5, Experimental Evaluation: The main aim of this evaluation is twofold: to demonstrate that task selection is indeed beneficial for meta-RL, and show that applying ITTS to existing meta-RL algorithms consistently results in better performance on test tasks.
Researcher Affiliation | Academia | Ricardo Luna Gutierrez, School of Computing, University of Leeds, Leeds, UK (scrlg@leeds.ac.uk); Matteo Leonetti, School of Computing, University of Leeds, Leeds, UK (M.Leonetti@leeds.ac.uk)
Pseudocode | Yes | Algorithm 1: Information-Theoretic Task Selection [...] Algorithm 2: Relevance Evaluation
Open Source Code | Yes | All the parameters and implementation details for every experiment are available in the supplementary material, as well as the source code. For training individual tasks and meta-RL agents, garage [10] was used.
Open Datasets | Yes | Cart Pole, from Open AI gym [3], is a classic control task [...] Mini Grid is an open-source grid world package proposed as an RL benchmark [5].
Dataset Splits | Yes | In every domain we used K = 5 validation tasks.
Hardware Specification | Yes | We limited the number of training tasks in each domain so that the generation and training until convergence repeated for 5 times would not exceed 72 hours of computation on an 8-core machine at 1.8GHz and 32GB of RAM.
Software Dependencies | No | The paper mentions 'garage [10] was used' but does not provide a specific version number for this toolkit or any other software dependencies.
Experiment Setup | Yes | The threshold determines when a task is considered different enough from another task, that is, their difference measured as in Equation 1 is greater than or equal to ϵ. [...] All the parameters and implementation details for every experiment are available in the supplementary material, as well as the source code. [...] 20 rollouts per gradient were used.
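The ϵ-threshold test quoted in the Experiment Setup entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ITTS implementation: Equation 1 is not reproduced in this summary, so the hypothetical `task_difference` substitutes a symmetric KL divergence between two tasks' policies, averaged over a shared set of states. The greedy pass order, the policy representation, and all function names are likewise hypothetical, and the relevance step (Algorithm 2) is not modeled.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: for each task, the action probabilities its
# trained policy assigns on a shared set of sampled states.
Policy = List[List[float]]  # policy[state_index][action_index]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def task_difference(a: Policy, b: Policy) -> float:
    """Stand-in difference measure (an assumption, NOT the paper's Equation 1):
    symmetric KL divergence averaged over the shared states."""
    total = sum(kl(pa, pb) + kl(pb, pa) for pa, pb in zip(a, b))
    return total / (2 * len(a))


def select_different_tasks(policies: Dict[str, Policy], epsilon: float) -> List[str]:
    """Greedily keep a task only if its difference from every task kept so far
    is >= epsilon, i.e. it counts as 'different enough' under the threshold."""
    selected: List[str] = []
    for name, policy in policies.items():
        if all(task_difference(policy, policies[s]) >= epsilon for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy policies over 2 states and 2 actions (made-up numbers for illustration).
    toy = {
        "task_a": [[0.9, 0.1], [0.8, 0.2]],
        "task_b": [[0.88, 0.12], [0.79, 0.21]],  # nearly identical to task_a
        "task_c": [[0.1, 0.9], [0.2, 0.8]],      # very different from task_a
    }
    print(select_different_tasks(toy, epsilon=0.5))  # ['task_a', 'task_c']
```

With these toy policies, task_b falls below the threshold relative to task_a and is dropped, while task_c is kept. Raising ϵ makes the filter stricter and keeps fewer tasks.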