Information-theoretic Task Selection for Meta-Reinforcement Learning
Authors: Ricardo Luna Gutierrez, Matteo Leonetti
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We reproduce different meta-RL experiments from the literature and show that ITTS improves the final performance in all of them. [...] 5 Experimental Evaluation. The main aim of this evaluation is twofold: to demonstrate that task selection is indeed beneficial for meta-RL, and show that applying ITTS to existing meta-RL algorithms consistently results in better performance on test tasks. |
| Researcher Affiliation | Academia | Ricardo Luna Gutierrez, School of Computing, University of Leeds, Leeds, UK, scrlg@leeds.ac.uk; Matteo Leonetti, School of Computing, University of Leeds, Leeds, UK, M.Leonetti@leeds.ac.uk |
| Pseudocode | Yes | Algorithm 1 Information-Theoretic Task Selection [...] Algorithm 2 Relevance Evaluation |
| Open Source Code | Yes | All the parameters and implementation details for every experiment are available in the supplementary material, as well as the source code. For training individual tasks and meta-RL agents, garage [10] was used. |
| Open Datasets | Yes | CartPole, from OpenAI gym [3], is a classic control task [...] MiniGrid is an open-source grid world package proposed as an RL benchmark [5]. |
| Dataset Splits | Yes | In every domain we used K = 5 validation tasks. |
| Hardware Specification | Yes | We limited the number of training tasks in each domain so that the generation and training until convergence repeated for 5 times would not exceed 72 hours of computation on an 8-core machine at 1.8GHz and 32GB of RAM. |
| Software Dependencies | No | The paper mentions 'garage [10] was used' but does not provide a specific version number for this toolkit or any other software dependencies. |
| Experiment Setup | Yes | The threshold determines when a task is considered different enough from another task, that is, their difference measured as in Equation 1 is greater than or equal to ϵ. [...] All the parameters and implementation details for every experiment are available in the supplementary material, as well as the source code. [...] 20 rollouts per gradient were used. |
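
The threshold test quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: Equation 1 is not reproduced on this page, so `mean_kl` below is a hypothetical stand-in (a KL divergence between action distributions, averaged over a shared set of states), and `select_different` and the list-of-distributions policy representation are names assumed here purely for illustration. The sketch covers only the "different enough" check against ϵ, not the relevance evaluation of Algorithm 2.

```python
import math
from typing import Callable, List, Sequence

# Assumed representation: a policy is one action distribution per state.
Policy = Sequence[Sequence[float]]


def mean_kl(p: Policy, q: Policy) -> float:
    """Hypothetical stand-in for Equation 1 (an assumption, not the paper's formula).

    Averages KL(p || q) over a shared list of states. Assumes q assigns
    non-zero probability wherever p does.
    """
    total = 0.0
    for dist_p, dist_q in zip(p, q):
        total += sum(a * math.log(a / b) for a, b in zip(dist_p, dist_q) if a > 0)
    return total / len(p)


def select_different(
    policies: List[Policy],
    epsilon: float,
    difference: Callable[[Policy, Policy], float] = mean_kl,
) -> List[int]:
    """Greedily keep tasks whose policy differs from every kept one by >= epsilon."""
    selected: List[int] = []
    for i, candidate in enumerate(policies):
        # A task counts as "different enough" when its difference to each
        # already-selected task is greater than or equal to the threshold.
        if all(difference(candidate, policies[j]) >= epsilon for j in selected):
            selected.append(i)
    return selected


# Three toy single-state policies; the second duplicates the first.
toy_policies = [[[0.5, 0.5]], [[0.5, 0.5]], [[0.9, 0.1]]]
kept = select_different(toy_policies, epsilon=0.1)
```

With these toy policies the duplicate is filtered out, while a larger ϵ prunes more aggressively. The difference measure is passed in as a parameter so that whatever Equation 1 actually specifies can be substituted without touching the selection loop.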