TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL
Authors: Yanchao Sun, Xiangyu Yin, Furong Huang (pp. 9765-9773)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate empirical results to show O-TempLe and FM-TempLe outperform existing state-of-the-art algorithms both in the finite-model setting and in the more realistic online setting. |
| Researcher Affiliation | Academia | 1 University of Maryland, College Park, MD, 20742 2 Beijing University of Posts and Telecommunications, China ycs@umd.edu, yinxiangyu@bupt.edu.cn, furongh@umd.edu |
| Pseudocode | Yes | Algorithm 1 Online Template Learning (O-TempLe) and Algorithm 2 TT Functions |
| Open Source Code | Yes | Our code is available at https://github.com/umd-huang-lab/templatereinforcement-learning. |
| Open Datasets | No | The paper describes generating environments (e.g., maze tasks) and sampling tasks from them, but does not provide concrete access information or citations for a publicly available or open dataset. |
| Dataset Splits | No | The paper discusses reinforcement learning tasks and environments, and does not specify train/validation/test dataset splits needed for reproducibility. It describes generating tasks or drawing them from a set of MDPs. |
| Hardware Specification | No | The paper does not provide specific hardware details (like exact GPU/CPU models or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using RMax and Q-learning as base learners or baselines, but does not provide version numbers for any software dependencies (e.g., Python version or library versions). |
| Experiment Setup | Yes | Input: user-specified TT gap τ̂; error tolerance ε; discount factor γ; regular known threshold m; small known threshold m_s. Environment: for the more realistic Online MTRL setting, which allows the number of MDP models to be extremely large, the traditional maze environment is generalized to have arbitrary combinations of landforms, as shown in Figure 1. Three types of landforms are used, sand, marble, and ice, with slipping probabilities 0, 0.2, and 0.4 respectively. Various hyper-parameters are tested to understand how significantly the performance of the algorithms is affected by inaccurate guesses of τ̂ and Γ, shown in Figure 3. |
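The environment described above (a maze whose cells are landforms with per-cell slipping probabilities) can be sketched as a minimal transition function. This is an illustrative reconstruction, not the authors' code: the grid representation, the `step` function, and the slip behavior (replacing the chosen action with a uniformly random one) are assumptions; only the three landform names and their slip probabilities (0, 0.2, 0.4) come from the paper.

```python
import random

# Slip probabilities per landform, as reported in the paper.
SLIP_PROB = {"sand": 0.0, "marble": 0.2, "ice": 0.4}

# Hypothetical action encoding: (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(grid, pos, action, rng=random):
    """One transition in a slippery maze.

    With probability given by the current cell's landform, the agent
    slips and executes a uniformly random action instead of the chosen
    one (an assumed slip model). Moves off the grid leave the agent in
    place.
    """
    r, c = pos
    landform = grid[r][c]
    if rng.random() < SLIP_PROB[landform]:
        action = rng.choice(sorted(ACTIONS))
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
        return (nr, nc)
    return (r, c)
```

On sand (slip probability 0) the dynamics are deterministic, so tasks drawn by recombining landform layouts share transition "templates" per landform, which is the structural assumption TempLe exploits.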