TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL

Authors: Yanchao Sun, Xiangyu Yin, Furong Huang (pp. 9765-9773)

AAAI 2021

Reproducibility Variable / Result / LLM Response

- Research Type: Experimental. "In this section, we demonstrate empirical results to show O-TempLe and FM-TempLe outperform existing state-of-the-art algorithms both in the finite-model setting and in the more realistic online setting."
- Researcher Affiliation: Academia. "1 University of Maryland, College Park, MD, 20742; 2 Beijing University of Posts and Telecommunications, China. ycs@umd.edu, yinxiangyu@bupt.edu.cn, furongh@umd.edu"
- Pseudocode: Yes. "Algorithm 1 Online Template Learning (O-TempLe)" and "Algorithm 2 TT Functions"
- Open Source Code: Yes. "Our code is available at https://github.com/umd-huang-lab/templatereinforcement-learning."
- Open Datasets: No. The paper describes generating environments (e.g., maze tasks) and sampling tasks from them, but provides no concrete access information or citations for a publicly available dataset.
- Dataset Splits: No. The paper discusses reinforcement learning tasks and environments and does not specify the train/validation/test splits needed for reproducibility; tasks are generated or drawn from a set of MDPs.
- Hardware Specification: No. The paper does not provide specific hardware details (such as exact GPU/CPU models or cloud instance types) used for its experiments.
- Software Dependencies: No. The paper mentions RMax and Q-learning as base learners or baselines, but gives no version numbers for ancillary software dependencies (e.g., Python version, deep learning framework versions).
- Experiment Setup: Yes. "Input: user-specified TT gap τ̂; error tolerance ε; discount factor γ; regular known threshold m; small known threshold ms. Environment: for the more realistic Online MTRL, which allows the number of MDP models to be extremely large, we generalize the traditional maze environment to have arbitrary combinations of landforms, as shown in Figure 1. We use 3 types of landforms (sand, marble, and ice), with slipping probabilities 0, 0.2, and 0.4 respectively. We test various hyper-parameters to understand how significantly the performance of the algorithms could be affected by inaccurate guesses of τ̂ and Γ, shown in Figure 3."
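The setup above reports per-landform slipping probabilities (sand 0, marble 0.2, ice 0.4) but the excerpt does not spell out the exact transition dynamics. The sketch below is a minimal gridworld under the assumed convention that a "slip" moves the agent perpendicular to the intended direction; the class and method names (`LandformMaze`, `step`) are hypothetical, not from the paper's code.

```python
import random

# Slipping probability per landform, as reported in the paper's setup.
SLIP = {"sand": 0.0, "marble": 0.2, "ice": 0.4}

# Action name -> (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class LandformMaze:
    """Toy maze whose cells carry landforms; slip dynamics are assumed:
    with the current cell's slip probability, the agent moves along a
    random perpendicular direction instead of the intended one."""

    def __init__(self, landforms, goal, seed=0):
        self.landforms = landforms            # 2D list of landform names
        self.rows = len(landforms)
        self.cols = len(landforms[0])
        self.goal = goal                      # goal cell (row, col)
        self.rng = random.Random(seed)

    def step(self, state, action):
        r, c = state
        dr, dc = ACTIONS[action]
        if self.rng.random() < SLIP[self.landforms[r][c]]:
            # Slip: replace the move with one of the two perpendicular moves.
            dr, dc = self.rng.choice([(dc, dr), (-dc, -dr)])
        # Clamp to the grid so the agent cannot leave the maze.
        nr = min(max(r + dr, 0), self.rows - 1)
        nc = min(max(c + dc, 0), self.cols - 1)
        next_state = (nr, nc)
        reward = 1.0 if next_state == self.goal else 0.0
        return next_state, reward


maze = LandformMaze([["sand", "ice"], ["marble", "sand"]], goal=(1, 1))
# Sand never slips, so this transition is deterministic.
state, reward = maze.step((0, 0), "right")
```

Because sand has slip probability 0, moves from sand cells are deterministic, which makes small checks like the one above easy to reason about; marble and ice cells produce stochastic transitions through the seeded `random.Random` instance.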