Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes

Authors: Guillermo Infante, Anders Jonsson, Vicenç Gómez

AAAI 2022, pp. 6970-6977

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze experimentally our proposed learning algorithm and show in two classical domains that it is more sample efficient than a flat learner and similar hierarchical approaches when the set of boundary states is smaller than the entire state space.
Researcher Affiliation | Academia | Dept. of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona (Spain). {guillermo.infante,anders.jonsson,vicen.gomez}@upf.edu
Pseudocode | Yes | Algorithm: Online and Intra-Task Learning Algorithm
Open Source Code | Yes | Code available at https://github.com/guillermoim/HRL_LMDP
Open Datasets | Yes | Rooms Domain. We analyze the performance for different room sizes and number of rooms (Figure 2). ... Taxi Domain. To allow comparison between all the methods, we adapted the Taxi domain as follows: when the taxi is at the correct pickup location, it can transition to a state with the passenger in the taxi. At a wrong pickup location, it can instead transition to a terminal state with a large negative reward (simulating an unsuccessful pickup). When the passenger is in the taxi, it can be dropped off at any pickup location, successfully completing the task whenever dropped at the correct destination. (A minimal sketch of these transition rules appears after the table.)
Dataset Splits | No | The paper describes online learning in simulated environments (Rooms and Taxi domains) and evaluates performance based on episodes and samples, but does not specify explicit train/validation/test dataset splits with percentages or counts.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instances) are provided for the experimental setup.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1').
Experiment Setup | Yes | In all experiments, the learning rate for each abstraction level ℓ is αℓ(t) = cℓ / (cℓ + n), where n is the episode to which sample t belongs. We empirically optimize the constant cℓ for each domain. For LMDPs, we use a temperature λ = 1, which provides good results. (A small sketch of this schedule appears after the table.)
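
To make the adapted Taxi dynamics quoted in the Open Datasets row concrete, the following is a minimal, hypothetical Python sketch of the pickup/drop-off rules. The state fields, reward magnitudes, and function names are illustrative assumptions and are not taken from the authors' implementation; in particular, the outcome of a drop-off at a wrong location is an assumption, as the quoted text only states that success occurs at the correct destination.

```python
# Hypothetical sketch of the adapted Taxi pickup/drop-off rules described above.
# Field names and the -10.0 penalty value are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TaxiState:
    taxi_loc: tuple          # (row, col) of the taxi
    passenger_in_taxi: bool  # True once a successful pickup has occurred
    pickup_loc: tuple        # correct pickup location
    dest_loc: tuple          # correct drop-off location
    terminal: bool = False

def apply_pickup_or_dropoff(s: TaxiState, action: str, locations):
    """Return (next_state, reward, done) for 'pickup' / 'dropoff' actions."""
    if action == "pickup" and not s.passenger_in_taxi:
        if s.taxi_loc == s.pickup_loc:
            # Correct pickup location: the passenger enters the taxi.
            return TaxiState(s.taxi_loc, True, s.pickup_loc, s.dest_loc), 0.0, False
        if s.taxi_loc in locations:
            # Wrong pickup location: terminal state with a large negative reward
            # (simulating an unsuccessful pickup).
            return TaxiState(s.taxi_loc, False, s.pickup_loc, s.dest_loc, True), -10.0, True
    if action == "dropoff" and s.passenger_in_taxi and s.taxi_loc in locations:
        # The passenger can be dropped off at any pickup location; the task
        # succeeds only at the correct destination (the penalty for a wrong
        # drop-off is an assumption of this sketch).
        success = s.taxi_loc == s.dest_loc
        next_state = TaxiState(s.taxi_loc, False, s.pickup_loc, s.dest_loc, True)
        return next_state, (0.0 if success else -10.0), True
    # Any other action leaves the passenger status unchanged.
    return s, 0.0, False
```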
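
The learning-rate schedule in the Experiment Setup row is straightforward to reproduce. The snippet below is a small sketch assuming the per-level constants cℓ are given; the constant values shown are placeholders, and the accompanying update is a generic Z-learning-style rule for an LMDP with temperature λ = 1, included only to show where the schedule and temperature enter. It is not the paper's hierarchical algorithm.

```python
import math

# Schedule from the Experiment Setup row: alpha_l(t) = c_l / (c_l + n),
# where n is the episode the sample t belongs to and c_l is a per-level
# constant tuned empirically per domain (values below are placeholders).
C_PER_LEVEL = {0: 500.0, 1: 1000.0}  # illustrative constants, not the paper's values

def learning_rate(level: int, episode: int) -> float:
    c = C_PER_LEVEL[level]
    return c / (c + episode)

# Generic Z-learning-style update on a desirability table z, with temperature
# lambda = 1 as in the paper's experiments; shown for illustration only.
LAMBDA = 1.0

def z_update(z: dict, s, s_next, reward: float, level: int, episode: int) -> None:
    alpha = learning_rate(level, episode)
    target = math.exp(reward / LAMBDA) * z.get(s_next, 1.0)
    z[s] = (1.0 - alpha) * z.get(s, 1.0) + alpha * target
```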