Potential Based Reward Shaping for Hierarchical Reinforcement Learning

Authors: Yang Gao, Francesca Toni

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics. ... We implement MAXQ-0 and PBRS-MAXQ-0 in two widely used applications for MAXQ: the Fickle Taxi problem and the Resource Collection problem, to compare their performances.
Researcher Affiliation | Academia | Yang Gao, Francesca Toni, Department of Computing, Imperial College London, {y.gao11,f.toni}@imperial.ac.uk
Pseudocode | Yes | Algorithm 1 The PBRS-MAXQ-0 algorithm. (An illustrative shaping sketch follows this table.)
Open Source Code | No | No statement explicitly providing open-source code for the methodology described in this paper was found. Footnote 3 links to supplementary material for proofs, not code.
Open Datasets | No | The paper describes the 'Fickle Taxi problem' and 'Resource Collection problem' as testbeds, which are environments, not specific publicly available datasets with access information (link, DOI, or formal citation). No concrete access information for a public dataset was provided.
Dataset Splits | No | The paper does not provide specific details on train/validation/test dataset splits. It describes learning parameters for episodes in reinforcement learning environments, which is a different concept from dataset splits for supervised learning.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper does not list the versions of any programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | The initial values and the decreasing rates (in brackets) of α and ϵ are listed in Table 1. ... In all experiments and for all algorithms, we have γ = 1. ... The learning parameters used in each algorithm are listed in Table 2, and they are selected to maximise the convergence speed of each algorithm.
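For context on the technique being evaluated, the sketch below shows potential-based reward shaping (the shaping term F(s, s') = γΦ(s') − Φ(s) added to the environment reward) layered on plain tabular Q-learning. It is a minimal illustration only, not the paper's PBRS-MAXQ-0 (Algorithm 1), which injects the shaping term inside the MAXQ hierarchy; the environment interface (env.reset, env.step, env.actions) and the potential function are assumed for illustration.

```python
import random
from collections import defaultdict

def potential(state):
    """Hypothetical heuristic potential Phi(s).

    In the paper, 'good heuristics' encode domain knowledge (e.g. progress
    towards the goal); returning 0 everywhere makes shaping a no-op.
    """
    return 0.0

def q_learning_with_pbrs(env, episodes=1000, alpha=0.5, epsilon=0.1, gamma=1.0):
    # Q-values for (state, action) pairs, defaulting to 0.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the environment's actions.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s),
            # added to the environment reward (optimal policy is preserved).
            shaped = reward + gamma * potential(next_state) - potential(state)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (shaped + gamma * best_next - Q[(state, action)])
            state = next_state
        # The paper decays alpha and epsilon per episode (rates in its tables);
        # the decay schedule is omitted from this sketch.
    return Q
```

The gamma=1.0 default mirrors the γ = 1 setting quoted in the Experiment Setup row; the alpha and epsilon defaults are placeholders, since the paper's values and decay rates are given only in its own tables.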