Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Potential Based Reward Shaping for Hierarchical Reinforcement Learning
Authors: Yang Gao, Francesca Toni
IJCAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics. ... We implement MAXQ-0 and PBRS-MAXQ-0 in two widely used applications for MAXQ: the Fickle Taxi problem and the Resource Collection problem, to compare their performances. |
| Researcher Affiliation | Academia | Yang Gao, Francesca Toni, Department of Computing, Imperial College London |
| Pseudocode | Yes | Algorithm 1 The PBRS-MAXQ-0 algorithm. |
| Open Source Code | No | No statement explicitly providing open-source code for the methodology described in this paper was found. Footnote 3 links to supplementary material for proofs, not code. |
| Open Datasets | No | The paper uses the 'Fickle Taxi problem' and 'Resource Collection problem' as testbeds; these are simulated environments rather than publicly available datasets, and no access information (link, DOI, or formal citation) for any public dataset is provided. |
| Dataset Splits | No | The paper does not provide train/validation/test dataset splits. It specifies learning parameters for episodic reinforcement learning, which is a different concept from dataset splits in supervised learning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper does not list the versions of any programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The initial values and the decreasing rates (in brackets) of α and ϵ are listed in Table 1. ... In all experiments and for all algorithms, we have γ = 1. ... The learning parameters used in each algorithm are listed in Table 2, and they are selected to maximise the convergence speed of each algorithm. |
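For context on the paper's topic: potential-based reward shaping augments the environment reward with a shaping term F(s, s') = γΦ(s') − Φ(s), where Φ is a heuristic potential over states. The sketch below is a minimal *flat* Q-learning illustration of that idea, not the paper's hierarchical PBRS-MAXQ-0 algorithm; the corridor task, step cost, potential function, and learning parameters are all illustrative assumptions (only γ = 1 matches the paper's reported setup).

```python
import random

# Toy 1-D corridor: start at state 0, goal at state N-1.
# Step reward -0.1, goal reward +1. All values below are assumptions.
N = 8
GAMMA = 1.0      # matches the gamma = 1 setting reported in the paper
ALPHA = 0.5      # illustrative learning rate
EPSILON = 0.2    # illustrative exploration rate

def phi(s):
    # Heuristic potential: states closer to the goal get higher potential.
    return s / (N - 1)

def shaped(s, s2):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
    return GAMMA * phi(s2) - phi(s)

# Tabular Q-values for actions -1 (left) and +1 (right).
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}

random.seed(0)
for _ in range(300):
    s = 0
    while s != N - 1:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N - 1)       # clipped deterministic move
        r = 1.0 if s2 == N - 1 else -0.1
        # Q-learning target on the shaped reward r + F(s, s').
        target = r + shaped(s, s2)
        if s2 != N - 1:                      # no bootstrap from the terminal state
            target += GAMMA * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy action in each non-terminal state after learning.
greedy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(N - 1)]
```

Because the shaping term telescopes over any trajectory, it changes the magnitude of returns but not which policy is optimal (here, always moving right); a helpful Φ simply makes that policy easier to discover, which mirrors the report's note that PBRS-MAXQ-0 benefits from good heuristics.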