Potential Based Reward Shaping for Hierarchical Reinforcement Learning
Authors: Yang Gao, Francesca Toni
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics. ... We implement MAXQ-0 and PBRS-MAXQ-0 in two widely used applications for MAXQ: the Fickle Taxi problem and the Resource Collection problem, to compare their performances. |
| Researcher Affiliation | Academia | Yang Gao, Francesca Toni Department of Computing, Imperial College London {y.gao11,f.toni}@imperial.ac.uk |
| Pseudocode | Yes | Algorithm 1 The PBRS-MAXQ-0 algorithm. |
| Open Source Code | No | No statement explicitly providing open-source code for the methodology described in this paper was found. Footnote 3 links to supplementary material for proofs, not code. |
| Open Datasets | No | The paper uses the 'Fickle Taxi problem' and the 'Resource Collection problem' as testbeds; these are simulated environments rather than publicly available datasets, and no access information (link, DOI, or formal citation) for any public dataset is provided. |
| Dataset Splits | No | The paper does not report train/validation/test dataset splits. It specifies per-episode learning parameters for the reinforcement learning environments, which is a different notion from dataset splits in supervised learning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper does not list the versions of any programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The initial values and the decreasing rates (in brackets) of α and ϵ are listed in Table 1. ... In all experiments and for all algorithms, we have γ = 1. ... The learning parameters used in each algorithm are listed in Table 2, and they are selected to maximise the convergence speed of each algorithm. |
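The Pseudocode and Experiment Setup rows above refer to the PBRS-MAXQ-0 algorithm and its learning parameters (decaying α and ε, γ = 1). As a rough illustration only, the sketch below applies potential-based reward shaping to a flat, tabular Q-learning update; it is not the authors' hierarchical PBRS-MAXQ-0, and the environment interface, potential function, and parameter values are all assumptions.

```python
import random
from collections import defaultdict

# Hedged sketch: flat tabular Q-learning with potential-based reward shaping.
# NOT the authors' PBRS-MAXQ-0 (which shapes rewards inside the MAXQ task
# hierarchy); it only illustrates the shaping term
# F(s, s') = gamma * Phi(s') - Phi(s) and a decaying alpha/epsilon schedule
# like the one described in the Experiment Setup row.

GAMMA = 1.0                         # the paper reports gamma = 1 in all experiments
ALPHA0, ALPHA_DECAY = 0.5, 0.999    # hypothetical initial value / decay rate
EPS0, EPS_DECAY = 0.3, 0.999        # hypothetical initial value / decay rate


def potential(state):
    """Heuristic potential Phi(s); e.g. negative distance-to-goal (assumed)."""
    return 0.0  # placeholder potential


def run_episode(env, q, actions, alpha, eps):
    # env.reset()/env.step() are an assumed gym-like interface.
    state, done, ret = env.reset(), False, 0.0
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # potential-based shaping added to the environment reward
        shaped = reward + GAMMA * potential(next_state) - potential(state)
        target = shaped if done else shaped + GAMMA * max(
            q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state, ret = next_state, ret + reward
    return ret


def train(env, actions, episodes=1000):
    q = defaultdict(float)
    alpha, eps = ALPHA0, EPS0
    for _ in range(episodes):
        run_episode(env, q, actions, alpha, eps)
        alpha *= ALPHA_DECAY  # decreasing rates, analogous to Table 1 of the paper
        eps *= EPS_DECAY
    return q
```

Because the shaping term is a telescoping difference of potentials, it changes the magnitude of intermediate rewards without changing which policy is optimal, which is why a misleading heuristic slows but does not prevent convergence.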