Exploiting Multiple Abstractions in Episodic RL via Reward Shaping
Authors: Roberto Cipollone, Giuseppe De Giacomo, Marco Favorito, Luca Iocchi, Fabio Patrizi
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Moreover, we prove that the method guarantees optimal convergence and we demonstrate its effectiveness experimentally." and "(iv) an experimental analysis showing that our approach significantly improves sample-efficiency and that modelling errors yield only a limited performance degradation." (An illustrative reward-shaping sketch appears after this table.) |
| Researcher Affiliation | Collaboration | Roberto Cipollone¹, Giuseppe De Giacomo¹,², Marco Favorito³, Luca Iocchi¹, Fabio Patrizi¹. ¹DIAG, Università degli Studi di Roma "La Sapienza", Italy; ²Department of Computer Science, University of Oxford, U.K.; ³Banca d'Italia, Italy |
| Pseudocode | Yes | Algorithm 1: Main algorithm |
| Open Source Code | Yes | Code available at https://github.com/cipollone/multinav2 |
| Open Datasets | No | The paper describes custom-built environments such as the '4-rooms' and '8-rooms' domains. While the environment code may be included in the linked GitHub repository, the paper does not provide concrete access information (link, DOI, formal citation with authors/year, or reference to an established benchmark) for a publicly available, standalone dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions algorithms such as Q-learning, Delayed Q-learning, and Dueling DQN, but does not name the libraries or solvers used to implement them, nor their version numbers (e.g., Python version, PyTorch/TensorFlow versions). (An illustrative Dueling DQN sketch appears after this table.) |
| Experiment Setup | No | The paper mentions 'Further training details can be found in the appendix' but does not include specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text. |
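
The two quotes in the "Research Type" row summarize the paper's core technique: shaping rewards in the concrete environment using value information computed on coarser abstractions. The snippet below is a minimal, hedged sketch of generic potential-based reward shaping combined with tabular Q-learning, where the shaping potential is a value estimate obtained from a hypothetical abstract model. The function names, the `abstract_values` mapping, the `phi` state-abstraction function, and the Gym-style environment interface are illustrative assumptions; this is not the authors' Algorithm 1 nor the code in the multinav2 repository.

```python
import random
from collections import defaultdict

# Illustrative sketch only: generic potential-based reward shaping.
# Assumption: `abstract_values[phi(s)]` is a value estimate computed on a
# coarser abstraction of the environment (hypothetical helper names).

def shaped_reward(reward, s, s_next, done, abstract_values, phi, gamma):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Phi(s) is the abstract value of the abstract state phi(s); terminal
    potentials are set to 0 so the optimal policy is preserved.
    """
    phi_s = abstract_values.get(phi(s), 0.0)
    phi_next = 0.0 if done else abstract_values.get(phi(s_next), 0.0)
    return reward + gamma * phi_next - phi_s


def q_learning_with_shaping(env, abstract_values, phi, episodes=500,
                            alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning on the concrete environment with shaped rewards.

    Assumption: `env` exposes a classic Gym-style API where reset() returns an
    observation, step() returns (obs, reward, done, info), and the discrete
    action space has size env.action_space.n.
    """
    q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the tabular Q-values.
            if random.random() < epsilon:
                a = random.randrange(env.action_space.n)
            else:
                a = max(range(env.action_space.n), key=lambda i: q[s][i])
            s_next, r, done, _ = env.step(a)
            r_shaped = shaped_reward(r, s, s_next, done,
                                     abstract_values, phi, gamma)
            target = r_shaped + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

Because the shaping term is purely potential-based (r + gamma * Phi(s') - Phi(s)), it is known to preserve optimal policies, which is consistent with the convergence quote above; how the paper derives and combines potentials from multiple abstractions is specified in its Algorithm 1 and is not reproduced here.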
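
The "Software Dependencies" row notes that Dueling DQN is used but that no framework or version is named. As a point of reference only, here is a minimal sketch of a standard dueling Q-network head; the choice of PyTorch, the layer sizes, and the class name are assumptions and do not reflect the paper's actual architecture or dependencies.

```python
import torch
import torch.nn as nn

# Illustrative sketch of a standard dueling Q-network head.
# The framework (PyTorch) and layer sizes are assumptions; the paper does not
# specify its network architecture or software versions.

class DuelingQNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a): the usual dueling recombination.
        return v + a - a.mean(dim=-1, keepdim=True)
```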