Advice-Guided Reinforcement Learning in Non-Markovian Environments
Authors: Daniel Neider, Jean-Raphael Gaglione, Ivan Gavran, Ufuk Topcu, Bo Wu, Zhe Xu (pp. 9073-9080)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that using well-chosen advice can reduce the number of training steps needed for convergence to an optimal policy, and can decrease the computation time needed to learn the reward function by up to two orders of magnitude. |
| Researcher Affiliation | Academia | Max Planck Institute for Software Systems, Kaiserslautern, Germany; Ecole Polytechnique, France; University of Texas at Austin, Texas, USA; Arizona State University, Arizona, USA |
| Pseudocode | Yes | Algorithm 1: The AdvisoRL algorithm |
| Open Source Code | No | The paper mentions using external libraries such as the RC2 SAT solver from the PySAT library, but does not provide concrete access to the source code for the AdvisoRL methodology described in the paper. |
| Open Datasets | Yes | This experiment is inspired by the OpenAI Gym environment Taxi-v3 (https://gym.openai.com/envs/Taxi-v3/), introduced by Dietterich (1999); see the environment-loading sketch after the table. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | Yes | All experiments were conducted on a Vivobook laptop with 1.80-GHz Core i7 CPU and 32-GB RAM |
| Software Dependencies | Yes | Our implementation uses the RC2 SAT solver (Morgado, Dodaro, and Marques-Silva 2014) from the PySAT library (Ignatiev, Morgado, and Marques-Silva 2018); see the usage sketch after the table. |
| Experiment Setup | No | The paper describes the general experimental setup and environments but does not provide specific hyperparameter values, detailed training configurations, or system-level settings for reproducibility in the main text. |
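The Taxi-v3 environment cited in the Open Datasets row is available through OpenAI Gym. Below is a minimal sketch of how such an environment is typically instantiated and stepped with a random policy; it assumes the classic `gym` package (pre-0.26 API, where `step` returns a four-tuple) and is not taken from the paper's implementation.

```python
import gym

# Load the Taxi-v3 environment referenced in the paper's experiments.
# Assumes the classic gym API (gym < 0.26), where reset() returns an
# observation and step() returns (obs, reward, done, info).
env = gym.make("Taxi-v3")

obs = env.reset()
total_reward = 0.0
done = False

while not done:
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode finished with return {total_reward}")
```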
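The Software Dependencies row names the RC2 solver (a MaxSAT solver) from the PySAT library. The following is a generic usage sketch of the published PySAT API, building a small weighted CNF with hard and soft clauses; it illustrates the dependency only and is not the paper's actual encoding of the reward-function learning constraints.

```python
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

# Build a small weighted CNF formula: hard clauses must hold,
# soft clauses are satisfied when possible (maximizing total weight).
wcnf = WCNF()
wcnf.append([1, 2])          # hard clause: x1 or x2
wcnf.append([-1, 3])         # hard clause: not x1 or x3
wcnf.append([-2], weight=1)  # soft clause: prefer not x2
wcnf.append([-3], weight=2)  # soft clause: prefer not x3

# RC2 is the solver the paper's implementation relies on
# (applied here to a toy formula, not the paper's encoding).
with RC2(wcnf) as solver:
    model = solver.compute()  # optimal assignment, or None if the hard part is UNSAT
    print("model:", model, "cost:", solver.cost)
```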