Advice-Guided Reinforcement Learning in a non-Markovian Environment

Authors: Daniel Neider, Jean-Raphael Gaglione, Ivan Gavran, Ufuk Topcu, Bo Wu, Zhe Xu (pp. 9073-9080)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that using well-chosen advice can reduce the number of training steps needed to converge to an optimal policy and can decrease the computation time to learn the reward function by up to two orders of magnitude.
Researcher Affiliation | Academia | 1 Max Planck Institute for Software Systems, Kaiserslautern, Germany; 2 Ecole Polytechnique, France; 3 University of Texas at Austin, Texas, USA; 4 Arizona State University, Arizona, USA
Pseudocode | Yes | Algorithm 1: The AdvisoRL algorithm
Open Source Code | No | The paper mentions external libraries such as the RC2 SAT solver and the PySAT library, but it does not provide access to source code implementing the AdvisoRL method described in the paper.
Open Datasets | Yes | This experiment is inspired by the OpenAI Gym environment Taxi-v3 (https://gym.openai.com/envs/Taxi-v3/), introduced by Dietterich (1999). (A minimal Gym usage sketch follows this table.)
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or a detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | Yes | All experiments were conducted on a Vivobook laptop with a 1.80-GHz Core i7 CPU and 32-GB RAM.
Software Dependencies | Yes | Our implementation uses the RC2 SAT solver (Morgado, Dodaro, and Marques-Silva 2014) from the PySAT library (Ignatiev, Morgado, and Marques-Silva 2018). (A PySAT usage sketch follows this table.)
Experiment Setup | No | The paper describes the general experimental setup and environments but does not provide specific hyperparameter values, detailed training configurations, or system-level settings for reproducibility in the main text.
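For context on the Open Datasets entry, below is a minimal sketch of loading the Taxi-v3 environment through the classic OpenAI Gym API. The paper does not state which gym version was used, so the reset/step signatures here assume the pre-0.26 interface, and the random-action loop is only a placeholder, not the paper's learning procedure.

```python
import gym

# Load the Taxi-v3 environment referenced in the paper's experiments.
env = gym.make("Taxi-v3")

state = env.reset()  # pre-0.26 gym: reset() returns only the initial state
done = False
total_reward = 0

while not done:
    action = env.action_space.sample()             # placeholder random policy
    state, reward, done, info = env.step(action)   # pre-0.26 gym: 4-tuple return
    total_reward += reward

print("episode return:", total_reward)
```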
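Likewise, for the Software Dependencies entry, the following is a generic usage sketch of the RC2 MaxSAT solver from the PySAT library. The weighted CNF shown here is an arbitrary toy example chosen for illustration; it is not the paper's actual SAT encoding for learning reward functions.

```python
from pysat.examples.rc2 import RC2
from pysat.formula import WCNF

# Build a small weighted CNF: hard clauses must hold, soft clauses are optimized.
wcnf = WCNF()
wcnf.append([1, 2])          # hard clause: x1 OR x2
wcnf.append([-1, -2])        # hard clause: NOT x1 OR NOT x2
wcnf.append([1], weight=1)   # soft clause: prefer x1 true
wcnf.append([2], weight=2)   # soft clause: prefer x2 true (higher weight)

with RC2(wcnf) as rc2:
    model = rc2.compute()    # maximum-satisfiability assignment
    print("model:", model, "cost:", rc2.cost)
```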