Advice-Guided Reinforcement Learning in Non-Markovian Environments
Authors: Daniel Neider, Jean-Raphael Gaglione, Ivan Gavran, Ufuk Topcu, Bo Wu, Zhe Xu (pp. 9073-9080)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that using well-chosen advice can reduce the number of training steps needed for convergence to an optimal policy, and can decrease the computation time needed to learn the reward function by up to two orders of magnitude. |
| Researcher Affiliation | Academia | Max Planck Institute for Software Systems, Kaiserslautern, Germany; Ecole Polytechnique, France; University of Texas at Austin, Texas, USA; Arizona State University, Arizona, USA |
| Pseudocode | Yes | Algorithm 1: The AdvisoRL algorithm |
| Open Source Code | No | The paper mentions using external libraries such as the RC2 SAT solver from the PySAT library, but does not provide concrete access to the source code for the AdvisoRL methodology described in the paper. |
| Open Datasets | Yes | This experiment is inspired by the OpenAI Gym environment Taxi-v3 (https://gym.openai.com/envs/Taxi-v3/), introduced by Dietterich (1999); see the environment-loading sketch after the table. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | Yes | All experiments were conducted on a Vivobook laptop with 1.80-GHz Core i7 CPU and 32-GB RAM |
| Software Dependencies | Yes | Our implementation uses the RC2 SAT solver (Morgado, Dodaro, and Marques-Silva 2014) from the PySAT library (Ignatiev, Morgado, and Marques-Silva 2018); see the usage sketch after the table. |
| Experiment Setup | No | The paper describes the general experimental setup and environments but does not provide specific hyperparameter values, detailed training configurations, or system-level settings for reproducibility in the main text. |
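The Taxi-v3 environment cited in the Open Datasets row is available through OpenAI Gym. Below is a minimal sketch of how such an environment is typically instantiated and stepped with a random policy; it assumes the classic `gym` package (pre-0.26 API, where `step` returns a four-tuple) and is not taken from the paper's implementation.

```python
import gym

# Load the Taxi-v3 environment referenced in the paper's experiments.
# Assumes the classic gym API (gym < 0.26), where reset() returns an
# observation and step() returns (obs, reward, done, info).
env = gym.make("Taxi-v3")

obs = env.reset()
total_reward = 0.0
done = False

while not done:
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode finished with return {total_reward}")
```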
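The Software Dependencies row names the RC2 solver (a MaxSAT solver) from the PySAT library. The following is a generic usage sketch of the published PySAT API, building a small weighted CNF with hard and soft clauses; it illustrates the dependency only and is not the paper's actual encoding of the reward-function learning constraints.

```python
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

# Build a small weighted CNF formula: hard clauses must hold,
# soft clauses are satisfied when possible (maximizing total weight).
wcnf = WCNF()
wcnf.append([1, 2])          # hard clause: x1 or x2
wcnf.append([-1, 3])         # hard clause: not x1 or x3
wcnf.append([-2], weight=1)  # soft clause: prefer not x2
wcnf.append([-3], weight=2)  # soft clause: prefer not x3

# RC2 is the solver the paper's implementation relies on
# (applied here to a toy formula, not the paper's encoding).
with RC2(wcnf) as solver:
    model = solver.compute()  # optimal assignment, or None if the hard part is UNSAT
    print("model:", model, "cost:", solver.cost)
```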