Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes
Authors: Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees, and we also provide an experimental evaluation. |
| Researcher Affiliation | Academia | Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo, DIAG, Sapienza University of Rome, Italy {ronca, licks, degiacomo}@diag.uniroma1.it |
| Pseudocode | Yes | Algorithm 1: Non Markov RL |
| Open Source Code | Yes | Source code, instructions, and definitions of the experiments are available at: github.com/whitemech/markov-abstractions-code-ijcai22. |
| Open Datasets | Yes | We consider the domains from [Abadi and Brafman, 2020]: Rotating MAB, Malfunction MAB, Cheat MAB, and Rotating Maze; a variant of Enemy Corridor [Ronca and De Giacomo, 2021]; and two novel domains: Reset-Rotating MAB, and Flickering Grid. |
| Dataset Splits | No | The paper describes an evaluation process based on training episodes and averages over episodes but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | Experiments were carried out on a server running Ubuntu 18.04.5 LTS, with 512GB RAM, and an 80-core Intel Xeon E5-2698 2.20GHz. Each training run takes one core. |
| Software Dependencies | No | The paper mentions 'Ubuntu 18.04.5 LTS' (operating system) but does not provide specific version numbers for any other software libraries, frameworks, or solvers used for the experiments. |
| Experiment Setup | No | The paper describes the evaluation frequency (every 15k training episodes) and how performance is measured (average reward over 50 episodes), but it does not provide specific hyperparameters (e.g., learning rate, batch size, optimizer settings) or detailed training configurations. |
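
The evaluation protocol quoted in the last row (evaluate every 15k training episodes, reporting average reward over 50 episodes) is concrete enough to sketch. Below is a minimal illustration of that cadence, not the authors' implementation; the `agent` and `env` interfaces (`act`, `update`, `reset`, `step`) are hypothetical placeholders.

```python
# Minimal sketch of the paper's evaluation cadence (not the authors' code):
# evaluate every 15k training episodes by averaging reward over 50 episodes.
# The Agent/Env interfaces used here are hypothetical placeholders.

EVAL_EVERY = 15_000   # training episodes between evaluation points
EVAL_EPISODES = 50    # episodes averaged per evaluation point

def run_episode(agent, env, learn):
    """Play one episode; update the agent only when `learn` is True."""
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(obs, explore=learn)
        next_obs, reward, done = env.step(action)
        if learn:
            agent.update(obs, action, reward, next_obs, done)
        obs, total_reward = next_obs, total_reward + reward
    return total_reward

def train_with_periodic_eval(agent, env, total_episodes):
    """Train for `total_episodes`, recording an evaluation point
    (episode index, average reward over EVAL_EPISODES) every EVAL_EVERY episodes."""
    curve = []
    for episode in range(1, total_episodes + 1):
        run_episode(agent, env, learn=True)
        if episode % EVAL_EVERY == 0:
            avg = sum(run_episode(agent, env, learn=False)
                      for _ in range(EVAL_EPISODES)) / EVAL_EPISODES
            curve.append((episode, avg))
    return curve
```

This only pins down the measurement schedule; the hyperparameters the table flags as missing (learning rate, exploration settings, etc.) would live inside the placeholder `agent`.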