Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes
Authors: Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees, and we also provide an experimental evaluation. |
| Researcher Affiliation | Academia | Alessandro Ronca, Gabriel Paludo Licks, Giuseppe De Giacomo, DIAG, Sapienza University of Rome, Italy {ronca, licks, degiacomo}@diag.uniroma1.it |
| Pseudocode | Yes | Algorithm 1: Non Markov RL |
| Open Source Code | Yes | Source code, instructions, and definitions of the experiments are available at: github.com/whitemech/markov-abstractions-code-ijcai22. |
| Open Datasets | Yes | We consider the domains from [Abadi and Brafman, 2020]: Rotating MAB, Malfunction MAB, Cheat MAB, and Rotating Maze; a variant of Enemy Corridor [Ronca and De Giacomo, 2021]; and two novel domains: Reset-Rotating MAB, and Flickering Grid. |
| Dataset Splits | No | The paper describes an evaluation process based on training episodes and averages over episodes but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | Experiments were carried out on a server running Ubuntu 18.04.5 LTS, with 512GB RAM, and an 80-core Intel Xeon E5-2698 2.20GHz. Each training run takes one core. |
| Software Dependencies | No | The paper mentions 'Ubuntu 18.04.5 LTS' (operating system) but does not provide specific version numbers for any other software libraries, frameworks, or solvers used for the experiments. |
| Experiment Setup | No | The paper describes the evaluation frequency (every 15k training episodes) and how performance is measured (average reward over 50 episodes), but it does not provide specific hyperparameters (e.g., learning rate, batch size, optimizer settings) or detailed training configurations. |
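
The evaluation protocol quoted in the last row (evaluate every 15k training episodes, reporting average reward over 50 episodes) is concrete enough to sketch. Below is a minimal illustration of that cadence, not the authors' implementation; the `agent` and `env` interfaces (`act`, `update`, `reset`, `step`) are hypothetical placeholders.

```python
# Minimal sketch of the paper's evaluation cadence (not the authors' code):
# evaluate every 15k training episodes by averaging reward over 50 episodes.
# The Agent/Env interfaces used here are hypothetical placeholders.

EVAL_EVERY = 15_000   # training episodes between evaluation points
EVAL_EPISODES = 50    # episodes averaged per evaluation point

def run_episode(agent, env, learn):
    """Play one episode; update the agent only when `learn` is True."""
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(obs, explore=learn)
        next_obs, reward, done = env.step(action)
        if learn:
            agent.update(obs, action, reward, next_obs, done)
        obs, total_reward = next_obs, total_reward + reward
    return total_reward

def train_with_periodic_eval(agent, env, total_episodes):
    """Train for `total_episodes`, recording an evaluation point
    (episode index, average reward over EVAL_EPISODES) every EVAL_EVERY episodes."""
    curve = []
    for episode in range(1, total_episodes + 1):
        run_episode(agent, env, learn=True)
        if episode % EVAL_EVERY == 0:
            avg = sum(run_episode(agent, env, learn=False)
                      for _ in range(EVAL_EPISODES)) / EVAL_EPISODES
            curve.append((episode, avg))
    return curve
```

This only pins down the measurement schedule; the hyperparameters the table flags as missing (learning rate, exploration settings, etc.) would live inside the placeholder `agent`.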