Efficient PAC Reinforcement Learning in Regular Decision Processes
Authors: Alessandro Ronca, Giuseppe De Giacomo
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We present an algorithm that computes a near-optimal policy with high confidence, in a number of steps that is polynomial in the required accuracy and confidence, and in a set of parameters that describe the underlying RDP. |
| Researcher Affiliation | Academia | Alessandro Ronca and Giuseppe De Giacomo, DIAG, Sapienza University of Rome, Italy; {ronca,degiacomo}@diag.uniroma1.it |
| Pseudocode | Yes | Algorithm 1: Reinforcement Learning RL(A, γ, ϵ, δ) and Algorithm 2: Reinforcement Learning RL(A, γ, ϵ, δ, n̂) |
| Open Source Code | No | The paper does not include any statement or link about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with a specific dataset; therefore, no access information for a training dataset is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments that would require specific training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper mentions algorithms (e.g., the AdaCT algorithm, Value Iteration) but does not provide specific software dependencies with version numbers (see the sketch after this table). |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and PAC analysis; it does not describe specific experimental setup details like hyperparameters or training configurations. |
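
The paper ships no code, but as a minimal illustration of the Value Iteration step mentioned in the Software Dependencies row, the sketch below extracts a near-optimal policy from an explicit MDP. This is not the authors' implementation: the function names, the toy MDP, and the stopping threshold are illustrative assumptions, and the automaton-learning phase (e.g., an AdaCT-style state-merging procedure over sampled traces) is out of scope here.

```python
# Minimal sketch, not the authors' implementation: once the RDP's automaton
# has been learned and compiled into an explicit MDP, a near-optimal policy
# can be extracted by standard value iteration. The toy MDP below is a
# hypothetical stand-in for the MDP induced by a learned automaton.

def value_iteration(mdp, gamma=0.9, epsilon=1e-3):
    """mdp maps state -> action -> list of (prob, next_state, reward)."""
    values = {s: 0.0 for s in mdp}
    # Standard Bellman-residual stopping rule: once the residual drops
    # below epsilon*(1-gamma)/(2*gamma), the greedy policy is epsilon-optimal.
    threshold = epsilon * (1 - gamma) / (2 * gamma)
    while True:
        residual = 0.0
        for s, actions in mdp.items():
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            residual = max(residual, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel) update
        if residual < threshold:
            break
    # Greedy policy with respect to the converged value function.
    policy = {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * values[s2])
                              for p, s2, r in actions[a]),
        )
        for s, actions in mdp.items()
    }
    return values, policy

# Two-state toy MDP; states play the role of automaton states.
toy_mdp = {
    "q0": {"a": [(1.0, "q1", 1.0)], "b": [(1.0, "q0", 0.0)]},
    "q1": {"a": [(0.5, "q0", 0.0), (0.5, "q1", 2.0)], "b": [(1.0, "q0", 0.0)]},
}
values, policy = value_iteration(toy_mdp)
print(policy)
```

The stopping threshold is the textbook bound guaranteeing that the greedy policy with respect to the returned value function is ϵ-optimal under discount factor γ; it is used here only to make the sketch concrete, not as a claim about the paper's PAC bounds.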