Computably Continuous Reinforcement-Learning Objectives Are PAC-Learnable
Authors: Cambridge Yang, Michael Littman, Michael Carbin
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability in two analysis settings. In particular, for the analysis that considers only sample complexity, we prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. Further, for the analysis that considers computational complexity, we prove that if an objective is computable, then it is PAC-learnable. |
| Researcher Affiliation | Academia | Cambridge Yang1, Michael Littman2, Michael Carbin1 1 MIT 2 Brown University camyang@csail.mit.edu, mlittman@cs.brown.edu, mcarbin@csail.mit.edu |
| Pseudocode | Yes | Listing 1: Computation of the simple reward objective # Given reward machine (U, δu, δr, u0, γ) def SRM(w: (2Π)ω, n: N) Q: u: U, value: Q = u0, 0 rmax: Q = max(abs(δr(u1, u2)) for u1, u2 in U 2)) H: N = (log2floor(1 γ) n log2ceil(rmax))) / log2ceil(γ) for k in range(H): u = δu(u, w[k]) value += γ**k * δr(u, u ) u = u return value |
| Open Source Code | No | The paper does not mention providing open-source code for the theoretical framework or any implementations described within. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with datasets, thus no information on dataset availability for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, thus no information on training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup or specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |