Computably Continuous Reinforcement-Learning Objectives Are PAC-Learnable

Authors: Cambridge Yang, Michael Littman, Michael Carbin

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work establishes the PAC-learnability of general reinforcement-learning objectives by giving sufficient conditions for PAC-learnability in two analysis settings. For the analysis that considers only sample complexity, the authors prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. For the analysis that also considers computational complexity, they prove that if an objective is computable, then it is PAC-learnable.
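For orientation, a minimal sketch of the first condition, assuming the paper's setting of objectives as real-valued functions on infinite label sequences over the alphabet 2^Π (the notation below is illustrative rather than the paper's verbatim definitions):

    % Objective: f : (2^\Pi)^\omega \to \mathbb{R}
    % Uniform continuity: a single horizon H works for all pairs of sequences.
    \forall \varepsilon > 0 \;\exists H \in \mathbb{N}\;
    \forall w, w' \in (2^\Pi)^\omega :\quad
    w_{0:H} = w'_{0:H} \;\Longrightarrow\; |f(w) - f(w')| \le \varepsilon

On this reading, a uniformly continuous objective is determined to within ε by a finite prefix of the trajectory, which is the property a finite-sample learning algorithm can exploit.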
Researcher Affiliation | Academia | Cambridge Yang (MIT), Michael Littman (Brown University), Michael Carbin (MIT); camyang@csail.mit.edu, mlittman@cs.brown.edu, mcarbin@csail.mit.edu
Pseudocode | Yes | The paper provides pseudocode; see Listing 1 below, which computes the simple reward objective to a precision parameterized by n.

Listing 1: Computation of the simple reward objective

    # Given reward machine (U, δu, δr, u0, γ)
    def SRM(w: (2^Π)^ω, n: ℕ) -> ℚ:
        u: U, value: ℚ = u0, 0
        rmax: ℚ = max(abs(δr(u1, u2)) for (u1, u2) in U**2)
        H: ℕ = (log2floor(1 - γ) - n - log2ceil(rmax)) / log2ceil(γ)
        for k in range(H):
            u′ = δu(u, w[k])
            value += γ**k * δr(u, u′)
            u = u′
        return value
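To make the listing concrete, here is a runnable plain-Python rendering (a minimal sketch, not the authors' code). The two-state reward machine, the proposition name "a", and the identifiers delta_u, delta_r, and srm are all hypothetical; the listing's integer-rounded logarithms (log2floor, log2ceil) are replaced with exact logarithms, which still yields a horizon H large enough that the discarded tail γ^H · rmax / (1 - γ) is at most 2^-n.

    import math
    from fractions import Fraction
    from itertools import product

    # Hypothetical two-state reward machine: the state toggles whenever
    # proposition "a" holds, and every transition into state 1 pays 1.
    U = (0, 1)
    u0 = 0
    gamma = Fraction(1, 2)          # discount factor in (0, 1)

    def delta_u(u, letter):
        # Transition function δu over letters (sets of propositions).
        return 1 - u if "a" in letter else u

    def delta_r(u, u_next):
        # Rational-valued reward function δr over transitions.
        return Fraction(int(u_next == 1))

    def srm(w, n):
        """Discounted reward of infinite word w, accurate to within 2**-n.

        w is a function from step index to letter, standing in for an
        infinite sequence in (2^Π)^ω. Assumes rmax > 0.
        """
        rmax = max(abs(delta_r(u1, u2)) for u1, u2 in product(U, repeat=2))
        # Truncation horizon: the discarded tail is bounded by
        # gamma**H * rmax / (1 - gamma) <= 2**-n.
        H = math.ceil((math.log2(1 - gamma) - n - math.log2(rmax))
                      / math.log2(gamma))
        u, value = u0, Fraction(0)
        for k in range(H):
            u_next = delta_u(u, w(k))
            value += gamma**k * delta_r(u, u_next)
            u = u_next
        return value

    # Example: the word alternating {"a"}, {}, {"a"}, ... to precision 2**-10.
    print(srm(lambda k: {"a"} if k % 2 == 0 else set(), 10))

Using Fraction keeps intermediate values as exact rationals, matching the ℚ annotations in the listing; only the horizon computation goes through floating point.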
Open Source Code | No | The paper does not mention releasing open-source code for the theoretical framework or any implementations described within.
Open Datasets | No | The paper is theoretical and conducts no experiments with datasets, so no information on dataset availability is provided.
Dataset Splits | No | The paper is theoretical and conducts no experiments with datasets, so no training/validation/test splits are described.
Hardware Specification | No | The paper is theoretical and describes no experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes no experimental setup or specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and describes no experimental setup details, hyperparameters, or training configurations.