Computably Continuous Reinforcement-Learning Objectives Are PAC-Learnable

Authors: Cambridge Yang, Michael Littman, Michael Carbin

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work establishes the PAC-learnability of general reinforcement-learning objectives by giving sufficient conditions for PAC-learnability in two analysis settings. For the analysis that considers only sample complexity, the authors prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. For the analysis that also considers computational complexity, they prove that if an objective is computable, then it is PAC-learnable.
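For orientation, a minimal sketch of the first condition, assuming the paper's setting of objectives as real-valued functions on infinite label sequences over the alphabet 2^Π (the notation below is illustrative rather than the paper's verbatim definitions):

    % Objective: f : (2^\Pi)^\omega \to \mathbb{R}
    % Uniform continuity: a single horizon H works for all pairs of sequences.
    \forall \varepsilon > 0 \;\exists H \in \mathbb{N}\;
    \forall w, w' \in (2^\Pi)^\omega :\quad
    w_{0:H} = w'_{0:H} \;\Longrightarrow\; |f(w) - f(w')| \le \varepsilon

On this reading, a uniformly continuous objective is determined to within ε by a finite prefix of the trajectory, which is the property a finite-sample learning algorithm can exploit.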
Researcher Affiliation | Academia | Cambridge Yang (MIT), Michael Littman (Brown University), Michael Carbin (MIT); camyang@csail.mit.edu, mlittman@cs.brown.edu, mcarbin@csail.mit.edu
Pseudocode | Yes | The paper provides pseudocode; see Listing 1 below, which computes the simple reward objective to a precision parameterized by n.

Listing 1: Computation of the simple reward objective

    # Given reward machine (U, δu, δr, u0, γ)
    def SRM(w: (2^Π)^ω, n: ℕ) -> ℚ:
        u: U, value: ℚ = u0, 0
        rmax: ℚ = max(abs(δr(u1, u2)) for (u1, u2) in U**2)
        H: ℕ = (log2floor(1 - γ) - n - log2ceil(rmax)) / log2ceil(γ)
        for k in range(H):
            u′ = δu(u, w[k])
            value += γ**k * δr(u, u′)
            u = u′
        return value
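To make the listing concrete, here is a runnable plain-Python rendering (a minimal sketch, not the authors' code). The two-state reward machine, the proposition name "a", and the identifiers delta_u, delta_r, and srm are all hypothetical; the listing's integer-rounded logarithms (log2floor, log2ceil) are replaced with exact logarithms, which still yields a horizon H large enough that the discarded tail γ^H · rmax / (1 - γ) is at most 2^-n.

    import math
    from fractions import Fraction
    from itertools import product

    # Hypothetical two-state reward machine: the state toggles whenever
    # proposition "a" holds, and every transition into state 1 pays 1.
    U = (0, 1)
    u0 = 0
    gamma = Fraction(1, 2)          # discount factor in (0, 1)

    def delta_u(u, letter):
        # Transition function δu over letters (sets of propositions).
        return 1 - u if "a" in letter else u

    def delta_r(u, u_next):
        # Rational-valued reward function δr over transitions.
        return Fraction(int(u_next == 1))

    def srm(w, n):
        """Discounted reward of infinite word w, accurate to within 2**-n.

        w is a function from step index to letter, standing in for an
        infinite sequence in (2^Π)^ω. Assumes rmax > 0.
        """
        rmax = max(abs(delta_r(u1, u2)) for u1, u2 in product(U, repeat=2))
        # Truncation horizon: the discarded tail is bounded by
        # gamma**H * rmax / (1 - gamma) <= 2**-n.
        H = math.ceil((math.log2(1 - gamma) - n - math.log2(rmax))
                      / math.log2(gamma))
        u, value = u0, Fraction(0)
        for k in range(H):
            u_next = delta_u(u, w(k))
            value += gamma**k * delta_r(u, u_next)
            u = u_next
        return value

    # Example: the word alternating {"a"}, {}, {"a"}, ... to precision 2**-10.
    print(srm(lambda k: {"a"} if k % 2 == 0 else set(), 10))

Using Fraction keeps intermediate values as exact rationals, matching the ℚ annotations in the listing; only the horizon computation goes through floating point.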
Open Source Code | No | The paper does not mention releasing open-source code for the theoretical framework or any implementations described within.
Open Datasets | No | The paper is theoretical and conducts no experiments with datasets, so no information on dataset availability is provided.
Dataset Splits | No | The paper is theoretical and conducts no experiments with datasets, so no training/validation/test splits are described.
Hardware Specification | No | The paper is theoretical and describes no experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes no experimental setup or specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and describes no experimental setup details, hyperparameters, or training configurations.