Concurrent PAC RL

Authors: Zhaohan Guo, Emma Brunskill

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our preliminary experiments confirm this result and show empirical benefits. We also provide small simulation experiments that support our theoretical results and demonstrate the advantage of carefully sharing information during concurrent reinforcement learning.
Researcher Affiliation | Academia | Zhaohan Guo and Emma Brunskill, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, United States
Pseudocode | Yes | Algorithm 1 PAC-EXPLORE
Open Source Code | No | No explicit statement or link to an open-source code release for the described methodology was found.
Open Datasets | No | We use a 3x3 gridworld (Figure 1(a)).
Dataset Splits | No | Each run was for 10000 time steps and all experiments were averaged over 100 runs.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions algorithms such as MBIE and PAC-EXPLORE but does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | We tuned the confidence interval parameters to maximize the cumulative reward for acting in a single task, and then used the same settings for all concurrent RL scenarios. We set m = ∞, which essentially corresponds to always continuing to improve and refine the parameter estimates (fixing them after a certain number of experiences is important for the theoretical results, but empirically it is often best to use all available experience). The PAC-EXPLORE algorithm was optimized with m_e = 1 and T = 4, and these settings were fixed for all runs. (An illustrative harness sketch follows the table.)
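For orientation, below is a minimal, self-contained Python sketch of an experiment harness that mirrors the reported settings (3x3 gridworld, m_e = 1, T = 4, 10000 time steps per run, averaged over 100 runs). The environment dynamics, reward placement, and the simple count-based exploration rule are assumptions made purely for illustration; this is not the paper's PAC-EXPLORE implementation or its concurrent-RL setup.

```python
# Illustrative harness only: a hypothetical 3x3 gridworld and a simple
# count-based exploring agent standing in for PAC-EXPLORE. The constants
# mirror the reported setup; the dynamics, reward, and agent logic are assumed.
import random

GRID = 3                 # 3x3 gridworld (paper, Figure 1(a))
M_E = 1                  # visits before a state-action is treated as "known"
T = 4                    # phase length reported in the paper; recorded here only, unused by this toy agent
STEPS_PER_RUN = 10_000   # each run was 10000 time steps
NUM_RUNS = 100           # results averaged over 100 runs

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Hypothetical deterministic dynamics: move within the grid, reward 1 at one corner."""
    r, c = state
    dr, dc = action
    nr = min(max(r + dr, 0), GRID - 1)
    nc = min(max(c + dc, 0), GRID - 1)
    reward = 1.0 if (nr, nc) == (GRID - 1, GRID - 1) else 0.0
    return (nr, nc), reward


def run_once(rng):
    """One run: prefer under-visited ("unknown") state-actions, otherwise act uniformly at random."""
    counts = {}          # (state, action) -> visit count; m = "infinity": counts are never frozen
    state = (0, 0)
    total = 0.0
    for _ in range(STEPS_PER_RUN):
        unknown = [a for a in ACTIONS if counts.get((state, a), 0) < M_E]
        action = rng.choice(unknown) if unknown else rng.choice(ACTIONS)
        counts[(state, action)] = counts.get((state, action), 0) + 1
        state, reward = step(state, action)
        total += reward
    return total


rng = random.Random(0)
avg = sum(run_once(rng) for _ in range(NUM_RUNS)) / NUM_RUNS
print(f"average cumulative reward over {NUM_RUNS} runs: {avg:.1f}")
```

The averaging loop reflects the reported evaluation protocol (100 independent runs of 10000 steps each); the exploration rule is only a stand-in to show where the m_e threshold would enter such a harness.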