Concurrent PAC RL
Authors: Zhaohan Guo, Emma Brunskill
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our preliminary experiments confirm this result and show empirical benefits. We also provide small simulation experiments that support our theoretical results and demonstrate the advantage of carefully sharing information during concurrent reinforcement learning. |
| Researcher Affiliation | Academia | Zhaohan Guo and Emma Brunskill Carnegie Mellon University 5000 Forbes Ave. Pittsburgh PA, 15213 United States |
| Pseudocode | Yes | Algorithm 1 PAC-EXPLORE |
| Open Source Code | No | No explicit statement or link for open-source code release for the described methodology was found. |
| Open Datasets | No | We use a 3x3 gridworld (Figure 1(a)). |
| Dataset Splits | No | Each run was for 10000 time steps and all experiments were averaged over 100 runs. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions algorithms like MBIE and PAC-EXPLORE but does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | We tuned the confidence interval parameters to maximize the cumulative reward for acting in a single task, and then used the same settings for all concurrent RL scenarios. We set m = ∞, which essentially corresponds to always continuing to improve and refine the parameter estimates (fixing them after a certain number of experiences is important for the theoretical results, but empirically it is often best to use all available experience). The PAC-EXPLORE algorithm was optimized with m_e = 1 and T = 4, and fixed for all runs. |
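
The Experiment Setup row above specifies a 3x3 gridworld, runs of 10,000 time steps averaged over 100 runs, and PAC-EXPLORE parameters m_e = 1 and T = 4. The snippet below is a minimal sketch of such an experiment harness, not the authors' code: the environment and agent constructors (`make_env`, `make_agent`) and the agent interface (`act`, `update`) are assumptions introduced here for illustration.

```python
# Hedged sketch of the experiment harness described in the table above.
# Constants come from the paper; the env/agent interfaces are hypothetical.
import numpy as np

NUM_RUNS = 100      # experiments averaged over 100 runs (from the paper)
HORIZON = 10_000    # each run lasts 10,000 time steps (from the paper)
M_E = 1             # PAC-EXPLORE known-state threshold (from the paper)
T_EXPLORE = 4       # PAC-EXPLORE exploration-episode length (from the paper)

def run_experiment(make_env, make_agent, num_runs=NUM_RUNS, horizon=HORIZON):
    """Average cumulative reward over independent runs.

    `make_env` would build the 3x3 gridworld and `make_agent` a
    PAC-EXPLORE-style learner; both callables are placeholders here.
    """
    returns = np.zeros(num_runs)
    for run in range(num_runs):
        env = make_env()
        agent = make_agent(m_e=M_E, T=T_EXPLORE)
        state = env.reset()
        total = 0.0
        for _ in range(horizon):
            action = agent.act(state)                         # choose action
            next_state, reward = env.step(action)             # advance env
            agent.update(state, action, reward, next_state)   # learn online
            state = next_state
            total += reward
        returns[run] = total
    return returns.mean(), returns.std()
```

Under these assumptions, reproducing the reported curves would amount to calling `run_experiment` once per concurrent-RL scenario with the same tuned confidence-interval settings, as the row above describes.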