Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Authors: Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In particular, we prove that the proposed algorithm achieves Õ(L\|S\|√(\|A\|T)) upper bounds of both the regret and the constraint violation, where L is the length of each episode. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning, which demonstrates the power of optimism in the face of uncertainty in constrained online learning. |
| Researcher Affiliation | Collaboration | University of Michigan; Facebook, Inc.; Princeton University; AI Lab, Didi Chuxing; Northwestern University |
| Pseudocode | Yes | Algorithm 1 Upper-Confidence Primal-Dual (UCPD) Mirror Descent |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use datasets for training or evaluation. No information about publicly available datasets was provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments requiring dataset splits. No specific dataset split information was found. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe implementation details that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific details like hyperparameters or training configurations. |
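Since the paper releases no code, the Lagrange-multiplier drift process mentioned in its abstract can only be illustrated generically. The sketch below is a hedged, minimal example of a projected dual-ascent update for a constraint budget, not the authors' UCPD Mirror Descent algorithm; the step size `eta`, the budget, and the per-episode constraint costs are illustrative placeholders.

```python
# Hedged sketch: generic projected dual ascent for a single budget
# constraint. This is NOT the paper's Algorithm 1 (UCPD Mirror Descent);
# `eta`, `budget`, and `costs` are toy placeholders for illustration.

def dual_update(lam, constraint_cost, budget, eta):
    """Raise the Lagrange multiplier when the episode's constraint cost
    exceeds the budget, then project back onto [0, +inf)."""
    return max(0.0, lam + eta * (constraint_cost - budget))

lam = 0.0          # Lagrange multiplier, starts at zero
eta = 0.1          # dual step size (placeholder)
budget = 1.0       # constraint budget per episode (placeholder)
costs = [1.5, 1.2, 0.8, 0.5, 0.4]  # toy per-episode constraint costs

history = []
for c in costs:
    lam = dual_update(lam, c, budget, eta)
    history.append(lam)
# The multiplier drifts up while the constraint is violated (cost > budget)
# and decays back toward zero once episodes become feasible.
```

The high-probability drift analysis in the paper bounds how far such a multiplier process can wander, which is what ties the dual variable's size to the constraint-violation bound.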