Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

Authors: Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In particular, we prove that the proposed algorithm achieves $\tilde{O}(L|S|\sqrt{|A|T})$ upper bounds on both the regret and the constraint violation, where $L$ is the length of each episode. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning, which demonstrates the power of optimism in the face of uncertainty in constrained online learning.
Researcher Affiliation | Collaboration | University of Michigan; Facebook, Inc.; Princeton University; AI Lab, Didi Chuxing; Northwestern University
Pseudocode | Yes | Algorithm 1: Upper-Confidence Primal-Dual (UCPD) Mirror Descent (a hedged sketch of the primal-dual updates appears below the table).
Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not use datasets for training or evaluation. No information about publicly available datasets was provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments requiring dataset splits. No specific dataset split information was found.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe implementation details that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific details like hyperparameters or training configurations.
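
To make the table's Pseudocode entry more concrete, below is a minimal Python sketch of the two updates at the heart of a primal-dual mirror-descent method for a tabular CMDP: an entropic mirror-descent step on Lagrangian Q-values and a projected subgradient step on the Lagrange multiplier. This is a sketch under stated assumptions, not the paper's Algorithm 1: the function names, random stand-in Q-values, budget, and step sizes are all illustrative, and the upper-confidence bonus terms that supply the algorithm's optimism are omitted.

```python
import numpy as np

def mirror_descent_step(policy, q_loss, q_cost, dual, lr):
    """Entropic mirror-descent (multiplicative-weights) step on the
    Lagrangian Q-values q_loss + dual * q_cost, applied per state."""
    logits = np.log(policy) - lr * (q_loss + dual * q_cost)
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

def dual_step(dual, episode_cost, budget, dual_lr):
    """Projected subgradient ascent on the Lagrange multiplier: the
    multiplier drifts upward while the episode's cost exceeds the budget
    and is clipped at zero otherwise."""
    return max(0.0, dual + dual_lr * (episode_cost - budget))

# Toy run with random stand-in Q-values (S states, A actions, T episodes).
rng = np.random.default_rng(0)
S, A, T = 4, 3, 200
policy = np.full((S, A), 1.0 / A)   # uniform initial policy
dual, budget = 0.0, 0.5             # Lagrange multiplier and cost budget
for _ in range(T):
    q_loss = rng.random((S, A))     # stand-in for estimated loss Q-values
    q_cost = rng.random((S, A))     # stand-in for estimated cost Q-values
    policy = mirror_descent_step(policy, q_loss, q_cost, dual, lr=0.1)
    episode_cost = (policy * q_cost).sum(axis=1).mean()  # crude cost proxy
    dual = dual_step(dual, episode_cost, budget, dual_lr=0.05)
print("final Lagrange multiplier:", round(dual, 3))
```

The dual variable here is, roughly, the process whose drift the paper's high-probability analysis controls: it rises while the estimated cost exceeds the budget and is pulled back toward zero once the constraint is satisfied.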