Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Authors: Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In particular, we prove that the proposed algorithm achieves Õ(L\|S\|√(\|A\|T)) upper bounds of both the regret and the constraint violation, where L is the length of each episode. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning, which demonstrates the power of optimism in the face of uncertainty in constrained online learning. |
| Researcher Affiliation | Collaboration | University of Michigan; Facebook, Inc.; Princeton University; AI Lab, Didi Chuxing; Northwestern University |
| Pseudocode | Yes | Algorithm 1 Upper-Confidence Primal-Dual (UCPD) Mirror Descent |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use datasets for training or evaluation. No information about publicly available datasets was provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments requiring dataset splits. No specific dataset split information was found. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe implementation details that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific details like hyperparameters or training configurations. |
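Since the paper releases no code, the Lagrange-multiplier drift process mentioned in its abstract can only be illustrated generically. The sketch below is a hedged, minimal example of a projected dual-ascent update for a constraint budget, not the authors' UCPD Mirror Descent algorithm; the step size `eta`, the budget, and the per-episode constraint costs are illustrative placeholders.

```python
# Hedged sketch: generic projected dual ascent for a single budget
# constraint. This is NOT the paper's Algorithm 1 (UCPD Mirror Descent);
# `eta`, `budget`, and `costs` are toy placeholders for illustration.

def dual_update(lam, constraint_cost, budget, eta):
    """Raise the Lagrange multiplier when the episode's constraint cost
    exceeds the budget, then project back onto [0, +inf)."""
    return max(0.0, lam + eta * (constraint_cost - budget))

lam = 0.0          # Lagrange multiplier, starts at zero
eta = 0.1          # dual step size (placeholder)
budget = 1.0       # constraint budget per episode (placeholder)
costs = [1.5, 1.2, 0.8, 0.5, 0.4]  # toy per-episode constraint costs

history = []
for c in costs:
    lam = dual_update(lam, c, budget, eta)
    history.append(lam)
# The multiplier drifts up while the constraint is violated (cost > budget)
# and decays back toward zero once episodes become feasible.
```

The high-probability drift analysis in the paper bounds how far such a multiplier process can wander, which is what ties the dual variable's size to the constraint-violation bound.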