Online Learning in CMDPs: Handling Stochastic and Adversarial Constraints
Authors: Francesco Emanuele Stradi, Jacopo Germano, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide the first best-of-bothworlds algorithm for CMDPs with long-term constraints, in the flavor of Balseiro et al. (2023). Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underling process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially. [...] 5. Theoretical Results In this section we provide the theoretical results attained by Algorithm 2 in terms of cumulative regret and cumulative constraint violation. |
| Researcher Affiliation | Academia | 1Politecnico di Milano, Milan, Italy. Correspondence to: Francesco Emanuele Stradi <francescoemanuele.stradi@polimi.it>. |
| Pseudocode | Yes | Algorithm 1 Learner-Environment Interaction; Algorithm 2 PDGD-OPS; Algorithm 3 UC-O-GDPS.UPDATE |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the availability of source code for the methodology described. |
| Open Datasets | No | The paper is theoretical, focusing on algorithm design and theoretical analysis. It does not conduct experiments on specific datasets, and therefore, it does not reference publicly available training data or provide access information for datasets. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and theoretical analysis. It does not involve experimental validation or discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is purely theoretical, focusing on algorithm design and analysis. It does not describe any experiments that would require specific hardware, and thus, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, presenting algorithm design and mathematical proofs. It does not describe any empirical experiments or implementations that would necessitate listing specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical contributions and algorithm design. It does not describe any empirical experimental setups, and therefore, no hyperparameters or system-level training settings are provided. |