reproducibilityindex.ai

Online Learning in CMDPs: Handling Stochastic and Adversarial Constraints

Authors: Francesco Emanuele Stradi, Jacopo Germano, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we provide the first best-of-bothworlds algorithm for CMDPs with long-term constraints, in the flavor of Balseiro et al. (2023). Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underling process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially. [...] 5. Theoretical Results In this section we provide the theoretical results attained by Algorithm 2 in terms of cumulative regret and cumulative constraint violation.
Researcher Affiliation	Academia	1Politecnico di Milano, Milan, Italy. Correspondence to: Francesco Emanuele Stradi <francescoemanuele.stradi@polimi.it>.
Pseudocode	Yes	Algorithm 1 Learner-Environment Interaction; Algorithm 2 PDGD-OPS; Algorithm 3 UC-O-GDPS.UPDATE
Open Source Code	No	The paper does not contain any explicit statement or link indicating the availability of source code for the methodology described.
Open Datasets	No	The paper is theoretical, focusing on algorithm design and theoretical analysis. It does not conduct experiments on specific datasets, and therefore, it does not reference publicly available training data or provide access information for datasets.
Dataset Splits	No	The paper is theoretical and focuses on algorithm design and theoretical analysis. It does not involve experimental validation or discuss dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is purely theoretical, focusing on algorithm design and analysis. It does not describe any experiments that would require specific hardware, and thus, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical, presenting algorithm design and mathematical proofs. It does not describe any empirical experiments or implementations that would necessitate listing specific software dependencies with version numbers.
Experiment Setup	No	The paper focuses on theoretical contributions and algorithm design. It does not describe any empirical experimental setups, and therefore, no hyperparameters or system-level training settings are provided.