Truly No-Regret Learning in Constrained MDPs

Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, we provide numerical evaluations of our algorithm in simple environments. We perform numerical simulations of our algorithms and compare them to their unregularized counterparts (Efroni et al., 2020).
Researcher Affiliation | Academia | 1 EPFL, 2 ETH Zurich, 3 University of Zurich
Pseudocode | Yes | Algorithm 1: Regularized Primal-Dual Algorithm with Optimistic Exploration
Open Source Code | Yes | We provide the code in the supplementary material.
Open Datasets | No | We consider a randomly generated CMDP with deterministic rewards and unknown transitions.
Dataset Splits | No | No specific dataset splits (training, validation, test) were mentioned, as the environment is randomly generated for simulation and interaction.
Hardware Specification | Yes | All simulations were performed on a MacBook Pro 2.8 GHz Quad-Core Intel Core i7.
Software Dependencies | No | No specific software names with version numbers were mentioned.
Experiment Setup | Yes | For the vanilla algorithms, we run for K = 4000 episodes for each step size η ∈ {0.05, 0.075, 0.1, 0.125, 0.15, 0.2}, which we observed to be a reasonable range across CMDPs when fixing the number of episodes. Similarly, for the regularized algorithms, we perform the same parameter search across all pairs of step size η ∈ {0.05, 0.1, 0.2} and regularization parameter τ ∈ {0.01, 0.02}, totaling six hyperparameter configurations as well. We always set λmax = 6, which did not play a role in our simulations as long as it was chosen sufficiently large. We use exploration bonuses 0.08 · n_h(s, a)^{-1/2}.
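
The Pseudocode row refers to a regularized primal-dual scheme. The following is a minimal sketch of the dual-variable update such a scheme typically uses, assuming a tabular CMDP with a single constraint V_c(π) ≥ b and a quadratic regularizer −(τ/2)λ² added to the Lagrangian; the function name and exact update form are illustrative, not the paper's Algorithm 1.

```python
import numpy as np

def regularized_dual_step(lam, v_c, b, eta, tau, lam_max):
    """One projected gradient step on the dual variable lambda.

    Assumes the regularized Lagrangian
        L(pi, lambda) = V_r(pi) + lambda * (V_c(pi) - b) - (tau / 2) * lambda**2,
    so the lambda-gradient is (V_c(pi) - b) - tau * lambda. The dual is
    minimized over lambda, hence the descent step, followed by projection
    onto [0, lam_max].
    """
    grad = (v_c - b) - tau * lam
    lam = lam - eta * grad
    return float(np.clip(lam, 0.0, lam_max))

# Example: a constraint violation (v_c < b) pushes lambda up, capped at lam_max.
lam = 0.0
for _ in range(10):
    lam = regularized_dual_step(lam, v_c=0.3, b=0.5, eta=0.1, tau=0.01, lam_max=6.0)
print(lam)
```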
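
The Experiment Setup row specifies the hyperparameter grids, the dual-variable cap, and the bonus scale. The sketch below organizes those values in code; the constants are taken from the reported setup, while the variable and function names are illustrative.

```python
import itertools
import numpy as np

K = 4000            # episodes per configuration
LAMBDA_MAX = 6.0    # cap on the dual variable (lambda_max)

# Vanilla (unregularized) primal-dual: six step sizes, no regularization.
vanilla_grid = [(eta, 0.0) for eta in (0.05, 0.075, 0.1, 0.125, 0.15, 0.2)]

# Regularized primal-dual: 3 step sizes x 2 regularization strengths = 6 configs.
regularized_grid = list(itertools.product((0.05, 0.1, 0.2), (0.01, 0.02)))

def exploration_bonus(visit_counts):
    """Optimistic bonus 0.08 * n_h(s, a)^(-1/2); unvisited counts are floored at 1."""
    return 0.08 / np.sqrt(np.maximum(visit_counts, 1))
```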