Near-Optimal Distributionally Robust Reinforcement Learning with General $L_p$ Norms

Authors: Pierre Clavier, Laixi Shi, Erwan Le Pennec, Eric Mazumdar, Adam Wierman, Matthieu Geist

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work refines sample complexity bounds for learning robust Markov decision processes when the uncertainty set is characterized by a general Lp metric, assuming access to a generative model. The findings not only strengthen the current knowledge by improving both the upper and lower bounds... This work is the first to provide results with a minimax bound...
Researcher Affiliation | Collaboration | Pierre Clavier (Ecole Polytechnique, Inria); Laixi Shi (Caltech); Erwan Le Pennec (Ecole Polytechnique); Eric Mazumdar (Caltech); Adam Wierman (Caltech); Matthieu Geist (Cohere)
Pseudocode | Yes | Algorithm 1: Distributionally Robust Value Iteration (DRVI) for infinite-horizon RMDPs with sa-rectangular uncertainty sets defined by an arbitrary norm.
Open Source Code | No | The NeurIPS checklist explicitly states "The answer NA means that the paper does not include experiments requiring code." for "open access to data and code" and "The answer NA means that the paper does not include experiments." for "Experimental Result Reproducibility". This indicates no code is provided. There is no explicit statement or link in the paper offering the source code for the described methodology.
Open Datasets | No | Following Zhou et al. [2021] and Panaganti and Kalathil [2022], the paper assumes access to a generative model, or simulator [Kearns and Singh, 1999], which allows collecting $N$ independent samples for each state-action pair from the nominal kernel $P^0$: $\forall (s, a) \in \mathcal{S} \times \mathcal{A}$, $s_{i,s,a} \overset{\text{i.i.d.}}{\sim} P^0(\cdot \mid s, a)$, $i = 1, 2, \ldots, N$. The total sample size is therefore $NSA$.
Dataset Splits | No | The paper is theoretical and focuses on sample complexity bounds and algorithms, not on empirical evaluation involving dataset splits. There is no mention of training, validation, or test splits. The NeurIPS checklist states "The answer NA means that the paper does not include experiments."
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for computation. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments."
Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments."
Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and algorithm design (pseudocode), not on specific experimental setups with hyperparameters or training configurations. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments."
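Since the paper provides only pseudocode (Algorithm 1, DRVI), a minimal sketch of what distributionally robust value iteration looks like may help readers. This is NOT the authors' implementation: it specializes the Lp ball to the L-infinity case, where the inner worst-case minimization over the simplex admits a simple greedy solution (shift probability mass toward low-value states). The function names, the greedy inner solver, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def worst_case_dist(p0, v, sigma):
    """Inner problem of the robust Bellman operator for an L-infinity
    ball (one Lp special case): minimize q.v over distributions q with
    |q - p0|_inf <= sigma.  Solved greedily: start every coordinate at
    its lower bound, then fill remaining mass into low-value states."""
    lo = np.clip(p0 - sigma, 0.0, 1.0)
    hi = np.clip(p0 + sigma, 0.0, 1.0)
    q = lo.copy()
    budget = 1.0 - q.sum()          # mass still to allocate (>= 0)
    for s in np.argsort(v):         # cheapest (lowest-value) states first
        add = min(hi[s] - q[s], budget)
        q[s] += add
        budget -= add
    return q

def drvi(P0, R, gamma=0.9, sigma=0.1, iters=500):
    """Distributionally robust value iteration, sa-rectangular case.
    P0: nominal kernel of shape (S, A, S); R: rewards of shape (S, A)."""
    S, A, _ = P0.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                q = worst_case_dist(P0[s, a], V, sigma)
                Q[s, a] = R[s, a] + gamma * (q @ V)
        V = Q.max(axis=1)           # greedy policy improvement step
    return V, Q
```

With sigma = 0 the inner solver returns the nominal kernel and DRVI reduces to standard value iteration; increasing sigma can only lower the robust value, since the adversary chooses the worst kernel in a larger ball.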
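The generative-model assumption quoted above (N i.i.d. next-state samples per state-action pair, NSA samples total) can be sketched in a few lines. This is an illustrative assumption, not code from the paper; `collect_samples` and its signature are hypothetical names.

```python
import numpy as np

def collect_samples(P0, N, seed=None):
    """Generative-model sampling: for every (s, a), draw N i.i.d. next
    states from the nominal kernel P0 (shape (S, A, S)) -- NSA draws in
    total -- and return the empirical kernel estimate."""
    rng = np.random.default_rng(seed)
    S, A, _ = P0.shape
    P_hat = np.zeros_like(P0, dtype=float)
    for s in range(S):
        for a in range(A):
            nxt = rng.choice(S, size=N, p=P0[s, a])     # N i.i.d. samples
            P_hat[s, a] = np.bincount(nxt, minlength=S) / N
    return P_hat
```

The empirical kernel `P_hat` is what a sample-based variant of DRVI would plug in place of the (unknown) nominal kernel.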