Near-Optimal Distributionally Robust Reinforcement Learning with General $L_p$ Norms
Authors: Pierre Clavier, Laixi Shi, Erwan Le Pennec, Eric Mazumdar, Adam Wierman, Matthieu Geist
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work refines sample complexity bounds for learning robust Markov decision processes when the uncertainty set is characterized by a general $L_p$ norm, assuming access to a generative model. Our findings not only strengthen the current knowledge by improving both the upper and lower bounds... This work is the first to provide results with a minimax bound... |
| Researcher Affiliation | Collaboration | Pierre Clavier (École Polytechnique, Inria); Laixi Shi (Caltech); Erwan Le Pennec (École Polytechnique); Eric Mazumdar (Caltech); Adam Wierman (Caltech); Matthieu Geist (Cohere) |
| Pseudocode | Yes | Algorithm 1: Distributionally robust value iteration (DRVI) for infinite-horizon sa-rectangular RMDPs with an arbitrary norm (a minimal illustrative sketch of this procedure appears after the table) |
| Open Source Code | No | The NeurIPS checklist explicitly states "The answer NA means that the paper does not include experiments requiring code." for "open access to data and code" and "The answer NA means that the paper does not include experiments." for "Experimental Result Reproducibility". This indicates no code is provided. There is no explicit statement or link in the paper offering the source code for the described methodology. |
| Open Datasets | No | Following Zhou et al. [2021], Panaganti and Kalathil [2022], we assume access to a generative model or a simulator [Kearns and Singh, 1999], which allows us to collect $N$ independent samples for each state-action pair generated based on the nominal kernel $P^0$: $\forall (s, a) \in \mathcal{S} \times \mathcal{A},\ s_{i,s,a} \overset{\mathrm{i.i.d.}}{\sim} P^0(\cdot \mid s, a),\ i = 1, 2, \ldots, N$. The total sample size is, therefore, $NSA$. (A sketch of this sampling step also appears after the table.) |
| Dataset Splits | No | The paper is theoretical and focuses on sample complexity bounds and algorithms, not on empirical evaluation involving dataset splits. There is no mention of training, validation, or test splits. The NeurIPS checklist states "The answer NA means that the paper does not include experiments." |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used for computation. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments." |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments." |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and algorithm design (pseudocode), not on specific experimental setups with hyperparameters or training configurations. The NeurIPS checklist indicates "The answer NA means that the paper does not include experiments." |
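
The Pseudocode row names Algorithm 1 (DRVI), but the paper ships no code, so the following is a minimal sketch of what distributionally robust value iteration for an sa-rectangular $L_p$-ball uncertainty set could look like. Everything here is an illustrative assumption: the helper names `worst_case_value` and `drvi`, the toy problem sizes, and the use of SciPy's SLSQP solver for the inner worst-case problem (a generic numerical stand-in for whatever closed-form or dual approach the paper analyzes).

```python
# A minimal sketch of distributionally robust value iteration (DRVI) for an
# infinite-horizon, sa-rectangular RMDP whose uncertainty set is an L_p ball
# of radius beta around the nominal kernel. Illustrative only: the inner
# worst-case problem is solved numerically here, not via the paper's analysis.
import numpy as np
from scipy.optimize import minimize

def worst_case_value(p0, v, beta, p):
    """min_q  q . v  s.t.  q in the probability simplex and ||q - p0||_p <= beta."""
    n = len(p0)
    cons = [
        {"type": "eq", "fun": lambda q: q.sum() - 1.0},  # simplex: entries sum to 1
        {"type": "ineq", "fun": lambda q: beta - np.linalg.norm(q - p0, ord=p)},
    ]
    res = minimize(lambda q: q @ v, p0, bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP")
    return res.fun if res.success else p0 @ v  # fall back to the nominal value

def drvi(P0, R, gamma=0.9, beta=0.1, p=2.0, iters=100):
    """Robust Bellman iteration: V(s) <- max_a [ R(s,a) + gamma * inf_{P in ball} P . V ]."""
    S, A, _ = P0.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_value(P0[s, a], V, beta, p)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# Toy 2-state, 2-action instance; all numbers are made up for illustration.
P0 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
V, greedy_policy = drvi(P0, R)
```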
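
Similarly, the generative-model assumption quoted in the Open Datasets row can be made concrete: draw $N$ i.i.d. next states from $P^0(\cdot \mid s, a)$ for every $(s, a)$ and form the empirical kernel, for a total of $NSA$ samples. The function name `empirical_kernel` and the use of the true kernel as a stand-in simulator are assumptions for illustration.

```python
# A minimal sketch of the sampling step quoted above: for each (s, a), draw
# N i.i.d. next states from the nominal kernel P^0 and build the empirical
# estimate; the total sample size is N * S * A. The true kernel P0 plays the
# role of the simulator here, which is an assumption for illustration.
import numpy as np

def empirical_kernel(P0, N, seed=None):
    """Estimate P^0 from N generative-model samples per state-action pair."""
    rng = np.random.default_rng(seed)
    S, A, _ = P0.shape
    P_hat = np.zeros_like(P0)
    for s in range(S):
        for a in range(A):
            # s_{i,s,a} ~ P^0(. | s, a), i = 1, ..., N
            draws = rng.choice(S, size=N, p=P0[s, a])
            P_hat[s, a] = np.bincount(draws, minlength=S) / N
    return P_hat
```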