Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability
Authors: Aviv Tamar, Daniel Soudry, Ev Zisselman
AAAI 2022, pp. 8423-8431
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss, an approach that is not suitable for RL, as Markov decision processes (MDPs) are not convex. Instead, building on recent results of fast convergence rates for mirror descent in regularized MDPs, we show that regularized MDPs satisfy a certain quadratic growth criterion, which is sufficient to establish stability. This result, which may be of independent interest, allows us to study the effect of regularization on generalization in the Bayesian RL setting. (An illustrative sketch of such a regularized MDP appears after this table.) |
| Researcher Affiliation | Academia | Aviv Tamar, Daniel Soudry, Ev Zisselman Technion Israel Institute of Technology |
| Pseudocode | No | The main paper text does not contain any pseudocode or algorithm blocks. It references a section in the supplementary material for details on a uniform trust region policy optimization algorithm, but the pseudocode itself is not presented within the provided paper content. |
| Open Source Code | No | The paper does not contain any explicit statement about making its own source code publicly available or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use or provide access to any publicly available dataset for empirical training or evaluation. It refers to a theoretical "training set of N simulators for N independently sampled MDPs" in its problem formulation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments that would require specific train/validation/test dataset splits. It discusses theoretical concepts of training and testing but not in the context of data partitioning for reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not conduct empirical experiments; therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers required to replicate empirical experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup, and therefore does not provide specific details such as hyperparameter values or system-level training settings. |
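The "Research Type" row above references two technical ingredients: a quadratic growth criterion and mirror descent in regularized MDPs. As rough intuition (a hedged paraphrase, not the paper's exact statement), quadratic growth asks that the regularized objective J_lam fall off at least quadratically away from its maximizer, i.e. J_lam(pi*) - J_lam(pi) >= c * ||pi - pi*||^2 for some c > 0, which pins the optimal policy down tightly enough to yield algorithmic stability.

Below is a minimal runnable sketch of an entropy-regularized ("soft") MDP of the kind such analyses target. This is an illustrative toy, not the paper's algorithm or code; the random MDP, the constants `gamma` and `lam`, and all variable names are assumptions made for the example.

```python
# Hedged sketch (not the paper's method): soft value iteration on a toy
# entropy-regularized MDP, illustrating the kind of regularization the
# paper's stability analysis concerns.
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 3            # number of states and actions (toy sizes, assumed)
gamma, lam = 0.9, 0.1  # discount factor and entropy-regularization weight (assumed)

P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
R = rng.uniform(size=(S, A))                # reward r(s, a)

V = np.zeros(S)
for _ in range(1000):
    Q = R + gamma * P @ V  # Q[s, a] = r(s, a) + gamma * E[V(s') | s, a]
    # Soft Bellman backup: V(s) = lam * log(sum_a exp(Q(s, a) / lam)).
    # The soft operator is a gamma-contraction, so this loop converges.
    V_new = lam * np.log(np.exp(Q / lam).sum(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# The optimal regularized policy is a softmax over Q (max subtracted for
# numerical stability): unique and smooth in (P, R), unlike the possibly
# non-unique greedy policy of the unregularized MDP.
pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / lam)
pi /= pi.sum(axis=1, keepdims=True)
print(np.round(pi, 3))
```

For small `lam` the softmax policy approaches the unregularized greedy policy; larger `lam` makes `pi` smoother as a function of the MDP parameters, which is the qualitative effect the paper's stability argument exploits.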