A Simple Reward-free Approach to Constrained Reinforcement Learning
Authors: Sobhan Miryoosefi, Chi Jin
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | From the abstract: "This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. Utilizing the existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence; our framework directly extends to a setting of tabular two-player Markov games, and gives a new result for constrained RL with linear function approximation." |
| Researcher Affiliation | Academia | Princeton University. Correspondence to: Sobhan Miryoosefi <miryoosefi@cs.princeton.edu>. |
| Pseudocode | Yes | Algorithm 1 Meta-algorithm for VMDPs... Algorithm 2 Meta-algorithm for VMGs... Algorithm 3 Solving Constrained RL Using Approachability... Algorithm 4 Online gradient ascent (OGA)... Algorithm 5 VI-Zero: Exploration Phase... Algorithm 6 Reward-Free RL for Linear VMDPs: Exploration Phase... Algorithm 7 Reward-Free RL for Linear VMDPs: Planning Phase... Algorithm 8 VI-Zero for VMGs: Exploration Phase |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper focuses on theoretical analysis of algorithms in different settings (e.g., 'tabular MDP setting', 'linear function approximation setting', 'Vector-valued Markov games') but does not mention specific datasets used for training or provide access information for them. |
| Dataset Splits | No | The paper is theoretical and analyzes sample complexity. It does not mention any dataset splits (training, validation, test) for experimental reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers for implementation or experimental setup. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic design and analysis. It mentions 'Hyperparameters: learning rate ηt' within algorithm definitions, but these are abstract parameters of the theoretical algorithms, not concrete settings for an empirical experiment. It does not provide specific hyperparameter values or training configurations for any experimental setup. |
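The Pseudocode row above lists "Algorithm 4 Online gradient ascent (OGA)", the subroutine the paper's meta-algorithm uses to update the dual variable that scalarizes vector-valued rewards. As a minimal sketch only: below is generic projected online gradient ascent over the Euclidean unit ball with a constant step size. The function names, the unit-ball feasible set, and the fixed `eta` are illustrative assumptions, not taken from the paper (whose Algorithm 4 uses an abstract per-round learning rate η_t).

```python
import numpy as np

def project_unit_ball(x):
    # Project onto the Euclidean unit ball {x : ||x||_2 <= 1}.
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def online_gradient_ascent(gradients, eta=0.1, dim=2):
    """Projected OGA over a stream of gradient vectors.

    `gradients` is any iterable of np.ndarray gradients revealed online;
    `eta` is a constant step size used here purely for illustration.
    Returns the list of projected iterates, one per round.
    """
    theta = np.zeros(dim)
    iterates = []
    for g in gradients:
        # Ascent step followed by projection back onto the feasible set.
        theta = project_unit_ball(theta + eta * g)
        iterates.append(theta.copy())
    return iterates
```

With a constant gradient pushing in one direction, the iterate climbs until the projection pins it to the boundary of the unit ball, which is the qualitative behavior a dual-variable update of this kind exhibits.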