Constrained Variational Policy Optimization for Safe Reinforcement Learning
Authors: Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Steven Wu, Bo Li, Ding Zhao
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Nuro Inc., ³University of Illinois Urbana-Champaign. Correspondence to: Zuxin Liu <zuxinl@cmu.edu>, Ding Zhao <dingzhao@cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 CVPO Training for One Epoch |
| Open Source Code | Yes | The code is available at https://github.com/liuzuxin/cvpo-safe-rl. |
| Open Datasets | Yes | The task environment implementations are built upon Safety Gym (based on Mujoco) (Ray et al., 2019) and its PyBullet implementation (Gronauer, 2022). |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments, which typically involve continuous interaction rather than static train/validation/test dataset splits. No explicit percentage or sample counts for validation splits are provided. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions software such as Safety Gym, Mujoco, and PyBullet, but does not provide explicit version numbers for these dependencies. |
| Experiment Setup | Yes | The hyperparameters are shown in Table 1; more details can be found in the code. Common hyperparameters: policy network sizes [256, 256]; Q network sizes [256, 256]; network activation ReLU; discount factor γ: 0.99; Polyak weight ρ: 0.995; batch size B: 300; rollout trajectory number T: 20; critic learning rate αc: 0.001; NN optimizer Adam. CVPO hyperparameters: particle size K: 32; M-step iterations M: 6; learning rate αµ: 1; learning rate αΣ: 100; learning rate αθ: 0.002; E-step KL threshold ϵ: 0.1; M-step KL threshold ϵµ: 0.001; M-step KL threshold ϵΣ: 0.0001; E-step solver: SLSQP. |
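
For readers who want a concrete starting point when re-running the experiments, the hyperparameters quoted above can be gathered into a single configuration object. The sketch below is illustrative only: the key names and the dictionary-based layout are assumptions, and they may not match the configuration format actually used in the released cvpo-safe-rl repository.

```python
# Illustrative hyperparameter configuration assembled from Table 1 of the paper.
# Key names are hypothetical; consult https://github.com/liuzuxin/cvpo-safe-rl
# for the exact configuration format the authors use.
cvpo_config = {
    # Common hyperparameters
    "policy_hidden_sizes": [256, 256],
    "q_hidden_sizes": [256, 256],
    "activation": "relu",
    "gamma": 0.99,               # discount factor γ
    "polyak": 0.995,             # Polyak averaging weight ρ for target networks
    "batch_size": 300,
    "rollout_trajectories": 20,  # rollout trajectory number T
    "critic_lr": 1e-3,           # critic learning rate αc
    "optimizer": "adam",
    # CVPO-specific hyperparameters
    "particle_size": 32,         # K: sampled actions per state in the E-step
    "m_step_iterations": 6,
    "lr_mu": 1.0,                # learning rate αµ (Table 1)
    "lr_sigma": 100.0,           # learning rate αΣ (Table 1)
    "lr_policy": 2e-3,           # learning rate αθ
    "e_step_kl": 0.1,            # E-step KL threshold
    "m_step_kl_mu": 1e-3,        # M-step KL threshold on the mean
    "m_step_kl_sigma": 1e-4,     # M-step KL threshold on the covariance
    "e_step_solver": "SLSQP",    # constrained solver used for the E-step
}
```

Collecting the values in one place like this makes it straightforward to diff a local reproduction attempt against the paper's reported settings.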