Langevin Policy for Safe Reinforcement Learning

Authors: Fenghao Lei, Long Yang, Shiting Wen, Zhixiong Huang, Zhiwang Zhang, Chaoyi Pang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | extensive empirical results show the effectiveness and superiority of LAC on the MuJoCo-based and Safety Gym tasks.
Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University, Hangzhou, China; (2) School of Artificial Intelligence, Peking University, Beijing, China; (3) School of Computer and Data Engineering, Ningbo Tech University, Ningbo, China.
Pseudocode | Yes | Algorithm 1 (Langevin Policy); Algorithm 2 (LAC, Langevin Actor-Critic). A hedged sketch of the Langevin sampling step follows this table.
Open Source Code | Yes | Our implementation is available at https://github.com/Lfh404/LAC.
Open Datasets | Yes | Velocity and Circle tasks are implemented using the OpenAI Gym API (Brockman et al., 2016) for the MuJoCo physics simulator (Todorov et al., 2012). The other two tasks, Button and Goal, are implemented in the Safety Gym benchmark suite (Ray et al., 2019). (See the environment sketch after this table.)
Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, such as percentages or sample counts for each split.
Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 8 cores.
Software Dependencies | No | The paper mentions software such as OpenAI Gym, MuJoCo, and Omnisafe, but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | Section D.2 (Algorithmic Hyperparameters): Table 4, hyperparameters for MuJoCo tasks; Table 5, hyperparameters for Safety Gym tasks.
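
For the Pseudocode row above, the following is a minimal PyTorch sketch of what a Langevin sampling step might look like, assuming Algorithm 1 draws actions by running unadjusted Langevin dynamics on a learned critic Q(s, a). The function name langevin_policy_sample, the step count, step size, initialization, and action clamping are illustrative assumptions, not details taken from the paper or its repository.

```python
import torch

def langevin_policy_sample(critic, state, action_dim, n_steps=20, step_size=1e-2):
    """Sample an action for `state` by noisy gradient ascent on the critic.

    Assumed update: a <- a + (step_size / 2) * grad_a Q(s, a) + sqrt(step_size) * N(0, I).
    `critic` is assumed to be a callable (state, action) -> scalar Q-value tensor.
    All names and default values here are illustrative assumptions.
    """
    action = torch.zeros(action_dim, requires_grad=True)  # arbitrary start of the chain
    for _ in range(n_steps):
        q_value = critic(state, action)
        (grad,) = torch.autograd.grad(q_value.sum(), action)
        noise = torch.randn_like(action)
        with torch.no_grad():
            action = action + 0.5 * step_size * grad + (step_size ** 0.5) * noise
            action = action.clamp(-1.0, 1.0)  # keep the action inside typical bounds
        action.requires_grad_(True)  # re-enable gradients for the next Langevin step
    return action.detach()
```

In an actor-critic loop such a sampler would presumably replace a parametric actor at action-selection time, with the critic trained as usual; the paper's Algorithm 2 (LAC) should be consulted for the exact procedure.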
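
For the Open Datasets row, a small sketch of how the two benchmark families are typically instantiated through the OpenAI Gym API. The task IDs below are illustrative examples only; the exact environment names and versions used by LAC are not listed in this summary, and the pre-0.26 Gym reset/step signatures are assumed.

```python
import gym            # OpenAI Gym API (Brockman et al., 2016)
# import safety_gym   # importing Safety Gym (Ray et al., 2019) registers its Safexp-* task IDs

env = gym.make("HalfCheetah-v3")            # illustrative MuJoCo velocity-style task
# env = gym.make("Safexp-PointGoal1-v0")    # illustrative Safety Gym Goal task

obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()          # random policy, just to exercise the interface
    obs, reward, done, info = env.step(action)  # Safety Gym reports the safety cost in info["cost"]
    if done:
        obs = env.reset()
```

The info["cost"] entry is how Safety Gym exposes its constraint signal; a safe-RL method such as LAC would presumably consume this cost alongside the reward.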