Langevin Policy for Safe Reinforcement Learning
Authors: Fenghao Lei, Long Yang, Shiting Wen, Zhixiong Huang, Zhiwang Zhang, Chaoyi Pang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | extensive empirical results show the effectiveness and superiority of LAC on the MuJoCo-based and Safety Gym tasks. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University, Hangzhou, China; (2) School of Artificial Intelligence, Peking University, Beijing, China; (3) School of Computer and Data Engineering, NingboTech University, Ningbo, China. |
| Pseudocode | Yes | Algorithm 1 Langevin Policy; Algorithm 2 LAC (Langevin Actor-Critic) |
| Open Source Code | Yes | Our implementation is available at https://github.com/Lfh404/LAC. |
| Open Datasets | Yes | Velocity and Circle tasks are implemented using OpenAI Gym API (Brockman et al., 2016) for MuJoCo physical simulator (Todorov et al., 2012). The other two tasks Button and Goal are implemented in Safety Gym benchmark suite (Ray et al., 2019). |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, such as percentages or sample counts for each split. |
| Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 8 cores. |
| Software Dependencies | No | The paper mentions software such as OpenAI Gym, MuJoCo, and OmniSafe, but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | D.2. Algorithmic Hyperparameters: Table 4. Hyperparameters for MuJoCo tasks; Table 5. Hyperparameters for Safety Gym tasks. |