Langevin Policy for Safe Reinforcement Learning

Authors: Fenghao Lei, Long Yang, Shiting Wen, Zhixiong Huang, Zhiwang Zhang, Chaoyi Pang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | extensive empirical results show the effectiveness and superiority of LAC on the MuJoCo-based and Safety Gym tasks.
Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University, Hangzhou, China; (2) School of Artificial Intelligence, Peking University, Beijing, China; (3) School of Computer and Data Engineering, Ningbo Tech University, Ningbo, China.
Pseudocode | Yes | Algorithm 1 (Langevin Policy); Algorithm 2 (LAC, Langevin Actor-Critic). A hedged sketch of the Langevin sampling step follows this table.
Open Source Code | Yes | Our implementation is available at https://github.com/Lfh404/LAC.
Open Datasets | Yes | Velocity and Circle tasks are implemented using the OpenAI Gym API (Brockman et al., 2016) for the MuJoCo physics simulator (Todorov et al., 2012). The other two tasks, Button and Goal, are implemented in the Safety Gym benchmark suite (Ray et al., 2019). (See the environment sketch after this table.)
Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, such as percentages or sample counts for each split.
Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 8 cores.
Software Dependencies | No | The paper mentions software such as OpenAI Gym, MuJoCo, and Omnisafe, but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | Section D.2 (Algorithmic Hyperparameters): Table 4, hyperparameters for MuJoCo tasks; Table 5, hyperparameters for Safety Gym tasks.
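
For the Pseudocode row above, the following is a minimal PyTorch sketch of what a Langevin sampling step might look like, assuming Algorithm 1 draws actions by running unadjusted Langevin dynamics on a learned critic Q(s, a). The function name langevin_policy_sample, the step count, step size, initialization, and action clamping are illustrative assumptions, not details taken from the paper or its repository.

```python
import torch

def langevin_policy_sample(critic, state, action_dim, n_steps=20, step_size=1e-2):
    """Sample an action for `state` by noisy gradient ascent on the critic.

    Assumed update: a <- a + (step_size / 2) * grad_a Q(s, a) + sqrt(step_size) * N(0, I).
    `critic` is assumed to be a callable (state, action) -> scalar Q-value tensor.
    All names and default values here are illustrative assumptions.
    """
    action = torch.zeros(action_dim, requires_grad=True)  # arbitrary start of the chain
    for _ in range(n_steps):
        q_value = critic(state, action)
        (grad,) = torch.autograd.grad(q_value.sum(), action)
        noise = torch.randn_like(action)
        with torch.no_grad():
            action = action + 0.5 * step_size * grad + (step_size ** 0.5) * noise
            action = action.clamp(-1.0, 1.0)  # keep the action inside typical bounds
        action.requires_grad_(True)  # re-enable gradients for the next Langevin step
    return action.detach()
```

In an actor-critic loop such a sampler would presumably replace a parametric actor at action-selection time, with the critic trained as usual; the paper's Algorithm 2 (LAC) should be consulted for the exact procedure.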
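
For the Open Datasets row, a small sketch of how the two benchmark families are typically instantiated through the OpenAI Gym API. The task IDs below are illustrative examples only; the exact environment names and versions used by LAC are not listed in this summary, and the pre-0.26 Gym reset/step signatures are assumed.

```python
import gym            # OpenAI Gym API (Brockman et al., 2016)
# import safety_gym   # importing Safety Gym (Ray et al., 2019) registers its Safexp-* task IDs

env = gym.make("HalfCheetah-v3")            # illustrative MuJoCo velocity-style task
# env = gym.make("Safexp-PointGoal1-v0")    # illustrative Safety Gym Goal task

obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()          # random policy, just to exercise the interface
    obs, reward, done, info = env.step(action)  # Safety Gym reports the safety cost in info["cost"]
    if done:
        obs = env.reset()
```

The info["cost"] entry is how Safety Gym exposes its constraint signal; a safe-RL method such as LAC would presumably consume this cost alongside the reward.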