Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on the standard safe offline RL benchmark DSRL (Liu et al., 2023a) show that FISOR is the only method that guarantees satisfactory safety performance in all evaluated tasks, while achieving the highest returns in most tasks.
Researcher Affiliation | Academia | Yinan Zheng (1,2), Jianxiong Li (1,2), Dongjie Yu (3), Yujie Yang (2), Shengbo Eben Li (2), Xianyuan Zhan (1,4), Jingjing Liu (1,2). Affiliations: (1) Institute for AI Industry Research (AIR), Tsinghua University; (2) School of Vehicle and Mobility, Tsinghua University; (3) Department of Computer Science, The University of Hong Kong; (4) Shanghai Artificial Intelligence Laboratory.
Pseudocode | Yes | Algorithm 1: Feasibility-Guided Safe Offline RL (FISOR).
Open Source Code | Yes | Code is available at: https://github.com/ZhengYinan-AIR/FISOR.
Open Datasets | Yes | We conduct extensive evaluations on Safety-Gymnasium (Ray et al., 2019; Ji et al., 2023), Bullet-Safety-Gym (Gronauer, 2022), and MetaDrive (Li et al., 2022) tasks from the DSRL benchmark (Liu et al., 2023a).
Dataset Splits | No | The paper reports evaluation over 20 episodes and 3 seeds but does not explicitly describe training, validation, or test dataset splits, whether as percentages, sample counts, or references to predefined splits.
Hardware Specification | Yes | On a single RTX 3090 GPU, we can perform 1 million gradient steps in approximately 45 minutes for all tasks.
Software Dependencies | No | The paper states that the approach is implemented using the JAX framework (Bradbury et al., 2018), but no pinned software versions or full dependency specification are quoted.
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 3e-4 for all networks. The batch size is set to 256 for the value networks and 2048 for the diffusion model. The detailed setup is reported in Table 5, which lists specific values for the remaining hyperparameters. A minimal sketch of this configuration follows below.
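As a reference for the quoted setup, the following is a minimal JAX/optax sketch, not the authors' implementation, that instantiates the reported optimizer and batch-size settings. The parameter shapes, variable names, and the `adam_step` helper are placeholder assumptions; the remaining hyperparameters from Table 5 are not reproduced here.

```python
# Minimal sketch of the quoted training setup, assuming JAX + optax.
# The parameter pytree below is a placeholder, not an actual FISOR network.
import jax.numpy as jnp
import optax

LEARNING_RATE = 3e-4          # reported: Adam with learning rate 3e-4 for all networks
VALUE_BATCH_SIZE = 256        # reported batch size for the value networks
DIFFUSION_BATCH_SIZE = 2048   # reported batch size for the diffusion model

# One Adam optimizer per network group, mirroring the described setup.
value_optimizer = optax.adam(LEARNING_RATE)
diffusion_optimizer = optax.adam(LEARNING_RATE)

# Hypothetical parameter pytree standing in for a value network.
value_params = {"w": jnp.zeros((64, 256)), "b": jnp.zeros((256,))}
value_opt_state = value_optimizer.init(value_params)

def adam_step(params, opt_state, grads, optimizer):
    """Generic Adam update step shared by all networks."""
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state
```

The two batch sizes (256 vs. 2048) would be applied on the data-sampling side when computing `grads` for the value networks and the diffusion model, respectively.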