Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on the standard safe offline RL benchmark DSRL (Liu et al., 2023a) show that FISOR is the only method that guarantees satisfactory safety performance in all evaluated tasks, while achieving the highest returns in most tasks. |
| Researcher Affiliation | Academia | Yinan Zheng (1,2), Jianxiong Li (1,2), Dongjie Yu (3), Yujie Yang (2), Shengbo Eben Li (2), Xianyuan Zhan (1,4), Jingjing Liu (1,2). (1) Institute for AI Industry Research (AIR), Tsinghua University; (2) School of Vehicle and Mobility, Tsinghua University; (3) Department of Computer Science, The University of Hong Kong; (4) Shanghai Artificial Intelligence Laboratory |
| Pseudocode | Yes | Algorithm 1 Feasibility-Guided Safe Offline RL (FISOR) |
| Open Source Code | Yes | Code is available at: https://github.com/ZhengYinan-AIR/FISOR. |
| Open Datasets | Yes | We conduct extensive evaluations on Safety-Gymnasium (Ray et al., 2019; Ji et al., 2023), Bullet-Safety-Gym (Gronauer, 2022) and MetaDrive (Li et al., 2022) tasks on DSRL benchmark (Liu et al., 2023a). |
| Dataset Splits | No | The paper mentions evaluating over "20 evaluation episodes and 3 seeds" but does not explicitly provide details about training, validation, and test dataset splits with percentages, counts, or references to predefined splits. |
| Hardware Specification | Yes | On a single RTX 3090 GPU, we can perform 1 million gradient steps in approximately 45 minutes for all tasks. |
| Software Dependencies | No | We implement our approach using the JAX framework (Bradbury et al., 2018). |
| Experiment Setup | Yes | We use Adam optimizer with a learning rate 3e-4 for all networks. The batch size is set to 256 for value networks and 2048 for the diffusion model. We report the detailed setup in Table 5. Table 5 then provides specific values for various hyperparameters. |
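
The numeric settings quoted in the table can be collected into a small configuration object, which also makes the implied training throughput and evaluation budget easy to check. The following is a minimal sketch in plain Python, not the authors' code: the class name `TrainConfig`, its field names, and the helper methods are hypothetical, and only the values themselves (learning rate, batch sizes, 1 million gradient steps in roughly 45 minutes, 20 evaluation episodes over 3 seeds) come from the rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the settings quoted in the table."""

    learning_rate: float = 3e-4        # Adam, used for all networks
    value_batch_size: int = 256        # batch size for the value networks
    diffusion_batch_size: int = 2048   # batch size for the diffusion model
    gradient_steps: int = 1_000_000    # from the hardware row
    wall_clock_minutes: float = 45.0   # approximate, single RTX 3090
    eval_episodes: int = 20            # evaluation episodes per seed
    seeds: int = 3                     # number of random seeds

    def steps_per_second(self) -> float:
        """Approximate gradient steps per second implied by the hardware row."""
        return self.gradient_steps / (self.wall_clock_minutes * 60.0)

    def total_eval_episodes(self) -> int:
        """Evaluation episodes per task, summed over all seeds."""
        return self.eval_episodes * self.seeds


cfg = TrainConfig()
print(f"{cfg.steps_per_second():.0f} steps/s")            # 370 steps/s
print(f"{cfg.total_eval_episodes()} eval episodes/task")  # 60 eval episodes/task
```

Because the 45-minute figure is stated as approximate, the roughly 370 steps per second should be read as an order-of-magnitude estimate rather than a measured benchmark. The remaining hyperparameters are reported in Table 5 of the paper and are not reproduced here.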