reproducibilityindex.ai

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations on the standard safe offline RL benchmark DSRL (Liu et al., 2023a) show that FISOR is the only method that guarantees satisfactory safety performance in all evaluated tasks, while achieving the highest returns in most tasks.
Researcher Affiliation	Academia	Yinan Zheng1,2, Jianxiong Li1,2, Dongjie Yu3, Yujie Yang2, Shengbo Eben Li2, Xianyuan Zhan1,4, Jingjing Liu1,2; 1 Institute for AI Industry Research (AIR), Tsinghua University; 2 School of Vehicle and Mobility, Tsinghua University; 3 Department of Computer Science, The University of Hong Kong; 4 Shanghai Artificial Intelligence Laboratory
Pseudocode	Yes	Algorithm 1 Feasibility-Guided Safe Offline RL (FISOR)
Open Source Code	Yes	Code is available at: https://github.com/ZhengYinan-AIR/FISOR.
Open Datasets	Yes	We conduct extensive evaluations on Safety-Gymnasium (Ray et al., 2019; Ji et al., 2023), Bullet-Safety-Gym (Gronauer, 2022) and MetaDrive (Li et al., 2022) tasks on DSRL benchmark (Liu et al., 2023a).
Dataset Splits	No	The paper mentions evaluating over "20 evaluation episodes and 3 seeds" but does not explicitly provide details about training, validation, and test dataset splits with percentages, counts, or references to predefined splits.
Hardware Specification	Yes	On a single RTX 3090 GPU, we can perform 1 million gradient steps in approximately 45 minutes for all tasks.
Software Dependencies	No	We implement our approach using the JAX framework (Bradbury et al., 2018).
Experiment Setup	Yes	We use Adam optimizer with a learning rate 3e-4 for all networks. The batch size is set to 256 for value networks and 2048 for the diffusion model. We report the detailed setup in Table 5. Table 5 then provides specific values for various hyperparameters.
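
For readers attempting a rerun, the quoted settings can be collected into a single configuration object. The following is a minimal sketch in plain Python, not the authors' code: the class name TrainConfig, its field names, and the helper steps_per_second are hypothetical, and only the numeric values come from the rows above. Hyperparameters listed in the paper's Table 5 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container; field names are illustrative, values are quoted above."""

    # Experiment Setup row
    optimizer: str = "adam"
    learning_rate: float = 3e-4
    value_batch_size: int = 256
    diffusion_batch_size: int = 2048
    # Dataset Splits row (evaluation protocol)
    eval_episodes: int = 20
    num_seeds: int = 3
    # Hardware Specification row: steps timed on a single RTX 3090
    timed_gradient_steps: int = 1_000_000
    timed_minutes: float = 45.0


def steps_per_second(total_steps: int, minutes: float) -> float:
    """Average gradient steps per second implied by a wall-clock timing."""
    return total_steps / (minutes * 60.0)


config = TrainConfig()
throughput = steps_per_second(config.timed_gradient_steps, config.timed_minutes)
print(f"approx. {throughput:.0f} gradient steps per second")
```

The throughput figure is only the average implied by the reported "approximately 45 minutes"; it is a rough baseline for checking a local setup, not a measured result.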