OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

Authors: Yihang Yao, Zhepeng Cen, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations on public benchmarks and varying datasets showcase OASIS's superiority in benefiting offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines.
Researcher Affiliation | Collaboration | Yihang Yao (1), Zhepeng Cen (1), Wenhao Ding (1), Haohong Lin (1), Shiqi Liu (1), Tingnan Zhang (2), Wenhao Yu (2), Ding Zhao (1); (1) Carnegie Mellon University, (2) Google DeepMind
Pseudocode | Yes | Algorithm 1: OASIS (dataset generation)
Open Source Code | Yes | Code is available on our GitHub repository; checkpoints and curated datasets are available on our Hugging Face repository.
Open Datasets | Yes | Our experiment tasks are mainly built upon the offline safe RL dataset OSRL [74].
Dataset Splits | No | The paper discusses different "training dataset types" (full, tempting, conservative, hybrid) and mentions data for offline RL agent training, but does not explicitly provide percentages or counts for train/validation/test splits in its experiments.
Hardware Specification | Yes | The experiments are run on a server with 2 AMD EPYC 7542 32-Core Processor CPUs, 2 NVIDIA RTX A6000 graphics cards, and 252 GB memory.
Software Dependencies | No | The paper mentions using codebases from benchmarks and other authors but does not list specific version numbers for key software components or libraries used for implementation.
Experiment Setup | Yes | Table 4 hyperparameters: L (length of subsequence) = 32, K (denoising timesteps) = 20, batch size = 256, learning rate = 3.0e-5, wα = 2.0
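To illustrate how the Table 4 values fit a conditioned denoising setup, the following is a minimal sketch. The names (OASISConfig, sample_subsequences) and the denoiser call signature are assumptions for illustration, not the paper's actual Algorithm 1; only the numeric values are taken from the quoted hyperparameters.

```python
# Illustrative sketch only: OASISConfig, sample_subsequences, and the model's
# call signature are hypothetical; the numeric values mirror Table 4 as quoted.
from dataclasses import dataclass

import torch


@dataclass
class OASISConfig:
    subsequence_length: int = 32    # L: length of each generated subsequence
    denoising_timesteps: int = 20   # K: number of reverse-diffusion steps
    batch_size: int = 256
    learning_rate: float = 3.0e-5
    w_alpha: float = 2.0            # conditioning/guidance weight


def sample_subsequences(model, cond, cfg: OASISConfig, state_dim: int) -> torch.Tensor:
    """Draw a batch of length-L subsequences by iterating K denoising steps.

    `model(x, t, cond, guidance_weight=...)` is assumed to return the sample
    at the previous diffusion step; the paper's Algorithm 1 may differ.
    """
    x = torch.randn(cfg.batch_size, cfg.subsequence_length, state_dim)
    for t in reversed(range(cfg.denoising_timesteps)):
        t_batch = torch.full((cfg.batch_size,), t, dtype=torch.long)
        x = model(x, t_batch, cond, guidance_weight=cfg.w_alpha)
    return x
```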