OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
Authors: Yihang Yao, Zhepeng Cen, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on public benchmarks and varying datasets showcase OASIS's superiority in benefiting offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines. |
| Researcher Affiliation | Collaboration | Yihang Yao 1, Zhepeng Cen 1, Wenhao Ding 1, Haohong Lin 1, Shiqi Liu 1, Tingnan Zhang 2, Wenhao Yu 2, Ding Zhao 1; 1 Carnegie Mellon University, 2 Google DeepMind |
| Pseudocode | Yes | Algorithm 1: OASIS (dataset generation); a hedged sampling sketch follows the table |
| Open Source Code | Yes | Code is available on our GitHub repository; checkpoints and curated datasets are available on our Hugging Face repository. |
| Open Datasets | Yes | Our experiment tasks are mainly built upon the offline safe RL dataset OSRL [74]. |
| Dataset Splits | No | The paper discusses different "training dataset types" (full, tempting, conservative, hybrid) and mentions "data for offline RL agent training" but does not explicitly provide percentages or counts for train/validation/test splits for its experiments. |
| Hardware Specification | Yes | The experiments are run on a server with 2 AMD EPYC 7542 32-Core Processor CPUs, 2 NVIDIA RTX A6000 GPUs, and 252 GB memory. |
| Software Dependencies | No | The paper mentions using codebases from benchmarks and other authors but does not list specific version numbers for key software components or libraries used for implementation. |
| Experiment Setup | Yes | Table 4 hyperparameters: L (subsequence length) = 32, K (denoising timesteps) = 20, batch size = 256, learning rate = 3.0e-5, w_α = 2.0 (see the training-step sketch after this table) |
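
Since Algorithm 1 is reported only as pseudocode, the block below is a minimal sketch of what a conditional-diffusion dataset-generation loop of this kind can look like, reusing Table 4's L = 32, K = 20, and w_α = 2.0 (interpreted here as a classifier-free-guidance weight, which is a reading of the table rather than a statement from it). The `CondDenoiser` class, the network shape, the noise schedule, and the (reward, cost) conditioning vector are all illustrative assumptions, not the authors' implementation, which lives in the linked GitHub repository.

```python
# Hedged sketch of a conditional-diffusion generation loop in the spirit of
# Algorithm 1. All names and shapes here are hypothetical illustrations.
import torch

K = 20          # denoising timesteps (Table 4)
L = 32          # subsequence length (Table 4)
W_ALPHA = 2.0   # w_alpha from Table 4, assumed to be the guidance weight


class CondDenoiser(torch.nn.Module):
    """Toy stand-in for a learned conditional noise predictor."""

    def __init__(self, obs_act_dim: int, cond_dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_act_dim + cond_dim + 1, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, obs_act_dim),
        )

    def forward(self, x, t, cond):
        # Broadcast the timestep and condition to every step of the subsequence.
        B, T, _ = x.shape
        t_feat = t.float().view(B, 1, 1).expand(B, T, 1) / K
        c_feat = cond.unsqueeze(1).expand(B, T, -1)
        return self.net(torch.cat([x, t_feat, c_feat], dim=-1))


@torch.no_grad()
def generate_subsequences(model, n, obs_act_dim, cond, null_cond):
    """Classifier-free-guided DDPM-style sampling of n length-L subsequences."""
    betas = torch.linspace(1e-4, 2e-2, K)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(n, L, obs_act_dim)
    for k in reversed(range(K)):
        t = torch.full((n,), k)
        # Guided noise estimate: eps_u + w_alpha * (eps_c - eps_u).
        eps_c = model(x, t, cond.expand(n, -1))
        eps_u = model(x, t, null_cond.expand(n, -1))
        eps = eps_u + W_ALPHA * (eps_c - eps_u)
        # Standard DDPM posterior mean; add noise except at the final step.
        mean = (x - betas[k] / torch.sqrt(1.0 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        x = mean + (torch.sqrt(betas[k]) * torch.randn_like(x) if k > 0 else 0.0)
    return x
```

A hypothetical call would be `generate_subsequences(CondDenoiser(10, 2), n=4, obs_act_dim=10, cond=torch.tensor([0.9, 0.1]), null_cond=torch.zeros(2))`; per Algorithm 1, the generated transitions would then be labeled (e.g., with learned reward and cost models) to form the curated dataset for offline agent training.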
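The training side of the same sketch, consistent with Table 4's batch size 256 and learning rate 3.0e-5. The 10% condition-dropout rate (which lets the model also learn the unconditional score needed for guidance) and the noise schedule are assumptions, not values from the paper; `model` is any noise predictor with the `(x, t, cond)` signature used above.

```python
# Hedged sketch of one diffusion training update matching Table 4's settings.
import torch

K, BATCH, LR = 20, 256, 3.0e-5   # denoising steps, batch size, learning rate (Table 4)


def train_step(model, optimizer, x0, cond, null_cond):
    """One denoising-score-matching update on a batch of subsequences."""
    betas = torch.linspace(1e-4, 2e-2, K)   # assumed linear noise schedule
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    # Sample a random timestep per subsequence and diffuse x0 to x_t.
    t = torch.randint(0, K, (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1)
    xt = torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * noise

    # Randomly drop the condition (assumed 10% rate) so the model also
    # learns the unconditional score used by classifier-free guidance.
    drop = (torch.rand(x0.shape[0], 1) < 0.1).float()
    c = drop * null_cond + (1.0 - drop) * cond

    loss = torch.nn.functional.mse_loss(model(xt, t, c), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=LR),
# with x0 drawn as BATCH length-L subsequences from the offline dataset.
```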