OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
Authors: Yihang Yao, Zhepeng Cen, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on public benchmarks and varying datasets showcase OASIS's superiority in benefiting offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines. |
| Researcher Affiliation | Collaboration | Yihang Yao 1, Zhepeng Cen 1, Wenhao Ding 1, Haohong Lin 1, Shiqi Liu 1, Tingnan Zhang 2, Wenhao Yu 2, Ding Zhao 1; 1 Carnegie Mellon University, 2 Google DeepMind |
| Pseudocode | Yes | Algorithm 1: OASIS (dataset generation); a hedged sampling sketch follows the table |
| Open Source Code | Yes | Code is available on our GitHub repository; checkpoints and curated datasets are available on our Hugging Face repository. |
| Open Datasets | Yes | Our experiment tasks are mainly built upon the offline safe RL dataset OSRL [74]. |
| Dataset Splits | No | The paper discusses different "training dataset types" (full, tempting, conservative, hybrid) and mentions "data for offline RL agent training" but does not explicitly provide percentages or counts for train/validation/test splits for its experiments. |
| Hardware Specification | Yes | The experiments are run on a server with 2 AMD EPYC 7542 32-Core Processor CPUs, 2 NVIDIA RTX A6000 GPUs, and 252 GB memory. |
| Software Dependencies | No | The paper mentions using codebases from benchmarks and other authors but does not list specific version numbers for key software components or libraries used for implementation. |
| Experiment Setup | Yes | Table 4 hyperparameters: L (subsequence length) = 32, K (denoising timesteps) = 20, batch size = 256, learning rate = 3.0e-5, w_α = 2.0 (see the training-step sketch after this table) |
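
Since Algorithm 1 is reported only as pseudocode, the block below is a minimal sketch of what a conditional-diffusion dataset-generation loop of this kind can look like, reusing Table 4's L = 32, K = 20, and w_α = 2.0 (interpreted here as a classifier-free-guidance weight, which is a reading of the table rather than a statement from it). The `CondDenoiser` class, the network shape, the noise schedule, and the (reward, cost) conditioning vector are all illustrative assumptions, not the authors' implementation, which lives in the linked GitHub repository.

```python
# Hedged sketch of a conditional-diffusion generation loop in the spirit of
# Algorithm 1. All names and shapes here are hypothetical illustrations.
import torch

K = 20          # denoising timesteps (Table 4)
L = 32          # subsequence length (Table 4)
W_ALPHA = 2.0   # w_alpha from Table 4, assumed to be the guidance weight


class CondDenoiser(torch.nn.Module):
    """Toy stand-in for a learned conditional noise predictor."""

    def __init__(self, obs_act_dim: int, cond_dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_act_dim + cond_dim + 1, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, obs_act_dim),
        )

    def forward(self, x, t, cond):
        # Broadcast the timestep and condition to every step of the subsequence.
        B, T, _ = x.shape
        t_feat = t.float().view(B, 1, 1).expand(B, T, 1) / K
        c_feat = cond.unsqueeze(1).expand(B, T, -1)
        return self.net(torch.cat([x, t_feat, c_feat], dim=-1))


@torch.no_grad()
def generate_subsequences(model, n, obs_act_dim, cond, null_cond):
    """Classifier-free-guided DDPM-style sampling of n length-L subsequences."""
    betas = torch.linspace(1e-4, 2e-2, K)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(n, L, obs_act_dim)
    for k in reversed(range(K)):
        t = torch.full((n,), k)
        # Guided noise estimate: eps_u + w_alpha * (eps_c - eps_u).
        eps_c = model(x, t, cond.expand(n, -1))
        eps_u = model(x, t, null_cond.expand(n, -1))
        eps = eps_u + W_ALPHA * (eps_c - eps_u)
        # Standard DDPM posterior mean; add noise except at the final step.
        mean = (x - betas[k] / torch.sqrt(1.0 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        x = mean + (torch.sqrt(betas[k]) * torch.randn_like(x) if k > 0 else 0.0)
    return x
```

A hypothetical call would be `generate_subsequences(CondDenoiser(10, 2), n=4, obs_act_dim=10, cond=torch.tensor([0.9, 0.1]), null_cond=torch.zeros(2))`; per Algorithm 1, the generated transitions would then be labeled (e.g., with learned reward and cost models) to form the curated dataset for offline agent training.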
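The training side of the same sketch, consistent with Table 4's batch size 256 and learning rate 3.0e-5. The 10% condition-dropout rate (which lets the model also learn the unconditional score needed for guidance) and the noise schedule are assumptions, not values from the paper; `model` is any noise predictor with the `(x, t, cond)` signature used above.

```python
# Hedged sketch of one diffusion training update matching Table 4's settings.
import torch

K, BATCH, LR = 20, 256, 3.0e-5   # denoising steps, batch size, learning rate (Table 4)


def train_step(model, optimizer, x0, cond, null_cond):
    """One denoising-score-matching update on a batch of subsequences."""
    betas = torch.linspace(1e-4, 2e-2, K)   # assumed linear noise schedule
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    # Sample a random timestep per subsequence and diffuse x0 to x_t.
    t = torch.randint(0, K, (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1)
    xt = torch.sqrt(ab) * x0 + torch.sqrt(1.0 - ab) * noise

    # Randomly drop the condition (assumed 10% rate) so the model also
    # learns the unconditional score used by classifier-free guidance.
    drop = (torch.rand(x0.shape[0], 1) < 0.1).float()
    c = drop * null_cond + (1.0 - drop) * cond

    loss = torch.nn.functional.mse_loss(model(xt, t, c), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=LR),
# with x0 drawn as BATCH length-L subsequences from the offline dataset.
```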