Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
Authors: Honghao Wei, Xiyue Peng, Arnob Ghosh, Xin Liu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we offer a practical version of WSAC and compare it with existing state-of-the-art safe offline RL algorithms in several continuous control environments. WSAC outperforms all baselines across a range of tasks, supporting the theoretical results. |
| Researcher Affiliation | Academia | Honghao Wei, Washington State University; Xiyue Peng, ShanghaiTech University; Arnob Ghosh, New Jersey Institute of Technology; Xin Liu, ShanghaiTech University |
| Pseudocode | Yes | Algorithm 1 Weighted Safe Actor-Critic (WSAC) |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the data and code. |
| Open Datasets | Yes | We use the offline dataset from Liu et al. (2019), where the corresponding expert policies are used to interact with the environments and collect the data. |
| Dataset Splits | No | The paper uses an offline dataset but does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | Yes | We run all the experiments with NVIDIA GeForce RTX 3080 Ti 8 Core Processor. |
| Software Dependencies | No | The paper mentions using ADAM for optimization but does not provide specific version numbers for programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | Table 3: Hyperparameters of WSAC |