Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Authors: Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on D4RL benchmarks [9] and verify that the proposed method outperforms the previous state-of-the-art by a large margin on various types of environments and datasets. |
| Researcher Affiliation | Collaboration | Seoul National University; Neural Processing Research Center; Deep Metrics |
| Pseudocode | Yes | Algorithm 1 Ensemble-Diversified Actor Critic (EDAC) |
| Open Source Code | Yes | The code is available online: https://github.com/snu-mllab/EDAC |
| Open Datasets | Yes | We evaluate our proposed methods against the previous offline RL algorithms on the standard D4RL benchmark [9]. |
| Dataset Splits | No | The paper uses standard D4RL benchmarks but does not explicitly provide the specific train/validation/test dataset splits (e.g., percentages or sample counts) used for their experiments. |
| Hardware Specification | Yes | We run our experiments on a single machine with one RTX 3090 GPU |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | Appendix B Implementation Details: We use Adam optimizer with learning rates 3e-4 and 1e-4 for Q-functions and actors respectively. The batch size is 256. The networks for both Q-functions and actor are MLPs with two hidden layers of size 256 and ReLU activation. |
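As an illustration only, the hyperparameters quoted in the Experiment Setup row can be collected into a small Python config, together with a helper that lists the layer widths of a two-hidden-layer MLP and counts its parameters. This is a sketch, not code from the EDAC repository. The names `ExperimentConfig`, `mlp_layer_sizes`, and `mlp_param_count` are hypothetical, and the example dimensions (a 17-dimensional observation and a 6-dimensional action, with the Q-network taking their concatenation as input and producing one scalar) are assumptions chosen for the example, not values reported above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the values quoted from Appendix B."""
    q_lr: float = 3e-4                      # Adam learning rate for the Q-functions
    actor_lr: float = 1e-4                  # Adam learning rate for the actor
    batch_size: int = 256                   # minibatch size
    hidden_sizes: Tuple[int, ...] = (256, 256)  # two hidden layers of width 256
    activation: str = "relu"                # ReLU between hidden layers


def mlp_layer_sizes(in_dim: int, out_dim: int, cfg: ExperimentConfig) -> List[int]:
    """Return the width of every layer, from input to output."""
    return [in_dim, *cfg.hidden_sizes, out_dim]


def mlp_param_count(sizes: List[int]) -> int:
    """Count weights plus biases of a fully connected MLP with these widths."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    cfg = ExperimentConfig()
    obs_dim, act_dim = 17, 6  # assumed example dimensions, not from the paper
    # Assumed Q-network shape: (state, action) concatenated in, scalar Q-value out.
    q_sizes = mlp_layer_sizes(obs_dim + act_dim, 1, cfg)
    print("Q-network layers:", q_sizes)
    print("Q-network parameters:", mlp_param_count(q_sizes))
```

With these assumed dimensions the Q-network has layer widths `[23, 256, 256, 1]`. Keeping the values in a frozen dataclass makes the reported settings explicit and prevents them from being modified accidentally during a run.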