Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Authors: Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method on D4RL benchmarks [9] and verify that the proposed method outperforms the previous state-of-the-art by a large margin on various types of environments and datasets.
Researcher Affiliation | Collaboration | Seoul National University, Neural Processing Research Center, DeepMetrics
Pseudocode | Yes | Algorithm 1: Ensemble-Diversified Actor Critic (EDAC). A sketch of the diversification term is given after the table.
Open Source Code | Yes | The code is available online: https://github.com/snu-mllab/EDAC
Open Datasets | Yes | We evaluate our proposed methods against the previous offline RL algorithms on the standard D4RL benchmark [9].
Dataset Splits | No | The paper uses the standard D4RL benchmarks but does not explicitly provide the specific train/validation/test dataset splits (e.g., percentages or sample counts) used for its experiments.
Hardware Specification | Yes | We run our experiments on a single machine with one RTX 3090 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation.
Experiment Setup | Yes | Appendix B (Implementation Details): We use Adam optimizer with learning rates 3e-4 and 1e-4 for Q-functions and actors respectively. The batch size is 256. The networks for both Q-functions and actor are MLPs with two hidden layers of size 256 and ReLU activation. A sketch of this setup follows below.
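For the Pseudocode row: below is a minimal PyTorch sketch of EDAC's ensemble-diversification penalty, assuming it is the average pairwise cosine similarity of the action-gradients of the ensemble Q-functions. The Q-module signature is a hypothetical placeholder; the authors' actual implementation is in the repository linked above.

```python
import torch

def ensemble_diversity_loss(q_functions, obs, actions):
    """Average pairwise cosine similarity of action-gradients across the ensemble.

    q_functions: list of N modules with an assumed signature q(obs, action) -> (batch, 1).
    """
    actions = actions.detach().requires_grad_(True)
    grads = []
    for q in q_functions:
        q_sum = q(obs, actions).sum()
        # Per-sample gradient of the Q-value with respect to the action input;
        # create_graph=True lets the penalty backpropagate into the Q-parameters.
        grad = torch.autograd.grad(q_sum, actions, create_graph=True)[0]
        grads.append(grad / (grad.norm(dim=-1, keepdim=True) + 1e-10))
    grads = torch.stack(grads, dim=0)                   # (N, batch, action_dim)
    n = grads.shape[0]
    sim = torch.einsum("ibd,jbd->ijb", grads, grads)    # pairwise inner products
    diag = torch.diagonal(sim, dim1=0, dim2=1).sum(-1)  # self-similarities (~= N)
    off_diag = sim.sum(dim=(0, 1)) - diag               # sum over i != j, per sample
    return off_diag.mean() / (n - 1)
```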
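For the Experiment Setup row: the following is a minimal sketch of the reported training configuration (Adam with learning rates 3e-4 for the Q-functions and 1e-4 for the actor, batch size 256, two 256-unit hidden layers with ReLU). The observation/action dimensions, ensemble size, and Gaussian-policy head are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Two hidden layers of size 256 with ReLU, as reported in Appendix B
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim, num_q = 17, 6, 10   # illustrative sizes; the ensemble size varies per task
q_ensemble = [mlp(obs_dim + act_dim, 1) for _ in range(num_q)]
actor = mlp(obs_dim, 2 * act_dim)     # assumed Gaussian policy head (mean and log-std)

q_optimizer = torch.optim.Adam(
    [p for q in q_ensemble for p in q.parameters()], lr=3e-4)
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-4)
batch_size = 256
```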