Learning from Sparse Offline Datasets via Conservative Density Estimation

Authors: Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, Ding Zhao

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type: Experimental. In this section, we aim to study whether CDE can truly combine the advantages of both pessimism-based methods and DICE-based approaches. We are particularly interested in two main questions: (1) Does CDE incorporate the strengths of the stationary-distribution-correction training framework when handling sparse-reward settings? (2) Can CDE's explicit density constraint effectively manage out-of-distribution (OOD) extrapolation issues when datasets are insufficient? Tasks. To answer these questions, we adopt 3 Maze2D datasets, 8 Adroit datasets, and 6 MuJoCo (medium, medium-expert) datasets from the D4RL benchmark (Fu et al., 2020).
Researcher Affiliation: Academia. 1 Carnegie Mellon University, 2 Columbia University
Pseudocode: Yes. Algorithm 1: Conservative Density Estimation
Open Source Code: Yes. Code is available at https://github.com/czp16/cde-offline-rl.
Open Datasets: Yes. We adopt 3 Maze2D datasets, 8 Adroit datasets, and 6 MuJoCo (medium, medium-expert) datasets from the D4RL benchmark (Fu et al., 2020).
Dataset Splits: No. The paper describes using sub-datasets for comparative experiments and evaluation, but it does not explicitly detail standard train/validation/test splits or a held-out validation set.
Hardware Specification: Yes. We use a server with an AMD EPYC 7542 32-core CPU and an A5000 GPU.
Software Dependencies: No. The paper mentions the Adam optimizer and the d3rlpy library for baselines, but it does not specify version numbers for these or for other components such as Python or PyTorch, which would be needed for reproducibility.
Experiment Setup: Yes. Before training the NN, we standardize the observations and rewards and scale the reward by a factor of 0.1. [...] Table 5 (shared hyperparameters): hidden layers of policy πθ: [256, 256]; NN learning rate: 3e-4; discount factor γ: 0.99; batch size: 512; mixture coefficient ζ: 0.9; max OOD IS ratio ϵ: 0.3; number of OOD action samples: 5.
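The stated preprocessing (standardize observations and rewards, then multiply rewards by 0.1) and the Table 5 hyperparameters can be sketched as below. This is a minimal illustration, not the authors' code; the function name `preprocess` and the `CONFIG` dict are assumptions for clarity.

```python
import numpy as np

def preprocess(observations: np.ndarray, rewards: np.ndarray, eps: float = 1e-8):
    """Standardize observations and rewards, then scale rewards by 0.1.

    Hypothetical sketch of the paper's stated preprocessing; `eps` guards
    against division by zero and is an assumption, not a paper detail.
    """
    obs_std = (observations - observations.mean(axis=0)) / (observations.std(axis=0) + eps)
    rew_std = (rewards - rewards.mean()) / (rewards.std() + eps)
    return obs_std, 0.1 * rew_std

# Shared hyperparameters as reported in Table 5, collected into a config dict.
CONFIG = {
    "policy_hidden_layers": [256, 256],
    "nn_learning_rate": 3e-4,
    "discount_factor": 0.99,        # gamma
    "batch_size": 512,
    "mixture_coefficient": 0.9,     # zeta
    "max_ood_is_ratio": 0.3,        # epsilon
    "num_ood_action_samples": 5,
}
```

After this transform the rewards have zero mean and standard deviation ≈ 0.1, which matches the reported reward scaling.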