When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
Authors: Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Evaluations on D4RL benchmarks validate that our algorithm enjoys better performance and generalization abilities than state-of-the-art offline RL methods." and "Table 1 shows that DOGE achieves comparable or better performance than SOTA methods on most Mujoco and Ant Maze tasks." |
| Researcher Affiliation | Collaboration | "Jianxiong Li1, Xianyuan Zhan1,2, Haoran Xu1, Xiangyu Zhu1, Jingjing Liu1 & Ya-Qin Zhang1; 1 Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; 2 Shanghai Artificial Intelligence Laboratory, Shanghai, China; li-jx21@mails.tsinghua.edu.cn, zhanxianyuan@air.tsinghua.edu.cn" and "This work is also supported by Baidu Inc. through Apollo-AIR Joint Research Center." |
| Pseudocode | Yes | Algorithm 1, "Our implementation for DOGE," on page 14. A hedged sketch of the corresponding constrained policy update is given below the table. |
| Open Source Code | Yes | Code is available at https://github.com/Facebear-ljx/DOGE. |
| Open Datasets | Yes | "For evaluation, we compare DOGE and prior offline RL methods over D4RL Mujoco and Ant Maze tasks (Fu et al., 2020)." A loading sketch for these datasets follows the table. |
| Dataset Splits | No | The paper mentions evaluating methods on 'D4RL Mujoco and Ant Maze tasks' and refers to existing implementations for other methods, implying standard benchmark usage. However, it does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages or sample counts) within the paper itself. |
| Hardware Specification | Yes | We can perform 1M training steps on one GTX 3080Ti GPU in less than 50min for Mujoco tasks and 1h 40min for Ant Maze tasks. |
| Software Dependencies | No | The paper mentions using 'ReLU activated MLPs', the 'Adam optimizer', and building upon 'TD3' and 'SAC', but it does not specify version numbers for any software dependencies like Python libraries (e.g., PyTorch, TensorFlow, or scikit-learn) or other specific software. |
| Experiment Setup | Yes | The hyperparameters of DOGE are listed in Table 2, including: Batch size 256, Layers 3, Hidden dim 256, Actor learning rate 3e-4, Critic learning rate (3e-4 for Mujoco, 1e-3 for Ant Maze), Discount factor γ (0.99 for Mujoco, 0.995 for Ant Maze), Number of iterations 10^6, Target update rate τ 0.005, Policy noise 0.2, Policy noise clipping 0.5, Policy update frequency 2, λ clipped to [1, 100], and λ learning rate 3e-4. These values are collected into a config sketch below the table. |
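
The datasets referenced in the Open Datasets row are the standard D4RL benchmarks (Fu et al., 2020). Below is a minimal loading sketch using the public `d4rl` API; the task name is illustrative, chosen from the Mujoco tasks the paper evaluates.

```python
# Minimal sketch of loading a D4RL task with the standard d4rl API
# (assumes `gym` and `d4rl` are installed; the task name is illustrative).
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments with gym

env = gym.make("halfcheetah-medium-v2")  # one of the Mujoco tasks from Fu et al. (2020)
dataset = d4rl.qlearning_dataset(env)

# The dataset is a flat dict of transition arrays; D4RL ships no train/val/test
# split, consistent with the "Dataset Splits: No" finding above.
print({k: v.shape for k, v in dataset.items()})
```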
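The Pseudocode row points to Algorithm 1, which we do not reproduce here. As a rough illustration of the constrained, TD3-style policy update that Table 2 implies (policy noise, delayed policy updates, a Lagrange multiplier λ clipped to [1, 100] with learning rate 3e-4), here is a hedged sketch; `distance_fn`, `eps`, and the exact loss form are our assumptions, not the authors' code.

```python
# Hedged sketch of a TD3-style actor update with a learned state-action distance
# penalty and a Lagrange multiplier, loosely following DOGE's constrained-policy
# formulation. distance_fn and eps are assumed names; lambda's range [1, 100] and
# its learning rate 3e-4 come from Table 2 of the paper.
import math
import torch

def actor_step(actor, critic, distance_fn, log_lam, batch,
               actor_opt, lam_opt, eps=0.1):
    s = batch["observations"]
    a = actor(s)
    q = critic(s, a)
    g = distance_fn(s, a)  # assumed: learned distance of (s, a) to the dataset
    lam = log_lam.exp().detach().clamp(1.0, 100.0)

    # Actor: maximize Q while keeping actions close to the data geometry.
    actor_loss = (-q + lam * g).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Dual ascent on lambda: grow it while the constraint g <= eps is violated.
    lam_loss = -(log_lam.exp() * (g.detach() - eps)).mean()
    lam_opt.zero_grad()
    lam_loss.backward()
    lam_opt.step()
    with torch.no_grad():
        log_lam.clamp_(math.log(1.0), math.log(100.0))  # lambda in [1, 100]

# Usage (illustrative): log_lam = torch.zeros(1, requires_grad=True)
#                       lam_opt = torch.optim.Adam([log_lam], lr=3e-4)  # Table 2
```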
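Finally, the Table 2 values from the Experiment Setup row, gathered into a single config; the key names are ours, the values are the paper's.

```python
# DOGE hyperparameters as reported in Table 2 of the paper; key names are ours.
DOGE_CONFIG = {
    "batch_size": 256,
    "num_layers": 3,
    "hidden_dim": 256,
    "actor_lr": 3e-4,
    "critic_lr": {"mujoco": 3e-4, "antmaze": 1e-3},
    "discount": {"mujoco": 0.99, "antmaze": 0.995},   # gamma
    "num_iterations": 1_000_000,
    "target_update_rate": 0.005,                      # tau
    "policy_noise": 0.2,
    "noise_clip": 0.5,
    "policy_update_freq": 2,
    "lambda_range": (1.0, 100.0),                     # lambda clipped to [1, 100]
    "lambda_lr": 3e-4,
}
```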