When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
Authors: Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Evaluations on D4RL benchmarks validate that our algorithm enjoys better performance and generalization abilities than state-of-the-art offline RL methods." and "Table 1 shows that DOGE achieves comparable or better performance than SOTA methods on most Mujoco and Ant Maze tasks." |
| Researcher Affiliation | Collaboration | "Jianxiong Li1, Xianyuan Zhan1,2, Haoran Xu1, Xiangyu Zhu1, Jingjing Liu1 & Ya-Qin Zhang1; 1 Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; 2 Shanghai Artificial Intelligence Laboratory, Shanghai, China; li-jx21@mails.tsinghua.edu.cn, zhanxianyuan@air.tsinghua.edu.cn" and "This work is also supported by Baidu Inc. through Apollo-AIR Joint Research Center." |
| Pseudocode | Yes | Algorithm 1, "Our implementation for DOGE," on page 14. A hedged sketch of the corresponding constrained policy update is given below the table. |
| Open Source Code | Yes | Code is available at https://github.com/Facebear-ljx/DOGE. |
| Open Datasets | Yes | "For evaluation, we compare DOGE and prior offline RL methods over D4RL Mujoco and Ant Maze tasks (Fu et al., 2020)." A loading sketch for these datasets follows the table. |
| Dataset Splits | No | The paper mentions evaluating methods on 'D4RL Mujoco and Ant Maze tasks' and refers to existing implementations for other methods, implying standard benchmark usage. However, it does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages or sample counts) within the paper itself. |
| Hardware Specification | Yes | We can perform 1M training steps on one GTX 3080Ti GPU in less than 50min for Mujoco tasks and 1h 40min for Ant Maze tasks. |
| Software Dependencies | No | The paper mentions using 'ReLU activated MLPs', the 'Adam optimizer', and building upon 'TD3' and 'SAC', but it does not specify version numbers for any software dependencies like Python libraries (e.g., PyTorch, TensorFlow, or scikit-learn) or other specific software. |
| Experiment Setup | Yes | The hyperparameters of DOGE are listed in Table 2, including: Batch size 256, Layers 3, Hidden dim 256, Actor learning rate 3e-4, Critic learning rate (3e-4 for Mujoco, 1e-3 for Ant Maze), Discount factor γ (0.99 for Mujoco, 0.995 for Ant Maze), Number of iterations 10^6, Target update rate τ 0.005, Policy noise 0.2, Policy noise clipping 0.5, Policy update frequency 2, λ clipped to [1, 100], and λ learning rate 3e-4. These values are collected into a config sketch below the table. |
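
The datasets referenced in the Open Datasets row are the standard D4RL benchmarks (Fu et al., 2020). Below is a minimal loading sketch using the public `d4rl` API; the task name is illustrative, chosen from the Mujoco tasks the paper evaluates.

```python
# Minimal sketch of loading a D4RL task with the standard d4rl API
# (assumes `gym` and `d4rl` are installed; the task name is illustrative).
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments with gym

env = gym.make("halfcheetah-medium-v2")  # one of the Mujoco tasks from Fu et al. (2020)
dataset = d4rl.qlearning_dataset(env)

# The dataset is a flat dict of transition arrays; D4RL ships no train/val/test
# split, consistent with the "Dataset Splits: No" finding above.
print({k: v.shape for k, v in dataset.items()})
```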
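The Pseudocode row points to Algorithm 1, which we do not reproduce here. As a rough illustration of the constrained, TD3-style policy update that Table 2 implies (policy noise, delayed policy updates, a Lagrange multiplier λ clipped to [1, 100] with learning rate 3e-4), here is a hedged sketch; `distance_fn`, `eps`, and the exact loss form are our assumptions, not the authors' code.

```python
# Hedged sketch of a TD3-style actor update with a learned state-action distance
# penalty and a Lagrange multiplier, loosely following DOGE's constrained-policy
# formulation. distance_fn and eps are assumed names; lambda's range [1, 100] and
# its learning rate 3e-4 come from Table 2 of the paper.
import math
import torch

def actor_step(actor, critic, distance_fn, log_lam, batch,
               actor_opt, lam_opt, eps=0.1):
    s = batch["observations"]
    a = actor(s)
    q = critic(s, a)
    g = distance_fn(s, a)  # assumed: learned distance of (s, a) to the dataset
    lam = log_lam.exp().detach().clamp(1.0, 100.0)

    # Actor: maximize Q while keeping actions close to the data geometry.
    actor_loss = (-q + lam * g).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Dual ascent on lambda: grow it while the constraint g <= eps is violated.
    lam_loss = -(log_lam.exp() * (g.detach() - eps)).mean()
    lam_opt.zero_grad()
    lam_loss.backward()
    lam_opt.step()
    with torch.no_grad():
        log_lam.clamp_(math.log(1.0), math.log(100.0))  # lambda in [1, 100]

# Usage (illustrative): log_lam = torch.zeros(1, requires_grad=True)
#                       lam_opt = torch.optim.Adam([log_lam], lr=3e-4)  # Table 2
```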
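Finally, the Table 2 values from the Experiment Setup row, gathered into a single config; the key names are ours, the values are the paper's.

```python
# DOGE hyperparameters as reported in Table 2 of the paper; key names are ours.
DOGE_CONFIG = {
    "batch_size": 256,
    "num_layers": 3,
    "hidden_dim": 256,
    "actor_lr": 3e-4,
    "critic_lr": {"mujoco": 3e-4, "antmaze": 1e-3},
    "discount": {"mujoco": 0.99, "antmaze": 0.995},   # gamma
    "num_iterations": 1_000_000,
    "target_update_rate": 0.005,                      # tau
    "policy_noise": 0.2,
    "noise_clip": 0.5,
    "policy_update_freq": 2,
    "lambda_range": (1.0, 100.0),                     # lambda clipped to [1, 100]
    "lambda_lr": 3e-4,
}
```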