DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT). |
| Researcher Affiliation | Academia | 1Jilin University 2Shanghai Jiao Tong University. Correspondence to: Ting Long <longting@jlu.edu.cn>. |
| Pseudocode | Yes | We summarize the detailed pseudocode for DiffStitch in Appendix A.1. Algorithm 1 DiffStitch |
| Open Source Code | Yes | Our code is publicly available at https://github.com/guangheli12/DiffStitch. |
| Open Datasets | Yes | We evaluate DiffStitch on a wide range of domains in the D4RL benchmark (Fu et al., 2020), including MuJoCo tasks and Adroit tasks. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits with specific percentages or counts. |
| Hardware Specification | Yes | Our model is trained on a device with A5000 GPUs (24GB GPU memory, 27.8 TFLOPS computing capability) and an AMD EPYC 7371 16-Core Processor, optimized by the Adam (Kingma & Ba, 2014) optimizer. |
| Software Dependencies | No | The paper mentions using CORL and author-provided codebases but does not list specific version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the generative model, the denoising step K is set to 100. We implement the inverse dynamics model fψ with a 2-layer MLP, and the dynamics model fω and reward model fϕ with a 4-layer MLP. For the stitching, we set the qualification threshold used for data selection, δ, in the range [1, 16] depending on the dataset. The diffusion model is trained for 1M or 0.5M steps depending on the task. The horizon H is set to 100 for D4RL locomotion tasks, 56 in D4RL kitchen and Adroit-pen tasks, and {56, 100, 128, 256} in maze2d and antmaze tasks (exact choice varies by maze size). We generate stitching transitions in parallel, with a batch size of 64. The data ratio is selected from {0 : 1, 1 : 1, 2 : 1, 4 : 1, 9 : 1, 1 : 0}. |
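The setup excerpt above can be summarized in a short sketch: the reported model depths (a 2-layer inverse dynamics MLP, 4-layer dynamics and reward MLPs) and the δ-thresholded data selection that keeps only qualified stitched transitions. Function names, the hidden width, and the discrepancy scores below are illustrative assumptions, not taken from the paper's code.

```python
# Hedged sketch of DiffStitch's reported configuration and data selection.
# Names and the hidden width are assumptions for illustration only.

def build_mlp_config():
    """Layer counts reported in the paper: 2-layer inverse dynamics,
    4-layer dynamics and reward models. Hidden width is an assumption."""
    hidden = 256  # assumed; the excerpt does not state the hidden width
    return {
        "inverse_dynamics": [hidden] * 2,
        "dynamics": [hidden] * 4,
        "reward": [hidden] * 4,
    }

def select_qualified(scores, delta):
    """Keep stitched segments whose discrepancy score falls below the
    qualification threshold delta (reported range: [1, 16] per dataset).
    Lower score here stands in for better agreement with the learned models."""
    return [i for i, s in enumerate(scores) if s < delta]

if __name__ == "__main__":
    cfg = build_mlp_config()
    print(len(cfg["inverse_dynamics"]), len(cfg["dynamics"]))  # 2 4
    # Hypothetical per-segment scores for a generated batch:
    kept = select_qualified([0.4, 7.2, 1.1, 18.0, 3.3], delta=4.0)
    print(kept)  # indices of segments passing the threshold: [0, 2, 4]
```

This mirrors only the selection logic described in the setup row; the actual qualification score in DiffStitch is computed from the learned dynamics and reward models.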