DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching

Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch substantially improves the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
Researcher Affiliation | Academia | Jilin University; Shanghai Jiao Tong University. Correspondence to: Ting Long <longting@jlu.edu.cn>.
Pseudocode | Yes | We summarize the detailed pseudocode for DiffStitch in Appendix A.1 (Algorithm 1: DiffStitch).
Open Source Code | Yes | Our code is publicly available at https://github.com/guangheli12/DiffStitch.
Open Datasets | Yes | We evaluate DiffStitch on a wide range of domains in the D4RL benchmark (Fu et al., 2020), including MuJoCo tasks and Adroit tasks. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper does not explicitly provide training, validation, and test dataset splits with specific percentages or counts.
Hardware Specification | Yes | Our model is trained on a device with A5000 GPUs (24 GB GPU memory, 27.8 TFLOPS compute, AMD EPYC 7371 16-Core Processor) and optimized with the Adam optimizer (Kingma & Ba, 2014).
Software Dependencies | No | The paper mentions using CORL and author-provided codebases but does not list specific version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the generative model, the denoising step K is set to 100. We implement the inverse dynamics model fψ with a 2-layer MLP, and the dynamics model fω and reward model fϕ with 4-layer MLPs. For stitching, the qualification threshold δ used for data selection ranges over [1, 16] depending on the dataset. The diffusion model is trained for 1M or 0.5M steps depending on the task. The horizon H is set to 100 for D4RL locomotion tasks, 56 for D4RL kitchen and Adroit-pen tasks, and {56, 100, 128, 256} for maze2d and antmaze tasks (the exact choice varies by maze size). We generate stitching transitions in parallel with a batch size of 64. The data ratio is selected from {0:1, 1:1, 2:1, 4:1, 9:1, 1:0}. (A configuration sketch also follows the table.)
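For reference, below is a minimal sketch of how a D4RL dataset of the kind cited in the Open Datasets row is typically loaded, assuming the public gym and d4rl packages. The task name is an illustrative MuJoCo locomotion example, not a claim about the exact dataset variants used in the paper.

```python
# Hedged sketch: load a D4RL offline dataset via the standard d4rl API.
import gym
import d4rl  # importing d4rl registers its environments with gym

# "halfcheetah-medium-v2" is only an example task name (assumption).
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards, terminals, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```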
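The Experiment Setup row can also be summarized in code. The following is a minimal sketch, assuming PyTorch; the hidden width (256) and the observation/action dimensions are placeholder assumptions not stated in the excerpt, while the layer counts and hyperparameter values are taken from the quoted setup.

```python
# Hedged sketch of the auxiliary networks and hyperparameters described above.
import torch.nn as nn

def mlp(in_dim, out_dim, n_layers, hidden=256):
    """Build an MLP with `n_layers` linear layers and ReLU activations.
    The hidden width of 256 is an assumption, not stated in the excerpt."""
    layers, dim = [], in_dim
    for _ in range(n_layers - 1):
        layers += [nn.Linear(dim, hidden), nn.ReLU()]
        dim = hidden
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

obs_dim, act_dim = 17, 6  # placeholder dimensions (e.g. a MuJoCo locomotion task)

# 2-layer MLP inverse dynamics model f_psi: (s_t, s_{t+1}) -> a_t
f_psi = mlp(2 * obs_dim, act_dim, n_layers=2)
# 4-layer MLP dynamics model f_omega: (s_t, a_t) -> s_{t+1}
f_omega = mlp(obs_dim + act_dim, obs_dim, n_layers=4)
# 4-layer MLP reward model f_phi: (s_t, a_t) -> r_t
f_phi = mlp(obs_dim + act_dim, 1, n_layers=4)

# Hyperparameters quoted from the setup (exact values vary by task/dataset).
config = dict(
    denoising_steps=100,              # K
    horizon=100,                      # 100 for locomotion; 56 for kitchen / Adroit-pen
    qualification_threshold=8,        # delta, chosen from [1, 16] per dataset (example value)
    stitching_batch_size=64,
    diffusion_train_steps=1_000_000,  # 1M or 0.5M depending on the task
)
```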