DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT). |
| Researcher Affiliation | Academia | 1Jilin University 2Shanghai Jiao Tong University. Correspondence to: Ting Long <longting@jlu.edu.cn>. |
| Pseudocode | Yes | We summarize the detailed pseudocode for DiffStitch in Appendix A.1. Algorithm 1 DiffStitch |
| Open Source Code | Yes | Our code is publicly available at https://github.com/guangheli12/DiffStitch. |
| Open Datasets | Yes | We evaluate DiffStitch on a wide range of domains in the D4RL benchmark (Fu et al., 2020), including MuJoCo tasks and Adroit tasks. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits with specific percentages or counts. |
| Hardware Specification | Yes | Our model is trained on a device with A5000 GPUs (24GB GPU memory, 27.8 TFLOPS computing capability) and an AMD EPYC 7371 16-Core Processor, optimized by the Adam (Kingma & Ba, 2014) optimizer. |
| Software Dependencies | No | The paper mentions using CORL and author-provided codebases but does not list specific version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the generative model, the denoising step K is set to 100. We implement the inverse dynamics model fψ with a 2-layer MLP, and the dynamics model fω and reward model fϕ with a 4-layer MLP. For the stitching, we set the qualification threshold used for data selection, δ, in the range [1, 16] depending on the dataset. The diffusion model is trained for 1M or 0.5M steps depending on the task. The horizon H is set to 100 for D4RL locomotion tasks, 56 in D4RL kitchen and Adroit-pen tasks, and {56, 100, 128, 256} in maze2d and antmaze tasks (exact choice varies by maze size). We generate stitching transitions in parallel, with a batch size of 64. The data ratio is selected from {0 : 1, 1 : 1, 2 : 1, 4 : 1, 9 : 1, 1 : 0}. |
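The setup excerpt above can be summarized in a short sketch: the reported model depths (a 2-layer inverse dynamics MLP, 4-layer dynamics and reward MLPs) and the δ-thresholded data selection that keeps only qualified stitched transitions. Function names, the hidden width, and the discrepancy scores below are illustrative assumptions, not taken from the paper's code.

```python
# Hedged sketch of DiffStitch's reported configuration and data selection.
# Names and the hidden width are assumptions for illustration only.

def build_mlp_config():
    """Layer counts reported in the paper: 2-layer inverse dynamics,
    4-layer dynamics and reward models. Hidden width is an assumption."""
    hidden = 256  # assumed; the excerpt does not state the hidden width
    return {
        "inverse_dynamics": [hidden] * 2,
        "dynamics": [hidden] * 4,
        "reward": [hidden] * 4,
    }

def select_qualified(scores, delta):
    """Keep stitched segments whose discrepancy score falls below the
    qualification threshold delta (reported range: [1, 16] per dataset).
    Lower score here stands in for better agreement with the learned models."""
    return [i for i, s in enumerate(scores) if s < delta]

if __name__ == "__main__":
    cfg = build_mlp_config()
    print(len(cfg["inverse_dynamics"]), len(cfg["dynamics"]))  # 2 4
    # Hypothetical per-segment scores for a generated batch:
    kept = select_qualified([0.4, 7.2, 1.1, 18.0, 3.3], delta=4.0)
    print(kept)  # indices of segments passing the threshold: [0, 2, 4]
```

This mirrors only the selection logic described in the setup row; the actual qualification score in DiffStitch is computed from the learned dynamics and reward models.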