Simple Hierarchical Planning with Diffusion

Authors: Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods.
Researcher Affiliation | Collaboration | Chang Chen (1), Fei Deng (1), Kenji Kawaguchi (2), Caglar Gulcehre (3, 4), Sungjin Ahn (5); 1 Rutgers University, 2 National University of Singapore, 3 EPFL, 4 Google DeepMind, 5 KAIST
Pseudocode | Yes | Algorithm 1: High-Level Planning
Open Source Code | No | The paper does not explicitly state that the source code for the proposed method (Hierarchical Diffuser) is publicly available; it only mentions building upon the officially released Diffuser code from a third-party GitHub repository.
Open Datasets | Yes | We start with our main results on the D4RL (Fu et al., 2020) benchmark.
Dataset Splits | Yes | Throughout the training phase, we partitioned 10% of the training dataset as a validation set to mitigate the risk of overfitting. (A split sketch follows the table.)
Hardware Specification | Yes | All models were measured using a single NVIDIA RTX 8000 GPU to ensure consistency.
Software Dependencies | No | The paper states: 'We build our Hierarchical Diffuser upon the officially released Diffuser code obtained from https://github.com/jannerm/diffuser.' However, it does not specify software dependencies, such as Python, PyTorch/TensorFlow, or other library versions, needed for replication.
Experiment Setup | Yes | The paper details its implementation and hyperparameters (out-of-distribution experiment details are in Section E): K = 15 for the long-horizon planning tasks and K = 4 for Gym-MuJoCo; a planning horizon of H = 32 for the MuJoCo locomotion tasks, aligning closely with Diffuser's settings; for Maze2D, H = 120 (UMaze), H = 255 (Medium), and H = 390 (Large); for AntMaze, H = 225 (UMaze), H = 255 (Medium), and H = 450 (Large); guidance scales ω selected from {0.1, 0.01, 0.001, 0.0001} during planning for the MuJoCo locomotion tasks. (These settings are collected in a sketch after the table.)
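
The Dataset Splits row reports a 10% held-out validation set. Below is a minimal sketch of how such a split might be performed on a D4RL dataset; the transition-level split (the paper does not say whether it splits by transition or by trajectory) and the task name are illustrative assumptions, not the authors' code.

```python
import gym
import d4rl  # registers the D4RL offline RL environments with gym
import numpy as np

# Task name chosen for illustration; the paper evaluates several D4RL tasks.
env = gym.make("maze2d-large-v1")
dataset = env.get_dataset()  # dict of aligned arrays: observations, actions, rewards, ...

# Hold out 10% of the data as a validation set, as the paper describes.
n = dataset["observations"].shape[0]
rng = np.random.default_rng(seed=0)
perm = rng.permutation(n)
n_val = int(0.1 * n)
val_idx, train_idx = perm[:n_val], perm[n_val:]

def take(idx):
    # Keep only the per-transition arrays; skip metadata entries.
    return {k: v[idx] for k, v in dataset.items()
            if isinstance(v, np.ndarray) and len(v) == n}

train_set, val_set = take(train_idx), take(val_idx)
print(f"train: {len(train_idx)} transitions, val: {len(val_idx)} transitions")
```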
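For reference, the jump lengths, horizons, and guidance scales quoted in the Experiment Setup row can be gathered in one place. The dictionary layout, key names, and the `guided_mean` helper below are our own illustrative assumptions; only the numeric values come from the paper, and the guidance step shown is the generic Diffuser-style classifier-guided update, not the authors' exact implementation.

```python
# Values from the paper's experiment setup; structure and names are illustrative.
JUMP_LENGTH_K = {
    "long_horizon": 15,  # K = 15 for the long-horizon planning tasks
    "gym_mujoco": 4,     # K = 4 for Gym-MuJoCo locomotion
}

PLANNING_HORIZON_H = {
    "mujoco_locomotion": 32,  # aligns with Diffuser's setting
    "maze2d-umaze": 120,
    "maze2d-medium": 255,
    "maze2d-large": 390,
    "antmaze-umaze": 225,
    "antmaze-medium": 255,
    "antmaze-large": 450,
}

# Guidance scales swept during planning for the MuJoCo locomotion tasks.
GUIDANCE_SCALES = (0.1, 0.01, 0.001, 0.0001)

def guided_mean(model_mean, return_grad, omega):
    """Diffuser-style classifier guidance: shift the denoising mean along the
    gradient of a learned return predictor, scaled by omega. (The covariance
    scaling used in practice is omitted here for brevity.)"""
    return model_mean + omega * return_grad
```

A larger ω pushes sampled plans harder toward high predicted return, while a smaller ω keeps them closer to the data distribution, which is presumably why the paper sweeps ω per task.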