Simple Hierarchical Planning with Diffusion
Authors: Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. |
| Researcher Affiliation | Collaboration | Chang Chen^1, Fei Deng^1, Kenji Kawaguchi^2, Caglar Gulcehre^3,4, Sungjin Ahn^5 (1 Rutgers University, 2 National University of Singapore, 3 EPFL, 4 Google DeepMind, 5 KAIST) |
| Pseudocode | Yes | Algorithm 1 High-Level Planning |
| Open Source Code | No | The paper does not explicitly state that the source code for their proposed method (Hierarchical Diffuser) is publicly available. It only mentions building upon the officially released Diffuser code from a third-party GitHub repository. |
| Open Datasets | Yes | We start with our main results on the D4RL (Fu et al., 2020) benchmark. |
| Dataset Splits | Yes | Throughout the training phase, we partitioned 10% of the training dataset as a validation set to mitigate the risk of overfitting. (An illustrative split sketch follows this table.) |
| Hardware Specification | Yes | All models were measured using a single NVIDIA RTX 8000 GPU to ensure consistency. |
| Software Dependencies | No | The paper states: 'We build our Hierarchical Diffuser upon the officially released Diffuser code obtained from https://github.com/jannerm/diffuser.' However, it does not specify software dependencies such as the Python, PyTorch/TensorFlow, or other library versions with the specific version numbers needed for replication. |
| Experiment Setup | Yes | In this section, we describe the details of implementation and hyperparameters we used during our experiments. For the out-of-distribution experiment details, please check Section E. We set K = 15 for the long-horizon planning tasks, while for the Gym-MuJoCo tasks we use K = 4. Aligning closely with the settings used by Diffuser, we employ a planning horizon of H = 32 for the MuJoCo locomotion tasks. For the Maze2D tasks, we utilize varying planning horizons: H = 120 for the Maze2D UMaze task, H = 255 for the Medium Maze task, and H = 390 for the Large Maze task. For the AntMaze tasks, we set H = 225 for the UMaze, H = 255 for the Medium Maze, and H = 450 for the Large Maze. For the MuJoCo locomotion tasks, we select the guidance scales ω from a set of choices, {0.1, 0.01, 0.001, 0.0001}, during the planning phase. (A configuration sketch gathering these values follows this table.) |
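The Dataset Splits row above reports that 10% of the training data was held out for validation. A minimal sketch of how such a split could be produced over D4RL trajectories is shown below; the 10% ratio comes from the paper, while the use of `gym`/`d4rl`, the episode segmentation, and all function names are illustrative assumptions rather than the authors' released code.

```python
# Illustrative 90/10 train/validation split over a D4RL dataset.
# The 10% validation ratio is from the paper; everything else
# (d4rl/gym usage, episode segmentation, naming) is an assumption.
import gym
import d4rl  # registers the D4RL offline-RL environments
import numpy as np

def split_trajectories(env_name="maze2d-large-v1", val_ratio=0.1, seed=0):
    env = gym.make(env_name)
    dataset = env.get_dataset()  # dict with observations, actions, terminals, timeouts, ...

    # Segment the flat dataset into episodes using terminal/timeout flags.
    ends = np.logical_or(dataset["terminals"], dataset["timeouts"])
    boundaries = np.where(ends)[0] + 1
    episodes, start = [], 0
    for end in boundaries:
        episodes.append({k: dataset[k][start:end]
                         for k in ("observations", "actions", "rewards")})
        start = end

    # Shuffle episodes and hold out val_ratio of them for validation.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(episodes))
    n_val = int(len(episodes) * val_ratio)
    val_eps = [episodes[i] for i in order[:n_val]]
    train_eps = [episodes[i] for i in order[n_val:]]
    return train_eps, val_eps
```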
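The Experiment Setup row lists the sub-goal interval K, the planning horizons H per task, and the guidance-scale candidates. The sketch below simply gathers those reported values into one configuration object; the numeric values come from the paper, while the dictionary layout, key names, and helper function are assumptions for illustration, not the authors' configuration format.

```python
# Hyperparameters reported in the Experiment Setup row, collected into a
# simple config. Numbers are from the paper; the layout and key names are
# illustrative assumptions, not the authors' config format.
HD_CONFIG = {
    # Sub-goal interval K used by the high-level planner.
    "jump_length": {
        "long_horizon": 15,   # Maze2D / AntMaze tasks
        "gym_mujoco": 4,      # locomotion tasks
    },
    # Planning horizon H per task.
    "planning_horizon": {
        "mujoco_locomotion": 32,
        "maze2d-umaze": 120,
        "maze2d-medium": 255,
        "maze2d-large": 390,
        "antmaze-umaze": 225,
        "antmaze-medium": 255,
        "antmaze-large": 450,
    },
    # Guidance scales omega swept at planning time for MuJoCo locomotion.
    "guidance_scale_candidates": [0.1, 0.01, 0.001, 0.0001],
}

def get_horizon(task: str) -> int:
    """Look up the planning horizon H for a task name (illustrative helper)."""
    return HD_CONFIG["planning_horizon"][task]
```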