PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer
Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results suggest that PlanDQ can achieve superior or competitive performance on D4RL continuous control benchmark tasks as well as AntMaze, Kitchen, and Calvin as long-horizon tasks. Through empirical evaluation, PlanDQ is shown to either outperform or match the performance of existing state-of-the-art methods across a variety of tasks, demonstrating its effectiveness in both long-horizon and short-horizon settings. |
| Researcher Affiliation | Academia | ¹Rutgers University, ²KAIST, ³National University of Singapore, ⁴EPFL. |
| Pseudocode | Yes | Algorithm 1 (PlanDQ: D-Conductor Training), Algorithm 2 (PlanDQ: Q-Performer Training), Algorithm 3 (Planning with PlanDQ) |
| Open Source Code | No | The paper mentions building upon officially released code from other projects ("We build our D-Conductor upon the officially released Diffuser code obtained from https://github.com/jannerm/diffuser. We build our Q-Performer with the official code obtained from https://github.com/Zhendong-Wang/Diffusion-Policies-for-Offline-RL") but does not explicitly state that the code for their own method, PlanDQ, is open-source or provide a link to it. |
| Open Datasets | Yes | Our experiment section first presents our main results on the standard D4RL (Fu et al., 2020) benchmarks. We also include tasks with extended horizons, specifically designed to assess the long-horizon reasoning capabilities of the model. Following this, we conduct an analysis using a simplified Open Maze2D environment, offering insights into the reasons behind the superior performance of the Q-learning-based methods compared to value-guided sequence modeling approaches. We end our experiment section with a thorough analysis of our proposed method. The AntMaze suite, known for its challenging long-horizon navigation tasks, requires controlling an 8-DoF Ant to reach a pre-set goal from its initial position. Beyond the standard levels of AntMaze included in the D4RL benchmark, our evaluation extends to AntMaze-Ultra (Jiang et al., 2023), which introduces a larger maze environment. The Kitchen from D4RL and Calvin (Mees et al., 2022) are two long-horizon manipulation tasks. |
| Dataset Splits | No | The paper refers to using an "offline dataset D" and discusses training, but it does not provide specific details on how the dataset is split into training, validation, and test sets with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. It only refers to "standard D4RL continuous control benchmark tasks" and the "Gym-MuJoCo suite", which are environments, not hardware. |
| Software Dependencies | No | The paper mentions building upon existing codebases for Diffuser and Diffusion-Policies-for-Offline-RL but does not list specific software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8") that are required to reproduce the experiments. |
| Experiment Setup | Yes | We set K = 30 for the long-horizon planning tasks, while for the Gym-MuJoCo and Kitchen, we use K = 4. The planning horizon is set to HK = 270 for AntMaze-Medium, HK = 450 for AntMaze-Large, and HK = 720 for AntMaze-Ultra. For short-horizon Gym-MuJoCo, we use HK = 32. For Calvin, the planning horizon is HK = 360, and in Kitchen we use HK = 64. For the MuJoCo locomotion and Kitchen tasks, we select the guidance scale ω from a set of choices, {0.1, 0.01, 0.001, 0.0001}, during the planning phase. |
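
The Experiment Setup row above pins down the jumpy step K, the planning horizon HK, and the candidate guidance scales per task. Below is a minimal sketch that collects those quoted values into one configuration; the names (`PLAN_DQ_CONFIG`, `planning_steps`) are hypothetical, and K = 30 for Calvin is an assumption based on the paper grouping it with the long-horizon tasks, not a value from the released code.

```python
# Hedged sketch: hyperparameters quoted in the Experiment Setup row, gathered
# into a hypothetical config dict. Names and structure are illustrative only.

PLAN_DQ_CONFIG = {
    # Long-horizon planning tasks: jumpy step K = 30.
    "antmaze-medium": {"K": 30, "HK": 270},
    "antmaze-large":  {"K": 30, "HK": 450},
    "antmaze-ultra":  {"K": 30, "HK": 720},
    "calvin":         {"K": 30, "HK": 360},  # K = 30 assumed (long-horizon group)
    # Short-horizon control and manipulation: jumpy step K = 4.
    "gym-mujoco":     {"K": 4,  "HK": 32},
    "kitchen":        {"K": 4,  "HK": 64},
}

# Guidance scales swept at planning time for MuJoCo locomotion and Kitchen.
GUIDANCE_SCALES = [0.1, 0.01, 0.001, 0.0001]


def planning_steps(task: str) -> int:
    """Number of sub-goal steps implied by HK / K for a given task."""
    cfg = PLAN_DQ_CONFIG[task]
    return cfg["HK"] // cfg["K"]


if __name__ == "__main__":
    for task, cfg in PLAN_DQ_CONFIG.items():
        print(f"{task:15s} K={cfg['K']:2d}  HK={cfg['HK']:3d}  "
              f"sub-goal steps={planning_steps(task)}")
```

Reading the table this way makes the split explicit: navigation and Calvin use large jumpy steps with long total horizons, while Gym-MuJoCo and Kitchen use K = 4 with short horizons.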