PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results suggest that PlanDQ can achieve superior or competitive performance on the D4RL continuous control benchmark tasks as well as on the long-horizon AntMaze, Kitchen, and Calvin tasks. Through empirical evaluation, PlanDQ is shown to either outperform or match the performance of existing state-of-the-art methods across a variety of tasks, demonstrating its effectiveness in both long-horizon and short-horizon settings.
Researcher Affiliation | Academia | Rutgers University, KAIST, National University of Singapore, EPFL.
Pseudocode | Yes | Algorithm 1 (PlanDQ: D-Conductor Training); Algorithm 2 (PlanDQ: Q-Performer Training); Algorithm 3 (Planning with PlanDQ).
Open Source Code | No | The paper mentions building upon officially released code from other projects ("We build our D-Conductor upon the officially released Diffuser code obtained from https://github.com/jannerm/diffuser. We build our Q-Performer with the official code obtained from https://github.com/Zhendong-Wang/Diffusion-Policies-for-Offline-RL") but does not explicitly state that the code for their own method, PlanDQ, is open source or provide a link to it.
Open Datasets | Yes | Our experiment section first presents our main results on the standard D4RL (Fu et al., 2020) benchmarks. We also include tasks with extended horizons, specifically designed to assess the long-horizon reasoning capabilities of the model. Following this, we conduct an analysis using a simplified Open Maze2D environment, offering insights into the reasons behind the superior performance of the Q-learning-based methods compared to value-guided sequence modeling approaches. We end our experiment section with a thorough analysis of our proposed method. The AntMaze suite, known for its challenging long-horizon navigation tasks, requires controlling an 8-DoF Ant to reach a pre-set goal from its initial position. Beyond the standard AntMaze levels included in the D4RL benchmark, our evaluation extends to AntMaze-Ultra (Jiang et al., 2023), which introduces a larger maze environment. Kitchen from D4RL and Calvin (Mees et al., 2022) are two long-horizon manipulation tasks. (A hedged dataset-loading sketch follows the table.)
Dataset Splits | No | The paper refers to using an "offline dataset D" and discusses training, but it does not provide specific details on how the dataset is split into training, validation, and test sets with percentages, sample counts, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. It only refers to the "standard D4RL continuous control benchmark tasks" and the "Gym-MuJoCo suite", which are environments, not hardware.
Software Dependencies | No | The paper mentions building upon existing codebases for Diffuser and Diffusion-Policies-for-Offline-RL but does not list specific software dependencies with version numbers (e.g., "PyTorch 1.9", "Python 3.8") required to reproduce the experiments.
Experiment Setup | Yes | We set K = 30 for the long-horizon planning tasks, while for Gym-MuJoCo and Kitchen we use K = 4. The planning horizon is set to HK = 270 for AntMaze-Medium, HK = 450 for AntMaze-Large, and HK = 720 for AntMaze-Ultra. For short-horizon Gym-MuJoCo, we use HK = 32. For Calvin, the planning horizon is HK = 360, and in Kitchen we use HK = 64. For the MuJoCo locomotion and Kitchen tasks, we select the guidance scale ω from a set of choices, {0.1, 0.01, 0.001, 0.0001}, during the planning phase. (See the configuration sketch after the table.)
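The quoted Experiment Setup lists the jumpy step K, the planning horizon HK, and the candidate guidance scales per task suite. The following is a minimal Python sketch that collects those reported values in one place; the dictionary name, keys, and layout are assumptions for illustration, not the authors' released configuration.

```python
# Hypothetical configuration sketch summarizing the quoted hyperparameters.
# Task names and dictionary structure are assumed, not taken from the paper's code.
PLANDQ_SETUP = {
    # task suite:        jumpy step K, planning horizon HK
    "antmaze-medium": {"K": 30, "horizon": 270},
    "antmaze-large":  {"K": 30, "horizon": 450},
    "antmaze-ultra":  {"K": 30, "horizon": 720},
    "calvin":         {"K": 30, "horizon": 360},
    "kitchen":        {"K": 4,  "horizon": 64},
    "gym-mujoco":     {"K": 4,  "horizon": 32},
}

# Guidance scales swept during planning for the MuJoCo locomotion
# and Kitchen tasks, per the quoted setup.
GUIDANCE_SCALES = [0.1, 0.01, 0.001, 0.0001]
```

Note that with these values the implied D-Conductor horizon H = HK / K ranges from 8 to 24 sub-goals depending on the task, which is consistent with the long- versus short-horizon distinction drawn in the paper.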
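For the Open Datasets row, the standard D4RL suites named there are publicly available through the d4rl package. Below is a minimal loading sketch under that assumption; the specific environment IDs chosen are illustrative, and AntMaze-Ultra (Jiang et al., 2023) and Calvin (Mees et al., 2022) ship with their own repositories and are not covered by this snippet.

```python
# Minimal sketch (not from the paper) for loading the standard D4RL datasets
# referenced in the Open Datasets row.
import gym
import d4rl  # registers the D4RL environments with gym on import

for env_name in [
    "antmaze-medium-play-v2",   # long-horizon navigation
    "antmaze-large-play-v2",
    "kitchen-mixed-v0",         # long-horizon manipulation
    "halfcheetah-medium-v2",    # short-horizon Gym-MuJoCo locomotion
]:
    env = gym.make(env_name)
    data = d4rl.qlearning_dataset(env)  # dict of (s, a, r, s', done) arrays
    print(env_name, data["observations"].shape, data["actions"].shape)
```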