Hierarchical Diffusion for Offline Decision Making

Authors: Wenhao Li, Xiangfeng Wang, Bo Jin, Hongyuan Zha

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Experiments: This section aims to verify the effectiveness of HDMI in long-horizon goal-reaching, reward-maximizing, and realistic tasks. We emphasize in bold scores within 5 percent of the maximum per task (Kostrikov et al., 2022).
Researcher Affiliation | Academia | (1) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; (2) Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; (3) School of Computer Science and Technology, East China Normal University, Shanghai, China; (4) School of Software Engineering, Tongji University, Shanghai, China; (5) Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, China.
Pseudocode | Yes | Algorithm 1: Next Subgoal Searching on the Sub-graph.
Open Source Code | Yes | The code is available at https://anonymous.4open.science/r/HDMI/.
Open Datasets | Yes | To visually verify the advantage of HDMI on long-horizon decision-making tasks, we use the Maze2D and AntMaze datasets (Fu et al., 2020). (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions using standard datasets like Maze2D, AntMaze, and D4RL, but does not provide explicit train/validation/test split percentages, sample counts, or a detailed splitting methodology within its main text or appendices.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments are provided.
Software Dependencies | No | The paper mentions using 'scikit-learn' and borrowing from the 'DiT (Peebles & Xie, 2022)' repository and 'DT (Chen et al., 2021)', but it does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We train the goal diffuser ϵ_θg, trajectory diffuser ϵ_θs, and inverse dynamics model F_θI using the Adam optimizer with a learning rate of 2e-4 and a batch size of 32 for 2e6 training steps. We use K = 100 diffusion steps for all diffusers. For different offline decision-making tasks, we use a planning horizon H of 20 in all the D4RL locomotion tasks, 20 in Maze2D, and 50 in long-horizon Maze2D, which is much smaller than Diffuser (Janner et al., 2022) and DD (Ajay et al., 2022). We use a guidance scale s ∈ {1.2, 1.4, 1.6, 1.8}, where the exact choice varies by task, and we choose a context length C = 20, the same as DD (Ajay et al., 2022).
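
The Experiment Setup row above lists concrete hyperparameters (Adam, learning rate 2e-4, batch size 32, 2e6 steps, K = 100 diffusion steps, horizon H, context length C, guidance scale s). The following is a minimal sketch of that configuration, assuming PyTorch and hypothetical module names; it is not the authors' released code, and the classifier-free-guidance helper follows the DD-style formulation rather than anything quoted in the excerpt.

```python
"""Hypothetical training-setup sketch based only on the hyperparameters
quoted in the Experiment Setup row; not the authors' released code."""
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class HDMIConfig:
    lr: float = 2e-4               # Adam learning rate
    batch_size: int = 32
    train_steps: int = 2_000_000   # 2e6 gradient steps
    diffusion_steps: int = 100     # K
    horizon: int = 20              # H (50 for long-horizon Maze2D)
    context_length: int = 20       # C, matching DD
    guidance_scale: float = 1.2    # s, chosen per task from {1.2, 1.4, 1.6, 1.8}


def build_optimizer(cfg: HDMIConfig, *modules: nn.Module) -> torch.optim.Adam:
    """One Adam optimizer over the goal diffuser, trajectory diffuser and
    inverse-dynamics model (placeholder modules here)."""
    params = [p for m in modules for p in m.parameters()]
    return torch.optim.Adam(params, lr=cfg.lr)


def guided_epsilon(eps_cond: torch.Tensor,
                   eps_uncond: torch.Tensor,
                   s: float) -> torch.Tensor:
    """Classifier-free guidance with scale s, in the style of DD
    (Ajay et al., 2022); the paper's exact formulation may differ."""
    return eps_uncond + s * (eps_cond - eps_uncond)


if __name__ == "__main__":
    cfg = HDMIConfig()
    # Placeholder networks standing in for eps_theta_g, eps_theta_s, F_theta_I.
    opt = build_optimizer(cfg, nn.Linear(32, 32), nn.Linear(32, 32), nn.Linear(32, 32))
    print(cfg, opt)
```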
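
The Open Datasets row cites the Maze2D and AntMaze benchmarks from D4RL (Fu et al., 2020). Below is a minimal loading sketch using the standard d4rl API; the specific environment IDs are illustrative assumptions, since the excerpt does not name the exact dataset versions or any preprocessing.

```python
# Hypothetical loading sketch for the D4RL benchmarks named in the
# Open Datasets row; the environment IDs are illustrative, not taken
# from the paper.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

for env_id in ["maze2d-large-v1", "antmaze-medium-play-v0"]:
    env = gym.make(env_id)
    dataset = env.get_dataset()  # dict with observations/actions/rewards/terminals
    print(env_id, dataset["observations"].shape)
```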