DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation

Authors: Jinxin Liu, Xinghong Guo, Zifeng Zhuang, Donglin Wang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
LLM Response: Experimental results in four decision-making domains (Push, Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering diverse and discriminative skills. We empirically evaluate our DIDI approach in four decision-making domains: Push, Kitchen, Humanoid, and D4RL tasks. Across various action/observation spaces, our results demonstrate that DIDI successfully discovers diverse and discriminative skills.

Researcher Affiliation: Academia
LLM Response: 1 School of Engineering, Westlake University, Hangzhou, China; 2 Zhejiang University, Hangzhou, China; 3 Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou, China.

Pseudocode: Yes
LLM Response:
Algorithm 1 Diffusion-Guided Diversity (DIDI)
Require: offline dataset π_D(τ_t) and skill distribution p(z). Initialize diffusion prior π_ψ, reward network R, skill discriminator q_ϕ(z | τ_t^n), and contextual policy π_θ(τ_t^n | s_t, z).
1: Train the diffusion prior π_ψ with Equation 8.
2: while not converged do
3:   Sample z ∼ p(z), s_t ∼ π_D(τ_t), and n ∈ [0, N].
4:   Learn q_ϕ(z | τ_t^n) and π_θ(τ_t^n | s_t, z) with J_DIDI.
5: end while
Return: contextual policy a_t ∼ π_θ(τ_t^n | s_t, z).
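Algorithm 1 translates into a compact training-loop skeleton. The following is a minimal PyTorch sketch of steps 1-5, assuming a continuous Gaussian skill prior p(z); the dimensions, placeholder MLPs, the surrogate losses standing in for Equation 8 and J_DIDI, and the helper sample_offline_states are illustrative assumptions, not the authors' released implementation (see the repository linked in the next row for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative dimensions (not taken from the paper).
STATE_DIM, SKILL_DIM, TRAJ_DIM = 17, 8, 64
N_DIFFUSION_STEPS, BATCH_SIZE = 100, 256

# Placeholder MLPs standing in for the diffusion prior pi_psi, the skill
# discriminator q_phi(z | tau_t^n), and the contextual policy pi_theta(tau_t^n | s_t, z).
diffusion_prior = nn.Sequential(nn.Linear(TRAJ_DIM + 1, 256), nn.ReLU(), nn.Linear(256, TRAJ_DIM))
discriminator = nn.Sequential(nn.Linear(TRAJ_DIM, 256), nn.ReLU(), nn.Linear(256, SKILL_DIM))
policy = nn.Sequential(nn.Linear(STATE_DIM + SKILL_DIM, 256), nn.ReLU(), nn.Linear(256, TRAJ_DIM))

optimizer = torch.optim.Adam(
    list(discriminator.parameters()) + list(policy.parameters()), lr=3e-4)

def sample_offline_states(batch_size):
    """Stand-in for drawing s_t from trajectories in the offline dataset pi_D."""
    return torch.randn(batch_size, STATE_DIM)

# Step 1: the diffusion prior pi_psi would be pre-trained on the offline data
# (Equation 8 in the paper); here it is simply frozen as-is.
for p in diffusion_prior.parameters():
    p.requires_grad_(False)

# Steps 2-5: sample skills/states and jointly learn q_phi and pi_theta.
for step in range(10_000):                                    # "while not converged"
    z = torch.randn(BATCH_SIZE, SKILL_DIM)                    # z ~ p(z), Gaussian prior assumed
    s_t = sample_offline_states(BATCH_SIZE)                   # s_t ~ pi_D(tau_t)
    n = torch.randint(0, N_DIFFUSION_STEPS, (BATCH_SIZE, 1))  # diffusion level n in [0, N]

    tau_n = policy(torch.cat([s_t, z], dim=-1))               # tau_t^n from pi_theta(. | s_t, z)

    # Diversity term: the discriminator should be able to recover z from tau_t^n.
    diversity_loss = F.mse_loss(discriminator(tau_n), z)

    # Diffusion-guidance term: keep tau_t^n consistent with the frozen prior's
    # prediction at noise level n (a stand-in for the guidance part of J_DIDI).
    prior_pred = diffusion_prior(torch.cat([tau_n, n.float()], dim=-1))
    guidance_loss = F.mse_loss(tau_n, prior_pred)

    loss = diversity_loss + guidance_loss                     # placeholder composite of J_DIDI
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A faithful implementation would derive the guidance term from the pre-trained diffusion model's denoising objective (Equation 8) rather than the simple consistency surrogate used in this sketch.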
Open Source Code: Yes
LLM Response: The code for our implementation is available at https://github.com/huey0528/icml24didi.

Open Datasets: Yes
LLM Response: To answer the above questions, we validate our DIDI in four decision-making domains: Push, Kitchen, Humanoid (as shown in Figure 2), and D4RL (Fu et al., 2020) tasks. The Push task, derived from IBC (Florence et al., 2022)... The Kitchen task (Gupta et al., 2019)... The Humanoid task inherits from PHC (Luo et al., 2023)... The D4RL task, as introduced by Fu et al. (2020), provides a comprehensive suite of benchmark environments designed for offline RL.
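For the D4RL portion of the evaluation, the benchmark datasets can be obtained programmatically through the standard d4rl package. A minimal sketch follows, assuming gym and d4rl are installed; the task id "kitchen-mixed-v0" is only one example from the suite and is not specified in the row above.

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

# Any D4RL task id works here; "kitchen-mixed-v0" is only an example.
env = gym.make("kitchen-mixed-v0")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays keyed by field name

print(dataset["observations"].shape,   # (num_transitions, obs_dim)
      dataset["actions"].shape,        # (num_transitions, act_dim)
      dataset["rewards"].shape,        # (num_transitions,)
      dataset["terminals"].shape)      # (num_transitions,)
```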
Dataset Splits: No
LLM Response: The paper does not provide explicit training, validation, and test split percentages or sample counts for the datasets used. It refers to standard benchmarks such as D4RL but does not detail the splits used in its experiments.

Hardware Specification: No
LLM Response: The paper does not explicitly describe the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).

Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation.

Experiment Setup: No
LLM Response: The paper describes general experimental procedures (e.g., fine-tuning) but does not specify concrete hyperparameters (such as learning rate, batch size, or number of epochs) or other system-level training settings needed for reproducibility; there is no section detailing the experimental setup with these values.