Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient

Authors: Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies validate DDiffPG's capability to master multimodal behaviors in complex, high-dimensional continuous control tasks with sparse rewards, also showcasing proof-of-concept dynamic online replanning when navigating mazes with unseen obstacles.
Researcher Affiliation | Collaboration | Zechu Li (1,2), Rickmer Krohn (1,3), Tao Chen (2), Anurag Ajay (2), Pulkit Agrawal (2), Georgia Chalvatzaki (1,3); 1: Technical University of Darmstadt, 2: Massachusetts Institute of Technology, 3: Hessian.AI
Pseudocode | Yes | Fig. 2 provides an overview of the proposed method, and the pseudocode is available in Alg. 1.
Open Source Code | Yes | Our project page is available at https://supersglzc.github.io/projects/ddiffpg/.
Open Datasets | Yes | The Ant Maze environments are implemented based on the D4RL benchmark [20]. The robotic manipulation environments with Franka are based on [22]. (See the environment-loading sketch below the table.)
Dataset Splits | No | The paper reports results averaged over five random seeds with 20 evaluation episodes each (sketched below the table), but does not provide train/validation/test dataset splits.
Hardware Specification | Yes | We use NVIDIA GeForce RTX 4090 for all experiments.
Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Hyperparameters are available in Tab. E.
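
For context on the Open Datasets row: the Ant Maze tasks derive from the D4RL benchmark. Below is a minimal sketch of instantiating a stock D4RL AntMaze environment. The environment ID `antmaze-umaze-v2` and the gym-based rollout loop are assumptions for illustration; the paper builds its own maze variants on top of the D4RL designs rather than using these registrations as-is.

```python
# Hedged sketch: loading a stock D4RL AntMaze environment.
# The specific environment ID is an assumption, not taken from the paper.
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the AntMaze envs with gym

env = gym.make("antmaze-umaze-v2")
obs = env.reset()
for _ in range(10):
    # Step with random actions just to exercise the environment interface.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```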
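The Dataset Splits row cites the paper's evaluation protocol: five random seeds, each evaluated on 20 episodes. A minimal sketch of that protocol follows; the commented `make_env` and `load_policy` calls are hypothetical placeholders, not entry points from the paper's codebase.

```python
# Hedged sketch of the reported protocol: mean episodic return over 20
# evaluation episodes, repeated for each of five random seeds.
import numpy as np

def evaluate(policy, env, n_episodes=20):
    """Return the mean episodic return over `n_episodes` rollouts."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# Usage with hypothetical entry points (stand-ins for the paper's code):
# per_seed = [evaluate(load_policy(s), make_env(s)) for s in range(5)]
# print(f"return over seeds: {np.mean(per_seed):.2f} +/- {np.std(per_seed):.2f}")
```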