Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Diffusion Model-Augmented Behavioral Cloning

Authors: Shang-Fu Chen, Hsiang-Chun Wang, Ming-Hao Hsu, Chun-Mao Lai, Shao-Hua Sun

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable / Result / LLM Response

Research Type: Experimental
LLM Response: "DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution, as well as compare different generative models. Ablation studies justify the effectiveness of our design choices."

Researcher Affiliation: Academia
LLM Response: "National Taiwan University, Taipei, Taiwan."

Pseudocode: Yes
LLM Response: "Algorithm 1 Diffusion Model-Augmented Behavioral Cloning (DBC)"

Open Source Code: No
LLM Response: "The paper does not include an explicit statement or link indicating that the authors' source code for their methodology is publicly available."

Open Datasets: Yes
LLM Response: "To evaluate our method on a navigation task, we choose MAZE, a maze environment proposed in Fu et al. (2020) (maze2d-medium-v2)... We use the demonstrations, consisting of 10k transitions (303 trajectories), provided by Lee et al. (2021) for these tasks... We use the demonstrations provided by Kostrikov (2018), which contain 5 trajectories with 5k state-action pairs for both the CHEETAH and WALKER environments."

Dataset Splits: No
LLM Response: "The paper does not explicitly specify validation dataset splits (e.g., percentages, counts, or references to predefined validation sets) needed for reproducibility."

Hardware Specification: Yes
LLM Response: "M1: ASUS WS880T workstation with an Intel Xeon W-2255 (10C/20T, 19.25M cache, 4.5 GHz) 48-lane CPU, 64 GB memory, an NVIDIA RTX 3080 Ti GPU, and an NVIDIA RTX 3090 Ti GPU."

Software Dependencies: No
LLM Response: "The paper mentions using the 'Adam optimizer (Kingma & Ba, 2015)' and 'DDPMs (Ho et al., 2020)' but does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch)."

Experiment Setup: Yes
LLM Response: "We report the hyperparameters used for all the methods on all the tasks in Table 10. We use the Adam optimizer (Kingma & Ba, 2015) for all the methods on all the tasks and use linear learning rate decay for all policy models."