Imitating Human Behaviour with Diffusion Models

Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.
Researcher Affiliation Industry All authors (Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin) are affiliated with Microsoft Research.
Pseudocode Yes Appendix D: SAMPLING ALGORITHMS includes 'Algorithm 1 Sampling for Diffusion BC', 'Algorithm 2 Sampling for Diffusion-X', and 'Algorithm 3 Sampling for Diffusion-KDE'.
Open Source Code Yes Code: https://github.com/microsoft/Imitating-Human-Behaviour-w-Diffusion.
Open Datasets No No, the paper describes the datasets used ('The demonstration dataset contains 566 trajectories' for Kitchen; 'The demonstration dataset contains 45,000 observation/action tuples' for CSGO) and cites their origin (Gupta et al., 2020 for Kitchen; Pearce and Zhu, 2022 for the CSGO environment), but provides no URLs, DOIs, or repository names, and makes no explicit statement that these specific datasets are publicly available for download.
Dataset Splits No No, the paper refers to a 'demonstration dataset' and evaluates models by rollouts (e.g., 'roll out 100 trajectories of length 280 for evaluation'), but does not specify training/validation/test splits by percentage, count, or reference to predefined splits.
Hardware Specification Yes we were able to roll out our diffusion models at 8Hz on an average gaming GPU (NVIDIA GTX 1060 Mobile). ... Kitchen environment, V100 GPU ... CSGO environment, ResNet18 observation encoder, V100 GPU
Software Dependencies No No, the paper mentions software components such as 'scikit-learn' and 'torch.no_grad()', but does not provide version numbers for these or any other software dependencies.
Experiment Setup Yes MLP models used a learning rate of 1e-3 and batch size of 512, while transformer models used a learning rate of 5e-4 and batch size of 1024. We set K = 64 for K-means, and discretisation used 20 bins per action dimension. For diffusion models, we set T = 50 and standard β schedules linearly decaying in [1e-4, 0.02].