Imitating Human Behaviour with Diffusion Models
Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment. |
| Researcher Affiliation | Industry | Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin (Microsoft Research) |
| Pseudocode | Yes | Appendix D: SAMPLING ALGORITHMS includes 'Algorithm 1 Sampling for Diffusion BC', 'Algorithm 2 Sampling for Diffusion-X', and 'Algorithm 3 Sampling for Diffusion-KDE'. |
| Open Source Code | Yes | Code: https://github.com/microsoft/Imitating-Human-Behaviour-w-Diffusion. |
| Open Datasets | No | No, the paper describes the datasets used ('The demonstration dataset contains 566 trajectories' for Kitchen, 'The demonstration dataset contains 45,000 observation/action tuples' for CSGO) and cites their origin (e.g., 'Gupta et al., 2020' for Kitchen, 'Pearce and Zhu, 2022' for CSGO environment), but does not provide specific URLs, DOIs, repository names, or explicit statements within this paper confirming public availability of these specific datasets for download. |
| Dataset Splits | No | No, the paper refers to 'demonstration dataset' and evaluates models by 'rollouts' (e.g., 'roll out 100 trajectories of length 280 for evaluation'), but does not specify explicit training, validation, and test dataset splits with percentages, counts, or references to predefined splits. |
| Hardware Specification | Yes | we were able to roll out our diffusion models at 8 Hz on an average gaming GPU (NVIDIA GTX 1060 Mobile). ... Kitchen environment, V100 GPU ... CSGO environment, ResNet18 observation encoder, V100 GPU |
| Software Dependencies | No | No, the paper mentions software components such as 'scikit-learn' and 'torch.no_grad()', but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | MLP models used a learning rate of 1e-3 and a batch size of 512, while transformer models used a learning rate of 5e-4 and a batch size of 1024. We set K = 64 for K-means, and the discretised models used 20 bins per action dimension. For diffusion models, we set T = 50 and standard β schedules linearly decaying in [1e-4, 0.02]. |
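
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not code from the authors' repository: the names `TrainConfig`, `linear_beta_schedule`, and `cumulative_alphas` are hypothetical. The ordering of the β values (smallest first) is also an assumption, since the row gives only the range [1e-4, 0.02] and T = 50.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the optimiser settings quoted in the table."""
    learning_rate: float
    batch_size: int


# Values taken from the Experiment Setup row.
MLP_CONFIG = TrainConfig(learning_rate=1e-3, batch_size=512)
TRANSFORMER_CONFIG = TrainConfig(learning_rate=5e-4, batch_size=1024)
KMEANS_K = 64              # number of K-means clusters
BINS_PER_ACTION_DIM = 20   # bins per action dimension for discretisation
DIFFUSION_STEPS = 50       # T
BETA_MIN, BETA_MAX = 1e-4, 0.02


def linear_beta_schedule(num_steps=DIFFUSION_STEPS,
                         beta_start=BETA_MIN,
                         beta_end=BETA_MAX):
    """Return num_steps values spaced linearly from beta_start to beta_end.

    Assumption: index 0 holds the smallest beta. Swap the two arguments
    to obtain the reverse ordering.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta), the cumulative signal-retention term."""
    running = 1.0
    result = []
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
```

Here `alpha_bars[-1]` is the fraction of the original signal scale that remains after all T = 50 noising steps under these assumptions.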