Is Conditional Generative Modeling all you need for Decision Making?
Authors: Anurag Ajay, Yilun Du, Abhi Gupta, Joshua B. Tenenbaum, Tommi S. Jaakkola, Pulkit Agrawal
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS: In this section, we explore the efficacy of the Decision Diffuser on a variety of decision-making tasks (performance illustrated in Figure 4). In particular, we evaluate (1) the ability to recover effective RL policies from offline data, (2) the ability to generate behavior that satisfies multiple sets of constraints, and (3) the ability to compose multiple different skills together. |
| Researcher Affiliation | Academia | Improbable AI Lab, Operations Research Center, and Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1: Conditional Planning with the Decision Diffuser (a hedged Python sketch of this procedure is given below the table). |
| Open Source Code | No | The paper provides a link to a website (https://anuragajay.github.io/decision-diffuser/) for videos and visualizations, but it does not explicitly state that the source code for the methodology is openly available or provide a repository link. |
| Open Datasets | Yes | To test this, we train a state diffusion process and inverse dynamics model on publicly available D4RL datasets (Fu et al., 2020). (A minimal D4RL loading example is given below the table.) |
| Dataset Splits | No | The paper mentions training on 'publicly available D4RL datasets' and collecting its own datasets (e.g., for Kuka Block Stacking and Unitree-go-running), but it does not explicitly state specific training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper mentions in the acknowledgements: 'We thank MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing compute resources.' However, it does not provide specific hardware details such as exact GPU or CPU models, memory, or cloud instance types used for experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer', 'group norm (Wu & He, 2018)', and 'Mish nonlinearity (Misra, 2019)', and references code for 'temporal U-Net' from a GitHub link, but it does not specify version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | We train ϵθ and fϕ using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 2e-4 and batch size of 32 for 2e6 train steps. We choose the probability p of removing the conditioning information to be 0.25. We use K = 100 diffusion steps. We use a planning horizon H of 100 in all the D4RL locomotion tasks, 56 in D4RL kitchen tasks, 128 in Kuka block stacking, 56 in unitree-go-running tasks, 50 in the illustrative example, and 60 in Block push tasks. We use a guidance scale s ∈ {1.2, 1.4, 1.6, 1.8}, but the exact choice varies by task. We choose α = 0.5 for low-temperature sampling. We choose context length C = 20. (These values are collected into a configuration sketch below the table.) |
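
The Pseudocode row refers to Algorithm 1. Below is a minimal Python sketch of that conditional-planning loop as described in the paper: sample a state trajectory with classifier-free guidance, clamp its first state to the current observation, and recover the action with the inverse dynamics model. The network interfaces, the beta schedule, and all names here are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of Algorithm 1 (Conditional Planning with the Decision Diffuser),
# reconstructed from the paper's description rather than the authors' code.
import torch

@torch.no_grad()
def plan_next_action(eps_model, inv_dyn, s_t, cond_y,
                     horizon=100, K=100, guide_s=1.2, alpha=0.5):
    """Sample a guided state trajectory and return the first action.

    eps_model(x, k, y) -> predicted noise; pass y=None for the unconditional estimate.
    inv_dyn(s, s_next) -> action, playing the role of the inverse dynamics model f_phi.
    """
    state_dim = s_t.shape[-1]
    betas = torch.linspace(1e-4, 2e-2, K)          # assumed schedule; not stated in the paper
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, horizon, state_dim)          # start the plan from pure noise
    for k in reversed(range(K)):
        x[:, 0] = s_t                               # clamp the plan to the observed state
        kk = torch.tensor([k])
        # Classifier-free guidance: mix conditional and unconditional noise estimates.
        eps_u = eps_model(x, kk, None)
        eps_c = eps_model(x, kk, cond_y)
        eps = eps_u + guide_s * (eps_c - eps_u)
        # Standard DDPM reverse step, with the noise scaled by alpha (low-temperature sampling).
        mean = (x - betas[k] / torch.sqrt(1.0 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        noise = torch.randn_like(x) if k > 0 else torch.zeros_like(x)
        x = mean + alpha * torch.sqrt(betas[k]) * noise
    x[:, 0] = s_t
    # The executed action comes from the inverse dynamics model on the first planned transition.
    return inv_dyn(x[:, 0], x[:, 1])
```

In the paper, this procedure is repeated at every environment step: the agent executes the returned action, observes the next state, and re-plans from it.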
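The Open Datasets row cites D4RL (Fu et al., 2020). A short loading example using the public `d4rl` package follows; the environment name is only an example of a D4RL locomotion task, since the table does not restate which environments were used for each experiment.

```python
# Load an offline D4RL dataset; requires the `gym` and `d4rl` packages.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

env = gym.make("hopper-medium-expert-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...
print(dataset["observations"].shape, dataset["actions"].shape)
```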
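For reference, the hyperparameters quoted in the Experiment Setup row are collected into a single configuration sketch below. Only the numeric values come from the paper; the dictionary layout and the optimizer helper are illustrative assumptions.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered in one place.
import torch

HYPERPARAMS = {
    "learning_rate": 2e-4,            # Adam (Kingma & Ba, 2015)
    "batch_size": 32,
    "train_steps": int(2e6),
    "cond_dropout_p": 0.25,           # probability of dropping the conditioning information
    "diffusion_steps_K": 100,
    "planning_horizon_H": {
        "d4rl_locomotion": 100,
        "d4rl_kitchen": 56,
        "kuka_block_stacking": 128,
        "unitree_go_running": 56,
        "illustrative_example": 50,
        "block_push": 60,
    },
    "guidance_scale_s": (1.2, 1.4, 1.6, 1.8),   # exact value chosen per task
    "low_temperature_alpha": 0.5,
    "context_length_C": 20,
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Adam with the learning rate reported in the paper.
    return torch.optim.Adam(model.parameters(), lr=HYPERPARAMS["learning_rate"])
```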