Text-Aware Diffusion for Policy Learning
Authors: Calvin Luo, Mandy He, Zilai Zeng, Chen Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we demonstrate that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments. The behaviors are learned zero-shot without ground-truth rewards or expert demonstrations, and are qualitatively more natural according to human evaluation. |
| Researcher Affiliation | Academia | Calvin Luo, Mandy He, Zilai Zeng, Chen Sun — Brown University {calvin_luo,mandy_he,zilai_zeng,chensun}@brown.edu |
| Pseudocode | Yes | A pseudocode of the method is provided in Algorithm 1. |
| Open Source Code | Yes | Visualizations and code are provided at diffusion-supervision.github.io/tadpole/. |
| Open Datasets | Yes | We present our main results using the Dog and Humanoid environments from the DeepMind Control Suite [39], and robotic manipulation tasks from Meta-World [42]. |
| Dataset Splits | No | The paper mentions training steps and evaluation rollouts but does not explicitly detail training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | All experiments are performed on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions software like Stable Diffusion 2.1, AnimateDiff v2, TD-MPC, CLIP, and GPT-3.5 but does not provide specific version numbers for programming languages or libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We use TD-MPC [14] as the reinforcement learning algorithm for all tasks... We train Humanoid and Dog agents for 2M steps, and Meta-World agents for 700K steps... We fix the reward weights w1 = 2000 and w2 = 200 based on Humanoid standing and walking performance, and study their impact in Appendix B.3. Selection of noise level is discussed in Appendix A. |