Generating Behaviorally Diverse Policies with Latent Diffusion Models
Authors: Shashank Hegde, Sumeet Batra, K.R. Zentner, Gaurav Sukhatme
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show evidence of the manifold hypothesis or the elite hypervolume [32]: that all high-performing policies lie on a low-dimensional manifold. We summarize our contributions below. (1) We compress an archive of policies parameterized by deep neural networks and trained via a state-of-the-art QD-RL method, PPGA, into a single, expressive model while maintaining the performance of the policies in the original dataset. (2) We use the iterative conditioning mechanism of diffusion models to reconstruct policies with precise locations in measure space, and demonstrate how language conditioning can be used to flexibly generate policies with different behaviors. (3) We showcase our model's ability to sequentially compose completely different behaviors together, and additionally show that language conditioning can be used to dramatically improve the performance and consistency of sequential behavior composition. |
| Researcher Affiliation | Academia | Shashank Hegde University of Southern California khegde@usc.edu Sumeet Batra University of Southern California ssbatra@usc.edu K.R. Zentner University of Southern California kzentner@usc.edu Gaurav S. Sukhatme University of Southern California gaurav@usc.edu |
| Pseudocode | No | The paper describes procedures and models using text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a 'Project website: https://sites.google.com/view/policydiffusion/home' where 'rollout videos for the results shown in the paper can be found'. However, it does not explicitly state that the source code for the methodology is available at this link or elsewhere. |
| Open Datasets | Yes | Since PPGA was evaluated on the Brax [10] environments Humanoid, Walker2D, Halfcheetah, and Ant, we evaluate our model on the same four environments. |
| Dataset Splits | No | The paper describes training its models on policies sampled from an archive and how performance metrics are computed, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | Yes | Each VAE and diffusion experiment was run on a SLURM cluster where each job was allocated 6 cores of an Intel(R) Xeon(R) Gold 6154 3.00GHz CPU, an NVIDIA GeForce RTX 2080 Ti GPU, and 108 GB of RAM. |
| Software Dependencies | No | The paper mentions software components like 'Flan-T5-Small encoder' and 'Brax', but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 3: Hyperparameters used to train the VAE (z dimension 64, KL coefficient 1e-6, learning rate 1e-4, training batch size 32, GHN hidden layer size 16). Table 4: Hyperparameters used to train the Latent Diffusion Model (no. of ResNet blocks in U-Net 1, U-Net activation SiLU, transformer heads in middle part of U-Net 4, learning rate 1e-4, training batch size 32). These settings are restated as a config sketch below the table. |
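The Open Datasets row points to the four Brax environments used for evaluation. A minimal sketch of instantiating them, assuming the `brax.envs` registry of recent Brax releases (the paper does not pin a Brax version, so names and APIs may differ in the version the authors used):

```python
# Hedged sketch: instantiating the four Brax environments named in the paper.
# Environment names follow the current brax.envs registry; they are an
# assumption, since the paper does not specify a Brax version.
import jax
from brax import envs

for env_name in ["humanoid", "walker2d", "halfcheetah", "ant"]:
    env = envs.get_environment(env_name)
    state = jax.jit(env.reset)(jax.random.PRNGKey(0))
    print(env_name, env.observation_size, env.action_size)
```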
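The hyperparameters from the Experiment Setup row, restated as plain config dicts. The values are taken from Tables 3 and 4 of the paper; the key names are illustrative, not taken from the authors' code:

```python
# Tables 3 and 4 as config dicts (values from the paper; key names hypothetical).
vae_config = {
    "z_dim": 64,              # latent dimension
    "kl_coeff": 1e-6,         # weight on the KL term
    "lr": 1e-4,
    "batch_size": 32,
    "ghn_hidden_size": 16,    # graph hypernetwork decoder hidden layer size
}

ldm_config = {
    "unet_resnet_blocks": 1,
    "unet_activation": "SiLU",
    "unet_mid_transformer_heads": 4,
    "lr": 1e-4,
    "batch_size": 32,
}
```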
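A KL coefficient of 1e-6 suggests a beta-VAE style objective with a very weak prior-matching term, consistent with the paper's emphasis on compressing policies while maintaining their performance. Below is a minimal sketch of how such a coefficient typically enters the loss; the authors' actual reconstruction term over policy parameters may differ:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon, target, mu, logvar, kl_coeff=1e-6):
    # Reconstruction of the decoded policy parameters plus a KL penalty,
    # weighted by the small coefficient reported in Table 3.
    recon_loss = F.mse_loss(recon, target)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    return recon_loss + kl_coeff * kl
```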
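The Software Dependencies row flags that no version is given for the Flan-T5-Small encoder used for language conditioning. A hedged sketch of obtaining text embeddings from such an encoder via Hugging Face transformers; the checkpoint name and example prompt are assumptions, and the library version is exactly what the row notes as unspecified:

```python
# Hedged sketch: text embeddings from a Flan-T5-Small encoder for conditioning.
# The "google/flan-t5-small" checkpoint and the prompt are assumptions.
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
encoder = T5EncoderModel.from_pretrained("google/flan-t5-small")

tokens = tokenizer("walk on one leg", return_tensors="pt")
text_embedding = encoder(**tokens).last_hidden_state  # shape (1, seq_len, d_model)
```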