DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Authors: Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate DisCo-Diff on toy data, several image synthesis tasks, as well as molecular docking, and find that introducing discrete latents consistently improves model performance. For example, DisCo-Diff achieves state-of-the-art FID scores on class-conditioned ImageNet-64/128 datasets with an ODE sampler.
Researcher Affiliation | Collaboration | ¹NVIDIA, ²MIT. Correspondence to: Yilun Xu <yilunx@nvidia.com>.
Pseudocode | Yes | We provide the algorithm pseudocode for training and sampling in Appendix C. (A hedged training-loop sketch follows the table.)
Open Source Code | Yes | Please see the source code in the Supplementary Material for all low-level details.
Open Datasets | Yes | We use the ImageNet (Deng et al., 2009) dataset and tackle both class-conditional (at varying resolutions 64×64 and 128×128) and unconditional synthesis.
Dataset Splits | Yes | Data for training and evaluation comes from the PDBBind dataset (Liu et al., 2017) with time-based splits (complexes before 2019 for training and validation, selected complexes from 2019 for testing).
Hardware Specification | Yes | on a single NVIDIA A100 GPU.
Software Dependencies | No | No specific version numbers for key software components like Python, PyTorch, CUDA, RDKit, or e3nn were provided. Only software names were mentioned without versions.
Experiment Setup | Yes | We set the latent dimension to m = 10 and the codebook size to k = 100 in DisCo-Diff. We use Heun's second-order method as the ODE sampler, and a 12-layer Transformer as the auto-regressive model. (A sampler sketch follows the table.)
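
The Pseudocode row defers the training and sampling algorithms to Appendix C of the paper. As a rough, hedged illustration of the first training stage described there (jointly learning an encoder that infers discrete latents and a denoiser conditioned on them), here is a PyTorch-style sketch. The `encoder` and `denoiser` interfaces, the Gumbel-softmax straight-through relaxation, the plain MSE loss, and the log-normal noise-level distribution are assumptions standing in for the authors' implementation, not taken from it.

```python
import torch
import torch.nn.functional as F

def stage1_loss(encoder, denoiser, x):
    """Hypothetical sketch of stage-1 training: denoising score matching
    with discrete latents inferred from the clean data.

    Assumed interfaces (not from the paper's code):
      encoder(x)  -> logits of shape [B, m, k] over a codebook of size k
      denoiser(x_noisy, sigma, z) -> denoised estimate of x
    """
    logits = encoder(x)                               # [B, m, k]
    # Straight-through Gumbel-softmax keeps the latents discrete in the
    # forward pass while letting gradients flow back to the encoder.
    z = F.gumbel_softmax(logits, tau=1.0, hard=True)  # [B, m, k], one-hot
    # Log-normal noise levels (assumed), broadcast over non-batch dims.
    sigma = torch.exp(torch.randn(x.shape[0]) - 1.2)
    sigma = sigma.view(-1, *([1] * (x.dim() - 1)))
    x_noisy = x + sigma * torch.randn_like(x)
    # Simplified unweighted reconstruction loss (the paper may weight it).
    return F.mse_loss(denoiser(x_noisy, sigma, z), x)
```

In the second stage, a separate autoregressive model (the 12-layer Transformer from the Experiment Setup row) is fit to the distribution of the inferred discrete latents so they can be sampled at generation time.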
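The Experiment Setup row names Heun's second-order method as the ODE sampler. Below is a minimal sketch of that sampler applied to the probability-flow ODE in the EDM-style parameterization (Karras et al., 2022); the `denoise` callable and the noise schedule `sigmas` are placeholders for a trained denoiser and its schedule, and the Gaussian toy denoiser in the usage example is ours, not the paper's.

```python
import numpy as np

def heun_sampler(denoise, sigmas, x):
    """Heun's (second-order) ODE sampler.

    `denoise(x, sigma)` returns the denoised estimate D(x; sigma);
    `sigmas` is a decreasing noise schedule whose last entry is 0.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma      # probability-flow ODE slope
        x_next = x + (sigma_next - sigma) * d    # Euler predictor
        if sigma_next > 0:                       # Heun corrector (skip at sigma = 0)
            d_next = (x_next - denoise(x_next, sigma_next)) / sigma_next
            x_next = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        x = x_next
    return x

# Toy usage: for data ~ N(0, s^2) the exact denoiser is linear shrinkage,
# so the sampler should map pure noise back to samples with std ~= s.
s = 1.0
denoise_gauss = lambda x, sigma: x * s**2 / (s**2 + sigma**2)
sigmas = np.concatenate([np.geomspace(80.0, 0.002, 17), [0.0]])
samples = heun_sampler(denoise_gauss, sigmas, 80.0 * np.random.randn(10000))
print(samples.std())  # ~= 1.0
```

The second denoiser evaluation per step is what makes the method second-order; it roughly halves the number of steps needed for a given FID compared with plain Euler, at twice the cost per step.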