Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

Authors: Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks and the strong improvement over various existing models.
Researcher Affiliation | Academia | MBZUAI, UC San Diego, University of Tokyo, Stanford University, CUHK-Shenzhen, CMU.
Pseudocode | Yes | Appendix B (Algorithm) gives the complete training algorithm of EDDPMs as Algorithm 1 (Training). (An illustrative, non-authoritative training-step sketch follows the table.)
Open Source Code | Yes | Code is available at https://github.com/guangyliu/EDDPM
Open Datasets | Yes | Dataset: Regarding our dataset selection, we commence with the BookCorpus dataset (Zhu et al., 2015) to train the autoencoder in the absence of the diffusion model. Subsequently, we engage in joint training of the model with diffusion, utilizing the Yelp review dataset (Shen et al., 2017), which has been preprocessed by Li et al. (2018). ... Dataset: Following the approach of DiffAE, we train our model and subsequently evaluate its reconstruction and generation capabilities on FFHQ (Karras et al., 2019), CelebA (Karras et al., 2018), LSUN-Bedroom, and LSUN-Horses (Yu et al., 2015). ... The models are trained and evaluated on the Gifford dataset (Liu et al., 2019) and the GFP dataset (Sarkisyan et al., 2016).
Dataset Splits | Yes | The resulting dataset consists of 57,603 sequences in the training set, 10,166 sequences in the validation set, and 22,690 sequences in the test set.
Hardware Specification | Yes | We trained our model on two Nvidia A100-SXM4-40GB GPUs with a batch size of 100.
Software Dependencies | No | The paper mentions models such as BERT-small and GPT2-xl and architectures such as UNet, but does not provide specific version numbers for software dependencies such as PyTorch, TensorFlow, or other libraries used for implementation.
Experiment Setup | Yes | The latent dimension is set to 128. ... We trained our model on two Nvidia A100-SXM4-40GB GPUs with a batch size of 100. For evaluation purposes, we sampled 50,000 images to compute the FID, setting total steps T = 100 for both the diffusion process and the decoder, at every 500,000-training-step interval. The optimization was carried out using the Adam optimizer, with a learning rate of 1 × 10^-4 and no weight decay. The image dimensions input to the model were consistently set at 128 × 128 for FFHQ and 64 × 64 for CelebA. (The stated hyperparameters are collected in the config sketch below.)
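For quick reference, here is a minimal sketch that collects the image-experiment settings quoted in the Experiment Setup row into a single Python config. The key names (e.g. latent_dim, fid_num_samples) are illustrative assumptions; only the values come from the setup stated in the paper.

```python
# Hypothetical config collecting the quoted hyperparameters.
# Key names are illustrative assumptions; values are as stated in the paper.
image_experiment_config = {
    "latent_dim": 128,
    "batch_size": 100,                     # trained on 2x NVIDIA A100-SXM4-40GB
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "weight_decay": 0.0,
    "diffusion_total_steps_T": 100,        # used for diffusion and decoder at evaluation
    "fid_num_samples": 50_000,
    "eval_interval_steps": 500_000,
    "image_size": {"FFHQ": 128, "CelebA": 64},  # input resolution per dataset
}
```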
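The paper's Algorithm 1 itself appears in Appendix B and is not reproduced here. As orientation only, below is a minimal PyTorch-style sketch of what a single joint training step for an encoder-decoder plus latent-diffusion model might look like; the module names (encoder, decoder, denoiser), the noise schedule, and the equal loss weighting are assumptions for illustration, not the authors' algorithm.

```python
# Hedged sketch of one joint training step: autoencoder reconstruction plus
# latent denoising-diffusion objective. NOT the paper's Algorithm 1.
import torch
import torch.nn.functional as F

def training_step(x, encoder, decoder, denoiser, optimizer, alphas_cumprod, T=100):
    # Encode the data into a latent representation (representation learning).
    z0 = encoder(x)

    # Reconstruction term: the decoder should recover x from the clean latent.
    recon_loss = F.mse_loss(decoder(z0), x)

    # Diffusion term: corrupt the latent at a random timestep and predict the noise.
    t = torch.randint(0, T, (x.shape[0],), device=x.device)
    noise = torch.randn_like(z0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (z0.dim() - 1)))
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise
    diff_loss = F.mse_loss(denoiser(z_t, t), noise)

    # Joint objective; equal weighting is an assumption, not taken from the paper.
    loss = recon_loss + diff_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```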