Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement
Authors: Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted comprehensive ablation studies and visualization analyses, shedding light on the functioning of this model. |
| Researcher Affiliation | Collaboration | Tao Yang¹, Cuiling Lan², Yan Lu², Nanning Zheng¹; yt14212@stu.xjtu.edu.cn, {culan, yanlu}@microsoft.com, nnzheng@mail.xjtu.edu.cn; ¹National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China; ²Microsoft Research Asia |
| Pseudocode | No | No pseudocode or algorithm block is explicitly provided in the paper. |
| Open Source Code | Yes | https://github.com/thomasmry/EncDiff |
| Open Datasets | Yes | To evaluate the disentanglement performance, we utilize the commonly used benchmark datasets: Shapes3D [19], MPI3D [10] and Cars3D [27]. Shapes3D [19] consists of a collection of 3D shapes. MPI3D is a dataset of 3D objects created in a controlled setting. Cars3D is a dataset consisting of 3D-rendered cars. For real-world data, we conduct our experiments using CelebA, a dataset of celebrity faces with attributes. |
| Dataset Splits | No | The paper mentions using 'a consistent batch size of 64' and 'a learning rate of 1 × 10⁻⁴', and refers to the 'standard practice of employing an Exponential Moving Average (EMA)'. While training details are provided, explicit percentages or sample counts for training, validation, or test splits are not specified. |
| Hardware Specification | Yes | We train EncDiff on a single Tesla V100 16GB GPU. |
| Software Dependencies | No | The paper refers to using 'latent diffusion models (LDMs)' and 'VQ-reg' but does not provide specific version numbers for software libraries or dependencies like PyTorch, CUDA, etc. |
| Experiment Setup | Yes | During the training phase of EncDiff, we maintain a consistent batch size of 64 across all datasets. The learning rate is consistently set to 1 × 10⁻⁴. We adopt the standard practice of employing an Exponential Moving Average (EMA) with a decay factor of 0.9999 for all model parameters. The training hyper-parameters follow DisDiff [40] and DisCo [28]. For each concept token, we follow DisDiff [40] to use a 32-dimensional representation vector. |
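
For reference, the hyper-parameters reported in the Experiment Setup row can be summarized in a minimal PyTorch sketch. This is not the authors' EncDiff implementation: the model, loss, and optimizer choice below are placeholders, and only the batch size, learning rate, EMA decay, and concept-token dimension are taken from the paper.

```python
# Minimal sketch of the reported training hyper-parameters with a generic
# EMA update. Model, loss, and optimizer are stand-ins, not from EncDiff.
import copy
import torch
import torch.nn as nn

BATCH_SIZE = 64        # "consistent batch size of 64 across all datasets"
LEARNING_RATE = 1e-4   # learning rate of 1 x 10^-4
EMA_DECAY = 0.9999     # EMA decay factor for all model parameters
TOKEN_DIM = 32         # 32-dimensional representation vector per concept token

model = nn.Linear(TOKEN_DIM, TOKEN_DIM)   # placeholder for the diffusion model
ema_model = copy.deepcopy(model)          # EMA shadow copy of the parameters
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # optimizer choice assumed

@torch.no_grad()
def update_ema(ema: nn.Module, online: nn.Module, decay: float = EMA_DECAY) -> None:
    """Standard EMA update: ema = decay * ema + (1 - decay) * online."""
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# One illustrative training step on random data.
x = torch.randn(BATCH_SIZE, TOKEN_DIM)
loss = (model(x) - x).pow(2).mean()       # stand-in for the diffusion training loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
update_ema(ema_model, model)
```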