Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement
Authors: Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted comprehensive ablation studies and visualization analyses, shedding light on the functioning of this model. |
| Researcher Affiliation | Collaboration | Tao Yang¹, Cuiling Lan², Yan Lu², Nanning Zheng¹; yt14212@stu.xjtu.edu.cn, {culan, yanlu}@microsoft.com, nnzheng@mail.xjtu.edu.cn; ¹National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China; ²Microsoft Research Asia |
| Pseudocode | No | No pseudocode or algorithm block is explicitly provided in the paper. |
| Open Source Code | Yes | https://github.com/thomasmry/EncDiff |
| Open Datasets | Yes | To evaluate the disentanglement performance, we utilize the commonly used benchmark datasets: Shapes3D [19], MPI3D [10] and Cars3D [27]. Shapes3D [19] consists of a collection of 3D shapes. MPI3D is a dataset of 3D objects created in a controlled setting. Cars3D is a dataset consisting of 3D-rendered cars. For real-world data, we conduct our experiments using CelebA, a dataset of celebrity faces with attributes. |
| Dataset Splits | No | The paper mentions using 'a consistent batch size of 64' and 'a learning rate of 1 × 10⁻⁴', and refers to the 'standard practice of employing an Exponential Moving Average (EMA)'. While training details are provided, explicit percentages or sample counts for training, validation, or test splits are not specified. |
| Hardware Specification | Yes | We train EncDiff on a single Tesla V100 16GB GPU. |
| Software Dependencies | No | The paper refers to using 'latent diffusion models (LDMs)' and 'VQ-reg' but does not provide specific version numbers for software libraries or dependencies like PyTorch, CUDA, etc. |
| Experiment Setup | Yes | During the training phase of EncDiff, we maintain a consistent batch size of 64 across all datasets. The learning rate is consistently set to 1 × 10⁻⁴. We adopt the standard practice of employing an Exponential Moving Average (EMA) with a decay factor of 0.9999 for all model parameters. The training hyper-parameters follow DisDiff [40] and DisCo [28]. For each concept token, we follow DisDiff [40] to use a 32-dimensional representation vector. |
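
For reference, the hyper-parameters reported in the Experiment Setup row can be summarized in a minimal PyTorch sketch. This is not the authors' EncDiff implementation: the model, loss, and optimizer choice below are placeholders, and only the batch size, learning rate, EMA decay, and concept-token dimension are taken from the paper.

```python
# Minimal sketch of the reported training hyper-parameters with a generic
# EMA update. Model, loss, and optimizer are stand-ins, not from EncDiff.
import copy
import torch
import torch.nn as nn

BATCH_SIZE = 64        # "consistent batch size of 64 across all datasets"
LEARNING_RATE = 1e-4   # learning rate of 1 x 10^-4
EMA_DECAY = 0.9999     # EMA decay factor for all model parameters
TOKEN_DIM = 32         # 32-dimensional representation vector per concept token

model = nn.Linear(TOKEN_DIM, TOKEN_DIM)   # placeholder for the diffusion model
ema_model = copy.deepcopy(model)          # EMA shadow copy of the parameters
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # optimizer choice assumed

@torch.no_grad()
def update_ema(ema: nn.Module, online: nn.Module, decay: float = EMA_DECAY) -> None:
    """Standard EMA update: ema = decay * ema + (1 - decay) * online."""
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# One illustrative training step on random data.
x = torch.randn(BATCH_SIZE, TOKEN_DIM)
loss = (model(x) - x).pow(2).mean()       # stand-in for the diffusion training loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
update_ema(ema_model, model)
```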