MC-DiT: Contextual Enhancement via Clean-to-Clean Reconstruction for Masked Diffusion Models

Authors: Guanghao Zheng, Yuchen Liu, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on 256×256 and 512×512 image generation on the ImageNet dataset demonstrate that the proposed MC-DiT achieves state-of-the-art performance in unconditional and conditional image generation with enhanced convergence speed.
Researcher Affiliation | Academia | Guanghao Zheng, Yuchen Liu, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Pseudocode | No | The paper describes the proposed method using text and mathematical equations, but it does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Justification: "We have provide the core file of our code in the supplementary. And the code will be released upon acceptance."
Open Datasets | Yes | We train MC-DiT on ImageNet [39] with resolutions 256×256×3 and 512×512×3, respectively.
Dataset Splits | No | The paper mentions training and testing but does not explicitly detail a validation dataset split or how it was used.
Hardware Specification | Yes | Table 8 lists the GPUs used: 2 RTX-3090 GPUs and 4 V100 GPUs.
Software Dependencies | No | The paper mentions software components such as the "AdamW optimizer", the "pretrained variational autoencoder (VAE) from Stable Diffusion [37]", and the "EDM [21] framework", but does not specify version numbers for these or other libraries/frameworks.
Experiment Setup | Yes | Most training settings are the same as MaskDiT [48]. We train MC-DiT for 400K to 1M iterations using the AdamW optimizer with learning rate 0.0001 and no weight decay. By default, we use a 50% mask ratio and batch size 1024. λ1 and λ2 in (12) are set to 0.1 and 0.05 for more denoising reconstruction. The EMA coefficient is set to 0.999 for smoothness, and no data augmentation is employed.
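The hyperparameters quoted in this row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the auxiliary loss-term names and the flat-parameter EMA layout are assumptions; only the numeric values (50% mask ratio, λ1 = 0.1, λ2 = 0.05 from Eq. (12), EMA coefficient 0.999) come from the paper.

```python
import random

# Values quoted from the experiment-setup row above.
MASK_RATIO = 0.5                # 50% of patch tokens are masked
LAMBDA_1, LAMBDA_2 = 0.1, 0.05  # loss weights in Eq. (12)
EMA_COEFF = 0.999               # EMA smoothing coefficient

def sample_mask(num_tokens, mask_ratio=MASK_RATIO, rng=random):
    """Choose which patch tokens to mask, uniformly at random."""
    num_masked = int(num_tokens * mask_ratio)
    return set(rng.sample(range(num_tokens), num_masked))

def total_loss(l_denoise, l_aux1, l_aux2):
    """Weighted objective; the auxiliary-term names are hypothetical."""
    return l_denoise + LAMBDA_1 * l_aux1 + LAMBDA_2 * l_aux2

def ema_update(ema_params, params, coeff=EMA_COEFF):
    """One EMA step over flat parameter lists: ema <- c*ema + (1-c)*p."""
    return [coeff * e + (1 - coeff) * p for e, p in zip(ema_params, params)]
```

For a 16×16 patch grid (256 tokens), `sample_mask(256)` masks exactly 128 tokens, matching the stated 50% ratio.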