MC-DiT: Contextual Enhancement via Clean-to-Clean Reconstruction for Masked Diffusion Models
Authors: Guanghao Zheng, Yuchen Liu, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on 256×256 and 512×512 image generation on the ImageNet dataset demonstrate that the proposed MC-DiT achieves state-of-the-art performance in unconditional and conditional image generation with enhanced convergence speed. |
| Researcher Affiliation | Academia | Guanghao Zheng, Yuchen Liu, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes the proposed method using text and mathematical equations, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Justification: We have provided the core file of our code in the supplementary material, and the code will be released upon acceptance. |
| Open Datasets | Yes | We train MC-DiT on ImageNet [39] with resolutions 256×256×3 and 512×512×3, respectively. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly detail a validation dataset split or how it was used. |
| Hardware Specification | Yes | Table 8 lists the GPUs used: 2× RTX-3090 GPUs and 4× V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like the "AdamW optimizer", "pretrained variational autoencoder (VAE) from Stable Diffusion [37]", and the "EDM [21] framework", but does not specify version numbers for these or other libraries/frameworks. |
| Experiment Setup | Yes | Most training settings are the same as MaskDiT [48]. We train MC-DiT for 400K to 1M iterations using the AdamW optimizer with learning rate 0.0001 and no weight decay. By default, we use a 50% mask ratio and batch size 1024. λ1 and λ2 in Eq. (12) are set to 0.1 and 0.05 for more denoising reconstruction. The EMA coefficient is set to 0.999 for smoothness, and no data augmentation is employed. |
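The reported training settings can be collected into a single configuration for reference. The sketch below is a minimal, hypothetical illustration: the hyperparameter values are quoted from the paper's setup, but the `config` dict and the `ema_update` helper are illustrative, not the authors' released code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# The structure below is a hypothetical sketch, not the authors' code.
config = {
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "weight_decay": 0.0,
    "mask_ratio": 0.5,                      # default 50% mask ratio
    "batch_size": 1024,
    "iterations": (400_000, 1_000_000),     # 400K to 1M, model-dependent
    "lambda_1": 0.1,                        # loss weights in Eq. (12)
    "lambda_2": 0.05,
    "ema_coefficient": 0.999,
    "data_augmentation": False,
}

def ema_update(ema_value, new_value, decay=config["ema_coefficient"]):
    """One step of the exponential moving average used for weight smoothing:
    ema <- decay * ema + (1 - decay) * new."""
    return decay * ema_value + (1 - decay) * new_value
```

With a decay of 0.999, each update moves the smoothed weights only 0.1% of the way toward the current weights, which is what keeps the EMA trajectory stable across the 400K to 1M training iterations.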