LMD: Faster Image Reconstruction with Latent Masking Diffusion

Authors: Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two representative datasets, ImageNet-1K and LSUN-Bedrooms, demonstrate the effectiveness of the proposed LMD model, showing that it achieves competitive performance against previous DPM or MAE models with significantly lower mean training time consumption. The inference speed of LMD in image reconstruction also significantly outperforms previous approaches. Moreover, LMD generalizes well to a variety of downstream tasks due to its flexible architecture.
Researcher Affiliation | Academia | Zhiyuan Ma¹, Zhihuan Yu², Jianjun Li²*, Bowen Zhou¹*; ¹Department of Electronic Engineering, Tsinghua University; ²School of Computer Science and Technology, Huazhong University of Science and Technology
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Following (He et al. 2022; Ho, Jain, and Abbeel 2020), we pre-train our model on ImageNet-1K (IN1K) (Deng et al. 2009) and LSUN-Bedrooms (Yu et al. 2015) respectively.
Dataset Splits | No | The paper mentions pre-training on ImageNet-1K and LSUN-Bedrooms but does not specify train/validation/test splits (e.g., percentages, sample counts, or explicit standard splits).
Hardware Specification | No | The paper mentions 'V100 days' in the context of previous models' training costs, but does not explicitly state the hardware (e.g., GPU models, CPU models, memory) used for its own experiments.
Software Dependencies | No | The paper does not provide version numbers for software components such as Python, PyTorch/TensorFlow, or CUDA. It mentions the Adan optimizer but not its version.
Experiment Setup | Yes | LMD adopts a 20-layer ViT as the backbone, with 8 encoder blocks and 12 decoder blocks for generative training, and 12 encoder blocks and 8 decoder blocks for discriminant training. The mask ratio of the mask scheduler is set in [0.15, 0.75]. The scaling factor f is set to 8. The base learning rate is set to 1.5e-4, and the weight decay is set to 0.05. We use the Adan (Xie et al. 2022a) optimizer to optimize the model.
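Since the authors do not release code, the hyperparameters quoted in the Experiment Setup row can be collected into a small, reproducibility-oriented configuration sketch. The snippet below is a minimal, hypothetical illustration only: the class name LMDConfig and the build_optimizer helper are invented for readability, and torch.optim.AdamW stands in for the Adan optimizer the paper actually uses (Adan ships as a third-party package, not in core PyTorch).

```python
# Hypothetical sketch of the reported LMD training configuration.
# Names (LMDConfig, build_optimizer) are illustrative, not from the authors' code.
from dataclasses import dataclass

import torch


@dataclass
class LMDConfig:
    # 20-layer ViT backbone, split differently for the two training modes.
    generative_encoder_blocks: int = 8
    generative_decoder_blocks: int = 12
    discriminant_encoder_blocks: int = 12
    discriminant_decoder_blocks: int = 8
    # Mask-scheduler range and latent scaling factor f reported in the paper.
    mask_ratio_min: float = 0.15
    mask_ratio_max: float = 0.75
    scaling_factor: int = 8
    # Optimization hyperparameters.
    base_lr: float = 1.5e-4
    weight_decay: float = 0.05


def build_optimizer(model: torch.nn.Module, cfg: LMDConfig) -> torch.optim.Optimizer:
    # The paper uses the Adan optimizer (Xie et al. 2022a); AdamW is used here
    # only as a drop-in stand-in with the same learning rate and weight decay.
    return torch.optim.AdamW(
        model.parameters(), lr=cfg.base_lr, weight_decay=cfg.weight_decay
    )


if __name__ == "__main__":
    cfg = LMDConfig()
    # Tiny placeholder module, just to show how the optimizer would be wired up.
    model = torch.nn.Linear(16, 16)
    optimizer = build_optimizer(model, cfg)
    print(cfg)
    print(optimizer)
```

This sketch only records the stated hyperparameters in a runnable form; the actual encoder/decoder block split, mask scheduler, and Adan settings would have to come from the authors' implementation, which is not publicly available.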