LMD: Faster Image Reconstruction with Latent Masking Diffusion
Authors: Zhiyuan Ma, Zhihuan Yu, Jianjun Li, Bowen Zhou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two representative datasets, ImageNet-1K and LSUN-Bedrooms, demonstrate the effectiveness of the proposed LMD model, showing that it achieves competitive performance against previous DPM or MAE models, but with significantly lower mean training time consumption. The inference speed of LMD in image reconstruction also significantly outperforms the previous approaches. Moreover, LMD generalizes well to a variety of downstream tasks, owing to its flexible architecture. |
| Researcher Affiliation | Academia | Zhiyuan Ma¹, Zhihuan Yu², Jianjun Li²*, Bowen Zhou¹*. ¹Department of Electronic Engineering, Tsinghua University; ²School of Computer Science and Technology, Huazhong University of Science and Technology |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Following (He et al. 2022; Ho, Jain, and Abbeel 2020), we pre-train our model on ImageNet-1K (IN1K) (Deng et al. 2009) and LSUN-Bedrooms (Yu et al. 2015) respectively. |
| Dataset Splits | No | The paper mentions pre-training on ImageNet-1K and LSUN-Bedrooms but does not specify the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard splits). |
| Hardware Specification | No | The paper mentions 'V100 days' in the context of previous models' training costs, but does not explicitly state the specific hardware (e.g., GPU models, CPU models, memory) used for running *its own* experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components like Python, PyTorch/TensorFlow, or CUDA. It mentions the Adan optimizer but not its version. |
| Experiment Setup | Yes | LMD adopts a 20-layer ViT as the backbone, with 8 encoder blocks and 12 decoder blocks for generative training, and 12 encoder blocks and 8 decoder blocks for discriminant training. The mask ratio of the mask scheduler is set in [0.15, 0.75]. The scaling factor f is set as 8. The base learning rate is set as 1.5e-4, and the weight decay is set as 0.05. We use the Adan (Xie et al. 2022a) optimizer to optimize the model. |
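
For quick reference, the sketch below collects the hyperparameters reported in the experiment setup into a single configuration object. It is a minimal illustration, not the authors' code: the names `LMDConfig` and `sample_mask_ratio` are assumptions, and uniform sampling of the mask ratio within the reported [0.15, 0.75] range is an assumption as well, since the paper does not describe the scheduler's sampling rule here. The paper uses the Adan optimizer (Xie et al. 2022a), which is not reproduced in this sketch.

```python
# Hypothetical configuration sketch based on the reported LMD experiment setup.
# All identifiers here are illustrative and not taken from the paper's code.
import random
from dataclasses import dataclass


@dataclass
class LMDConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    vit_layers: int = 20              # 20-layer ViT backbone
    gen_encoder_blocks: int = 8       # generative training: 8 encoder / 12 decoder blocks
    gen_decoder_blocks: int = 12
    disc_encoder_blocks: int = 12     # discriminant training: 12 encoder / 8 decoder blocks
    disc_decoder_blocks: int = 8
    mask_ratio_min: float = 0.15      # mask scheduler range [0.15, 0.75]
    mask_ratio_max: float = 0.75
    scaling_factor: int = 8           # latent-space scaling factor f
    base_lr: float = 1.5e-4
    weight_decay: float = 0.05        # optimizer: Adan (Xie et al. 2022a), not shown here


def sample_mask_ratio(cfg: LMDConfig) -> float:
    """Draw a mask ratio from the reported range.

    The paper only states the interval; uniform sampling is an assumption.
    """
    return random.uniform(cfg.mask_ratio_min, cfg.mask_ratio_max)


if __name__ == "__main__":
    cfg = LMDConfig()
    print(cfg)
    print("example mask ratio:", round(sample_mask_ratio(cfg), 3))
```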