Improving Diffusion-Based Image Synthesis with Context Prediction
Authors: Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on unconditional image generation, text-to-image generation, and image inpainting tasks. Our CONPREDIFF consistently outperforms previous methods and achieves new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21. |
| Researcher Affiliation | Academia | 1 Peking University, 2 Tsinghua University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | These considerations guide our decision not to release the source code or a public demo at this point in time. |
| Open Datasets | Yes | Regarding unconditional image generation, we choose four popular datasets for evaluation: CelebA-HQ [34], FFHQ [35], LSUN-Church-outdoor [102], and LSUN-Bedrooms [102]. For text-to-image generation, we train the model with LAION [73, 74] and some internal datasets, and conduct evaluations on the MS-COCO dataset with zero-shot FID and CLIP score [25, 59]. |
| Dataset Splits | No | The paper does not provide explicit details about train/validation/test dataset splits (percentages or counts) or refer to standard splits with specific citations for all datasets used. |
| Hardware Specification | Yes | We use the standard Adam optimizer with a learning rate of 0.0001, weight decay of 0.01, and a batch size of 1024 to optimize the base model and two super-resolution models on NVIDIA A100 GPUs, respectively, equipped with a multi-scale training technique (6 image scales). |
| Software Dependencies | No | The paper mentions software components like T5, CLIP, Adam optimizer, U-Net, and Transformer, but does not specify version numbers for any programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We use the standard Adam optimizer with a learning rate of 0.0001, weight decay of 0.01, and a batch size of 1024 to optimize the base model and two super-resolution models on NVIDIA A100 GPUs, respectively, equipped with a multi-scale training technique (6 image scales). We use T = 250 time steps, and apply resampling r = 10 times with jump size j = 10. For unconditional generation tasks, we use the same denoising architecture as LDM [65] for fair comparison. The max channels are 224, and we use T = 2000 time steps, a linear noise schedule, and an initial learning rate of 0.000096. (These settings are collected in the configuration sketch below the table.) |
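
For readers attempting reproduction, the hyperparameters quoted in the hardware and experiment-setup rows can be gathered in one place. The sketch below is a minimal, hypothetical assembly of those values in PyTorch, not the authors' code (none was released): the config class names and the stand-in module are invented for illustration, and anything not quoted in the table, such as the linear-schedule endpoints or Adam's betas, is a labeled assumption left at common defaults.

```python
# Hypothetical collection of the training settings quoted above.
# Not the authors' implementation; placeholder names and default values
# are marked in comments.
from dataclasses import dataclass

import torch


@dataclass
class TextToImageConfig:
    # "standard Adam optimizer with a learning rate of 0.0001,
    #  weight decay of 0.01, and a batch size of 1024"
    lr: float = 1e-4
    weight_decay: float = 0.01
    batch_size: int = 1024
    # "T = 250 time steps, ... resampling r = 10 times with jump size j = 10"
    sampling_steps: int = 250
    resample_times: int = 10
    jump_size: int = 10
    # "multi-scale training technique (6 image scales)"
    num_image_scales: int = 6


@dataclass
class UnconditionalConfig:
    # "The max channels are 224, and we use T = 2000 time steps,
    #  a linear noise schedule, and an initial learning rate of 0.000096."
    max_channels: int = 224
    diffusion_steps: int = 2000
    lr: float = 9.6e-5


def linear_beta_schedule(T: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> torch.Tensor:
    """DDPM-style linear noise schedule; the endpoint values are common
    defaults, not reported in the paper."""
    return torch.linspace(beta_start, beta_end, T)


def make_optimizer(model: torch.nn.Module,
                   cfg: TextToImageConfig) -> torch.optim.Adam:
    """Adam with the quoted learning rate and weight decay."""
    return torch.optim.Adam(model.parameters(), lr=cfg.lr,
                            weight_decay=cfg.weight_decay)


if __name__ == "__main__":
    cfg = TextToImageConfig()
    betas = linear_beta_schedule(UnconditionalConfig().diffusion_steps)
    # Stand-in module only to show the optimizer call; not the paper's U-Net.
    opt = make_optimizer(torch.nn.Linear(4, 4), cfg)
    print(cfg, betas.shape, opt)
```

Note that the paper couples weight decay with the "standard Adam optimizer"; if decoupled decay was actually intended, `torch.optim.AdamW` would be the closer match.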