RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Authors: Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both image quality and aesthetic appeal. |
| Researcher Affiliation | Collaboration | Zeyue Xue (The University of Hong Kong, xuezeyue@connect.hku.hk); Guanglu Song (SenseTime Research, songguanglu@sensetime.com); Qiushan Guo (The University of Hong Kong, qsguo@cs.hku.hk); Boxiao Liu (SenseTime Research, liuboxiao@sensetime.com); Zhuofan Zong (SenseTime Research, zongzhuofan@gmail.com); Yu Liu (SenseTime Research, liuyuisanai@gmail.com); Ping Luo (The University of Hong Kong and Shanghai AI Laboratory, pluo@cs.hku.hk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | More details can be found on a webpage: https://raphael-painter.github.io/ |
| Open Datasets | Yes | The training dataset consists of a subset of LAION-5B [20] and some internal datasets, including 730M text-image pairs in total. |
| Dataset Splits | No | The paper mentions selecting 30,000 images from COCO's validation set for evaluation, but does not describe how training, validation, and test splits were created for the overall dataset or whether standard splits were followed. |
| Hardware Specification | Yes | a single model with three billion parameters, trained on 1,000 A100 GPUs for two months |
| Software Dependencies | No | The entire model is implemented in PyTorch [24], and is trained by AdamW [25] optimizer. While PyTorch is mentioned, a specific version number is not provided, making the software dependency description incomplete for reproducibility. |
| Experiment Setup | Yes | The entire model is implemented in PyTorch [24], and is trained by AdamW [25] optimizer with a learning rate of 1e-4, a weight decay of 0, a batch size of 2,000... Our experiments reveal that a suitable choice for Tc is 500, ensuring the effective learning of texture information. |
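For illustration, the quoted hyperparameters map onto a standard PyTorch optimizer call as in the minimal sketch below. The `model` here is a placeholder of our own; the paper's three-billion-parameter mixture-of-experts network is not reproduced, and only the optimizer settings (AdamW, learning rate 1e-4, weight decay 0) come from the paper.

```python
# Minimal sketch of the reported optimizer configuration.
# Assumption: a standard PyTorch training setup; the tiny Linear layer
# below is a stand-in for RAPHAEL's actual 3B-parameter network.
import torch

model = torch.nn.Linear(8, 8)  # hypothetical placeholder module

# AdamW with the hyperparameters quoted above: lr 1e-4, weight decay 0.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0)
```

Note that the batch size of 2,000 and the Tc = 500 timestep threshold quoted above are properties of the paper's large-scale training pipeline and loss schedule, and are not reflected in this single-optimizer sketch.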