RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

Authors: Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both image quality and aesthetic appeal.
Researcher Affiliation | Collaboration | Zeyue Xue (The University of Hong Kong, xuezeyue@connect.hku.hk); Guanglu Song (SenseTime Research, songguanglu@sensetime.com); Qiushan Guo (The University of Hong Kong, qsguo@cs.hku.hk); Boxiao Liu (SenseTime Research, liuboxiao@sensetime.com); Zhuofan Zong (SenseTime Research, zongzhuofan@gmail.com); Yu Liu (SenseTime Research, liuyuisanai@gmail.com); Ping Luo (The University of Hong Kong; Shanghai AI Laboratory, pluo@cs.hku.hk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | More details can be found on a webpage: https://raphael-painter.github.io/
Open Datasets | Yes | The training dataset consists of a subset of LAION-5B [20] and some internal datasets, including 730M text-image pairs in total.
Dataset Splits | No | The paper mentions selecting 30,000 images from the COCO validation set for evaluation, but it does not describe how training, validation, and test splits were created for the overall dataset, nor whether standard splits were followed beyond that selection.
Hardware Specification | Yes | A single model with three billion parameters, trained on 1,000 A100 GPUs for two months.
Software Dependencies | No | The entire model is implemented in PyTorch [24] and trained with the AdamW [25] optimizer. PyTorch is named, but no version number is given, so the software-dependency description is incomplete for reproducibility.
Experiment Setup | Yes | The entire model is implemented in PyTorch [24], and is trained by the AdamW [25] optimizer with a learning rate of 1e-4, a weight decay of 0, a batch size of 2,000... Our experiments reveal that a suitable choice for Tc is 500, ensuring the effective learning of texture information.
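For concreteness, the experiment-setup row above corresponds roughly to the PyTorch sketch below. Only the AdamW hyperparameters (learning rate 1e-4, weight decay 0), the global batch size of 2,000, the 1,000-GPU count, and the Tc = 500 threshold come from the paper; the placeholder module and the even per-GPU batch split are illustrative assumptions, since the authors' training code is not released.

    import torch.nn as nn
    from torch.optim import AdamW

    # Placeholder module standing in for RAPHAEL's ~3B-parameter diffusion
    # network, which is not publicly available.
    model = nn.Linear(1024, 1024)

    # Optimizer settings reported in the paper: AdamW, lr = 1e-4, weight decay = 0.
    optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)

    # Reported global batch size of 2,000 text-image pairs on 1,000 A100 GPUs;
    # the even per-GPU split is an assumption, not stated in the paper.
    num_gpus = 1_000
    global_batch_size = 2_000
    per_gpu_batch_size = global_batch_size // num_gpus  # = 2 (assumed)

    # Diffusion timestep threshold the paper reports as suitable for learning
    # texture information.
    T_c = 500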