CogView: Mastering Text-to-Image Generation via Transformers

Authors: Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CogView achieves the state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and a recent similar work, DALL-E.
Researcher Affiliation | Collaboration | Tsinghua University; DAMO Academy, Alibaba Group; BAAI. Contact: {dm18@mails, jietang@mail}.tsinghua.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and models are at https://github.com/THUDM/CogView.
Open Datasets | Yes | At present, the most authoritative machine evaluation metric for general-domain text-to-image generation is the FID on MS COCO, which is not included in our training set. (FID is recalled in a note below the table.) [31] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or sample counts. While it mentions evaluating on a 'subset' for testing, it does not detail how the overall dataset was partitioned for training and validation.
Hardware Specification | Yes | We train the model with batch size of 6,144 sequences (6.7 million tokens per batch) for 144,000 steps on 512 V100 GPUs (32GB).
Software Dependencies | No | The paper mentions software components such as Adam (optimizer) and SentencePiece (tokenizer) but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We train the model with batch size of 6,144 sequences (6.7 million tokens per batch) for 144,000 steps on 512 V100 GPUs (32GB). The parameters are updated by Adam with max lr = 3 × 10^-4, β1 = 0.9, β2 = 0.95, weight decay = 4 × 10^-2. The learning rate warms up during the first 2% of steps and decays with cosine annealing [34]. (A configuration sketch for this schedule follows below the table.)
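The hyperparameters quoted in the Experiment Setup row map onto a standard warmup-plus-cosine learning-rate schedule. The snippet below is a minimal sketch in PyTorch, not the authors' released training code: the placeholder model, the linear shape of the warmup, the zero final learning rate, and the use of plain Adam (the quote does not say whether the weight decay is decoupled, AdamW-style) are assumptions made here for illustration.

```python
# Minimal sketch of the quoted optimizer and schedule; assumptions noted above.
import math
import torch

model = torch.nn.Linear(16, 16)         # placeholder standing in for the transformer

total_steps = 144_000                   # "144,000 steps"
warmup_steps = int(0.02 * total_steps)  # "warms up during the first 2% of steps"
max_lr = 3e-4                           # "max lr = 3 x 10^-4"

# Plain Adam with the quoted betas and weight decay; decoupled decay is not
# specified in the quoted setup, so torch.optim.Adam is used here.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),
    weight_decay=4e-2,
)

def lr_lambda(step: int) -> float:
    """Linear warmup to max_lr, then cosine annealing toward zero (assumed floor)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, each iteration would call:
#   optimizer.step(); scheduler.step()
```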
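Note on the metric cited in the Open Datasets row: the table does not define FID, but the standard Fréchet Inception Distance compares the Gaussian statistics of Inception features extracted from reference and generated images,

    FID = ||μ_r − μ_g||^2 + Tr(Σ_r + Σ_g − 2 (Σ_r Σ_g)^(1/2)),

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature means and covariances of the real and generated image sets. As stated in the Research Type row, the paper reports this score on the blurred MS COCO dataset.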