CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Authors: Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (5 experiments) | "The results of machine evaluation are demonstrated in Table 1."
Researcher Affiliation | Academia | "Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang. Tsinghua University, BAAI. {dm18@mails, jietang@mail}.tsinghua.edu.cn"
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | "Codes and a demo website will be updated at https://github.com/THUDM/CogView2."
Open Datasets | Yes | "To compare with previous and concurrent works, we follow the most popular benchmark originated from DALL-E [26], Fréchet Inception Distances and Inception Scores evaluated on MS-COCO [17]."
Dataset Splits | Yes | "30,000 captions from the validation set are sampled to evaluate the FID." (see the evaluation sketch after this table)
Hardware Specification | Yes | "The wall-clock time and FLOPs for a 4,096 sequence on an A100-40GB GPU with different AR-related methods."
Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number for it or any other key software dependencies.
Experiment Setup | Yes | "The model has 6 billion parameters (48 layers, hidden size 3072, 48 attention heads), trained for 300,000 iterations in FP16 with batch size 4,096. The sequence length is 512, consisting of 400 image tokens, 1 separator and up to 111 text tokens." (see the configuration sketch below)
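The numbers quoted in the Experiment Setup row can be collected into a small configuration object for reference. This is a minimal sketch; the class and field names are illustrative assumptions and are not taken from the CogView2 codebase.

```python
from dataclasses import dataclass


# Hypothetical configuration mirroring the hyperparameters quoted in the
# "Experiment Setup" row above; names are illustrative, not CogView2 code.
@dataclass
class CogView2TrainConfig:
    num_layers: int = 48
    hidden_size: int = 3072
    num_attention_heads: int = 48      # ~6 billion parameters in total
    image_tokens: int = 400            # image tokens per sample
    separator_tokens: int = 1          # single separator token
    max_text_tokens: int = 111         # up to 111 text tokens
    train_iterations: int = 300_000
    batch_size: int = 4_096
    precision: str = "fp16"

    @property
    def sequence_length(self) -> int:
        # 400 image tokens + 1 separator + up to 111 text tokens = 512
        return self.image_tokens + self.separator_tokens + self.max_text_tokens


cfg = CogView2TrainConfig()
assert cfg.sequence_length == 512
```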
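The Open Datasets and Dataset Splits rows describe the paper's evaluation protocol: FID computed over 30,000 captions sampled from the MS-COCO validation set. Below is a hedged sketch of such an evaluation loop. It uses torchmetrics' FrechetInceptionDistance, which is an assumption about tooling (the paper does not state which FID implementation it used), and the `caption_image_pairs` / `generate_image` arguments are hypothetical placeholders standing in for the COCO loader and the text-to-image model.

```python
import random

from torchmetrics.image.fid import FrechetInceptionDistance


def evaluate_fid(caption_image_pairs, generate_image, num_samples=30_000, seed=0):
    """Sketch of the MS-COCO FID protocol described in the table above.

    caption_image_pairs: list of (caption, real_image) pairs, where real_image
        is a uint8 tensor of shape [3, H, W] from the COCO validation set.
    generate_image: callable mapping a caption string to a generated uint8
        image tensor of shape [3, H, W] (stands in for the text-to-image model).
    """
    fid = FrechetInceptionDistance(feature=2048)
    rng = random.Random(seed)

    # Sample 30,000 validation captions, as reported in the paper.
    sampled = rng.sample(caption_image_pairs, num_samples)

    for caption, real_image in sampled:
        fake_image = generate_image(caption)
        # torchmetrics expects batched uint8 images of shape [N, 3, H, W].
        fid.update(real_image.unsqueeze(0), real=True)
        fid.update(fake_image.unsqueeze(0), real=False)

    return fid.compute().item()
```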