Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Authors: Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments: The results of machine evaluation are demonstrated in Table 1. |
| Researcher Affiliation | Academia | Ming Ding Wendi Zheng Wenyi Hong Jie Tang Tsinghua University BAAI {dm18@mails, jietang@mail}.tsinghua.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | Codes and a demo website will be updated at https://github.com/THUDM/CogView2. |
| Open Datasets | Yes | To compare with previous and concurrent works, we follow the most popular benchmark originated from DALL-E [26], Fréchet Inception Distances and Inception Scores evaluated on MS-COCO [17]. |
| Dataset Splits | Yes | 30,000 captions from the validation set are sampled to evaluate the FID. |
| Hardware Specification | Yes | The wall-clock time and FLOPs for a 4,096 sequence on an A100-40GB GPU with different AR-related methods. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | The model has 6 billion parameters (48 layers, hidden size 3072, 48 attention heads), trained for 300,000 iterations in FP16 with batch size 4,096. The sequence length is 512, consisting of 400 image tokens, 1 separator and up to 111 text tokens. |
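
The figures in the Experiment Setup row can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper or its code; the function names are hypothetical, and the parameter estimate assumes a standard Transformer block with roughly 12·d² weights per layer (4·d² for attention, 8·d² for a 4x-expansion MLP), ignoring embeddings, biases, and layer norms. The actual CogView2 architecture may differ from that assumption.

```python
# Figures quoted in the Experiment Setup row of the table.
NUM_LAYERS = 48
HIDDEN_SIZE = 3072
NUM_HEADS = 48
SEQ_LEN = 512
IMAGE_TOKENS = 400
SEPARATOR_TOKENS = 1
MAX_TEXT_TOKENS = 111


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension; the hidden size must divide evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


def approx_transformer_params(num_layers: int, hidden_size: int) -> int:
    """Rough weight count under the ASSUMED 12 * d^2 per-layer rule of thumb.

    This ignores embeddings, biases, and layer norms, so it is only a
    lower-bound style estimate, not the model's exact size.
    """
    return 12 * num_layers * hidden_size ** 2


def sequence_budget(image_tokens: int, separators: int, max_text_tokens: int) -> int:
    """Total sequence positions used when the text segment is at its maximum."""
    return image_tokens + separators + max_text_tokens


if __name__ == "__main__":
    print("head dim:", head_dim(HIDDEN_SIZE, NUM_HEADS))
    print("approx params:", approx_transformer_params(NUM_LAYERS, HIDDEN_SIZE))
    print("sequence budget:", sequence_budget(IMAGE_TOKENS, SEPARATOR_TOKENS, MAX_TEXT_TOKENS))
```

Under these assumptions the token budget adds up to the stated sequence length of 512, and the rough estimate of about 5.4 billion block weights is in line with the reported 6 billion parameters once embeddings and other terms are included.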