Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Authors: Jian Liang, Chenfei Wu, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments, 4.1 Experiment Setup, 4.2 Evaluation on Visual Synthesis, 4.3 Ablation Studies |
| Researcher Affiliation | Collaboration | 1Peking University 2Microsoft Research Asia 3Microsoft Azure AI |
| Pseudocode | Yes | Algorithm 1: Training Strategy, Algorithm 2: Inference Strategy |
| Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | For image synthesis, we trained unconditional generation model on the LHQ [30]... For video synthesis, we downloaded 120k high-resolution videos from pexels website... |
| Dataset Splits | No | For image synthesis, we trained unconditional generation model on the LHQ [30], which consists of 90k high-resolution (1024^2) nature landscapes. In addition to support text prompt, we added a caption for each image of LHQ to create a new dataset called LHQC, where 85k as training data and 5k as test data. (No mention of validation split) |
| Hardware Specification | No | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Supplementary material (The main paper refers to supplementary material, but does not state it directly in the provided text). |
| Software Dependencies | No | The paper mentions using a 'VQGAN model' and 'Adam optimizer' but does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions). |
| Experiment Setup | Yes | Implementation Details. During training, images are cropped into 1024 × 1024 and videos are cut into 1024 × 1024 × 5 with 5fps, then, they will be encoded into discrete tokens using the VQGAN model with a compression rate of 16 and a codebook of 16384. In Sec. 3.2, the rendering size of the three models is 256 × 256. In Sec. 3.1, based on the nearby sparsity, we set (eh, ew, ef) = (2, 2, 0) for images and (eh, ew, ef) = (1, 1, 3) for videos. We train the model using an Adam optimizer [16] with learning rate of 1e-4, a batch size of 256, and warm-up 5% of total 50 epochs. |
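
The quoted experiment setup implies some simple arithmetic that may help when reading it. The sketch below is illustrative only and is not taken from the paper: it assumes the VQGAN compression rate of 16 applies to each spatial axis, that video frames are tokenized independently with no temporal compression, and that the 5% warm-up is measured in epochs. The class and method names (`SetupSketch`, `tokens_per_image`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupSketch:
    """Values quoted in the Experiment Setup row; interpretation is assumed."""

    crop_size: int = 1024            # pixels per side of a training crop
    frames: int = 5                  # frames per training video clip
    compression_rate: int = 16       # ASSUMPTION: applied per spatial axis
    total_epochs: int = 50
    warmup_fraction: float = 0.05    # ASSUMPTION: fraction of epochs

    def tokens_per_side(self) -> int:
        # Side length of the discrete token grid for one image or frame.
        return self.crop_size // self.compression_rate

    def tokens_per_image(self) -> int:
        # Number of discrete tokens for one 1024 x 1024 crop.
        return self.tokens_per_side() ** 2

    def tokens_per_video(self) -> int:
        # ASSUMPTION: each frame is encoded independently.
        return self.tokens_per_image() * self.frames

    def warmup_epochs(self) -> float:
        # Length of the warm-up phase, expressed in epochs.
        return self.total_epochs * self.warmup_fraction


if __name__ == "__main__":
    setup = SetupSketch()
    print("token grid side:", setup.tokens_per_side())
    print("tokens per image:", setup.tokens_per_image())
    print("tokens per video clip:", setup.tokens_per_video())
    print("warm-up epochs:", setup.warmup_epochs())
```

Under these assumptions a single crop maps to a 64 × 64 token grid, which gives a sense of the sequence lengths involved; the supplementary material of the paper would be the authoritative source for the actual tokenization details.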