Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Authors: Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the ImageNet 256×256 benchmark, VAR significantly improves the AR baseline, improving Fréchet inception distance (FID) from 18.65 to 1.73 and inception score (IS) from 80.4 to 350.2, with 20× faster inference speed. |
| Researcher Affiliation | Collaboration | Keyu Tian¹,², Yi Jiang², Zehuan Yuan², Bingyue Peng², Liwei Wang¹,³ (¹Center for Data Science, Peking University; ²Bytedance Inc.; ³State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University) |
| Pseudocode | Yes | Algorithm 1: Multi-scale VQVAE Encoding (see the sketch after this table) |
| Open Source Code | Yes | Codes and models: https://github.com/FoundationVision/VAR |
| Open Datasets | Yes | We trained models across 12 different sizes, from 18M to 2B parameters, on the ImageNet training set [24] containing 1.28M images |
| Dataset Splits | Yes | We assessed the final test cross-entropy loss L and token prediction error rates Err on the ImageNet validation set of 50,000 images [24]. |
| Hardware Specification | No | The paper mentions training compute in PFlops, but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'GPT-2-like transformer architecture' and 'AdamW optimizer' but does not specify version numbers for any software libraries or dependencies (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | All models are trained with similar settings: a base learning rate of 10⁻⁴ per 256 batch size, an AdamW optimizer with β1 = 0.9, β2 = 0.95, weight decay = 0.05, a batch size from 768 to 1024, and training epochs from 200 to 350 (depending on model size); see the configuration sketch after this table. |
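The paper's Algorithm 1 quantizes an encoder feature map into token maps at increasing resolutions, subtracting each scale's reconstruction from a running residual. Below is a minimal PyTorch sketch of that loop; the encoder output `f`, the codebook, and the per-scale refinement convolutions `phis` are assumed inputs, and `multiscale_encode` is a hypothetical helper name, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def multiscale_encode(f, codebook, scales, phis):
    """Quantize a feature map f of shape (1, C, H, W) into K token maps.

    codebook: (V, C) code vectors; scales: [(h_1, w_1), ..., (h_K, w_K)];
    phis: K shape-preserving conv layers, one per scale (assumed components).
    """
    H, W = f.shape[-2:]
    tokens = []
    for k, (h, w) in enumerate(scales):
        # Downsample the current residual to the k-th resolution.
        fk = F.interpolate(f, size=(h, w), mode="area")
        # Nearest-codebook quantization: index of the closest code vector.
        flat = fk.permute(0, 2, 3, 1).reshape(-1, fk.shape[1])
        idx = torch.cdist(flat, codebook).argmin(dim=1)
        tokens.append(idx.view(1, h, w))
        # Look up the code vectors, upsample to full resolution, and remove
        # this scale's contribution from the residual before the next scale.
        zk = codebook[idx].view(1, h, w, -1).permute(0, 3, 1, 2)
        zk = F.interpolate(zk, size=(H, W), mode="bicubic")
        f = f - phis[k](zk)
    return tokens  # coarse-to-fine multi-scale token maps (r_1, ..., r_K)
```

A quick way to exercise it is `phis = [torch.nn.Conv2d(C, C, 3, padding=1) for _ in scales]` with the final scale equal to `(H, W)`, so the last upsampling is a no-op.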
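The quoted experiment setup translates directly into an optimizer configuration. The sketch below assumes PyTorch; the model is a stand-in for the VAR transformer, and the learning rate is scaled linearly from the reported base of 10⁻⁴ per 256 batch size.

```python
import torch

batch_size = 1024                 # the paper reports 768 to 1024
lr = 1e-4 * batch_size / 256      # base LR of 1e-4, scaled per 256 batch size

model = torch.nn.Linear(10, 10)   # placeholder for the actual transformer
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=lr,
    betas=(0.9, 0.95),            # β1 = 0.9, β2 = 0.95 as quoted
    weight_decay=0.05,
)
```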