Autoregressive Image Generation without Vector Quantization

Authors: Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We experiment on ImageNet [9] at a resolution of 256×256. We evaluate FID [22] and IS [43], and provide Precision and Recall as references following common practice [10]." |
| Researcher Affiliation | Collaboration | MIT CSAIL; Google DeepMind; Tsinghua University |
| Pseudocode | Yes | "Pseudo-code of Diffusion Loss. See Algorithm 1." (Algorithm 1: Diffusion Loss, PyTorch-like pseudo-code) |
| Open Source Code | Yes | "Code is available at https://github.com/LTH14/mar." |
| Open Datasets | Yes | "We experiment on ImageNet [9] at a resolution of 256×256." |
| Dataset Splits | No | The paper uses ImageNet but does not explicitly provide training/validation/test splits (e.g., percentages or sample counts) in the text. |
| Hardware Specification | Yes | "Our training is mainly done on 16 servers with 8 V100 GPUs each." |
| Software Dependencies | No | The paper mentions the AdamW optimizer and provides PyTorch-like pseudocode, but it does not specify version numbers for Python, PyTorch, or other relevant libraries. |
| Experiment Setup | Yes | "Our noise schedule has a cosine shape, with 1000 steps at training time; at inference time, it is resampled with fewer steps (by default, 100). By default, we use 3 blocks and a width of 1024 channels. By default, our Transformer has 32 blocks and a width of 1024... At training time, we randomly sample a masking ratio... in [0.7, 1.0]... By default, the models are trained using the AdamW optimizer for 400 epochs. The weight decay and momenta for AdamW are 0.02 and (0.9, 0.95). We use a batch size of 2048 and a learning rate (lr) of 8e-4." |
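The Diffusion Loss (Algorithm 1) and cosine schedule quoted above can be illustrated with a minimal sketch. This is an assumption-laden NumPy illustration, not the authors' code: `toy_denoiser` is a hypothetical stand-in for the paper's small denoising MLP conditioned on the autoregressive output `z`, and `cosine_alpha_bar` follows the standard cosine noise schedule; the actual implementation is in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_alpha_bar(T=1000, s=0.008):
    # Cosine noise schedule: cumulative signal fraction alpha_bar_t
    # for t = 0..T-1, decreasing from ~1 toward 0.
    t = np.arange(T + 1)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return np.clip(f[1:] / f[0], 1e-5, 1.0)

def diffusion_loss(target, z, denoiser, alpha_bar):
    # target: continuous-valued token x (shape (D,)), no vector
    # quantization; z: conditioning vector from the AR model.
    T = len(alpha_bar)
    t = rng.integers(0, T)                       # random timestep
    eps = rng.standard_normal(target.shape)      # Gaussian noise
    # Forward diffusion: noise the target at level t.
    x_t = np.sqrt(alpha_bar[t]) * target + np.sqrt(1 - alpha_bar[t]) * eps
    eps_pred = denoiser(x_t, t / T, z)           # predict the noise
    return float(np.mean((eps_pred - eps) ** 2)) # MSE on the noise

def toy_denoiser(x_t, t_frac, z):
    # Hypothetical placeholder; the paper uses a small MLP
    # (by default 3 blocks, width 1024) conditioned on z.
    return x_t - z * (1 - t_frac)

D = 16
target = rng.standard_normal(D)
z = rng.standard_normal(D)
alpha_bar = cosine_alpha_bar(T=1000)   # 1000 steps at training time
loss = diffusion_loss(target, z, toy_denoiser, alpha_bar)

# At inference, the 1000-step schedule is resampled to 100 steps.
sampling_steps = np.linspace(0, 999, 100).astype(int)
```

The resampled `sampling_steps` reflects the quoted default of 100 inference steps; everything else (array sizes, the toy denoiser) is illustrative only.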