Autoregressive Image Generation without Vector Quantization
Authors: Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on ImageNet [9] at a resolution of 256×256. We evaluate FID [22] and IS [43], and provide Precision and Recall as references following common practice [10]. (An evaluation sketch follows the table.) |
| Researcher Affiliation | Collaboration | 1MIT CSAIL 2Google DeepMind 3Tsinghua University |
| Pseudocode | Yes | Pseudo-code of Diffusion Loss. See Algorithm 1 (Diffusion Loss: PyTorch-like pseudo-code). (A hedged sketch follows the table.) |
| Open Source Code | Yes | Code is available at https://github.com/LTH14/mar. |
| Open Datasets | Yes | We experiment on ImageNet [9] at a resolution of 256×256. |
| Dataset Splits | No | The paper uses ImageNet but does not explicitly state training/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Our training is mainly done on 16 servers with 8 V100 GPUs each. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and provides PyTorch-like pseudocode, but it does not specify version numbers for Python, PyTorch, or other relevant software libraries. |
| Experiment Setup | Yes | Our noise schedule has a cosine shape, with 1000 steps at training time; at inference time, it is resampled with fewer steps (by default, 100). By default, we use 3 blocks and a width of 1024 channels. By default, our Transformer has 32 blocks and a width of 1024... At training time, we randomly sample a masking ratio... in [0.7, 1.0]... By default, the models are trained using the AdamW optimizer for 400 epochs. The weight decay and momenta for AdamW are 0.02 and (0.9, 0.95). We use a batch size of 2048 and a learning rate (lr) of 8e-4. (An optimizer/config sketch follows the table.) |
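
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of a per-token diffusion loss in the spirit of the paper's Algorithm 1. It assumes a hypothetical callable `denoise_mlp(x_t, t, z)` (the small MLP head conditioned on the timestep and the transformer output `z`) and a cosine noise schedule with 1000 training steps; it is not the authors' implementation, which is available at the repository linked above.

```python
import math
import torch
import torch.nn.functional as F

# Cosine noise schedule with T = 1000 training steps (the paper's default).
T = 1000
s = 0.008
steps = torch.arange(T + 1, dtype=torch.float64)
f = torch.cos(((steps / T) + s) / (1 + s) * math.pi / 2) ** 2
alphas_cumprod = (f / f[0]).clamp(min=1e-8)
sqrt_ac = alphas_cumprod[:T].sqrt().float()          # sqrt(alpha_bar_t)
sqrt_omac = (1 - alphas_cumprod[:T]).sqrt().float()  # sqrt(1 - alpha_bar_t)

def diffusion_loss(x, z, denoise_mlp):
    """Per-token diffusion loss: E_{eps, t} || eps - eps_theta(x_t | t, z) ||^2.

    x: continuous ground-truth token, shape (B, token_dim)
    z: conditioning vector from the (masked) autoregressive transformer, shape (B, cond_dim)
    denoise_mlp: hypothetical noise-prediction MLP taking (x_t, t, z)
    """
    t = torch.randint(0, T, (x.size(0),), device=x.device)
    eps = torch.randn_like(x)
    # Forward-diffuse the clean token x to timestep t.
    x_t = sqrt_ac.to(x.device)[t, None] * x + sqrt_omac.to(x.device)[t, None] * eps
    eps_pred = denoise_mlp(x_t, t, z)
    return F.mse_loss(eps_pred, eps)
```

The key design point is that the loss is a regression-style denoising objective on continuous tokens, replacing the categorical cross-entropy over a discrete VQ codebook used in conventional autoregressive image models.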
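
For the Experiment Setup row, the following sketch wires the reported optimizer settings into standard PyTorch, with a placeholder `model` standing in for the MAR transformer plus its diffusion head. The sampling distribution of the masking ratio over [0.7, 1.0] is not given in the quoted excerpt, so a uniform draw is assumed purely for illustration.

```python
import torch

# Placeholder model standing in for the MAR transformer + diffusion-loss MLP head.
model = torch.nn.Linear(16, 16)

# AdamW with the reported defaults: lr 8e-4 (batch size 2048), weight decay 0.02,
# momenta (betas) of (0.9, 0.95), trained for 400 epochs.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=8e-4,
    weight_decay=0.02,
    betas=(0.9, 0.95),
)

# Per-iteration masking ratio in [0.7, 1.0]; the exact distribution is not stated
# in the excerpt, so a uniform draw is used here as an assumption.
mask_ratio = float(torch.empty(1).uniform_(0.7, 1.0))
```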
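
For the evaluation metrics in the Research Type row, the paper does not name its evaluation tooling; as one possible way to reproduce FID and IS numbers, the torch-fidelity package can compute both from folders of images. The directory paths below are placeholders, and the authors' exact protocol (reference statistics, sample count) may differ.

```python
import torch_fidelity

# Placeholder paths: a folder of generated 256x256 samples and a folder of
# ImageNet reference images.
metrics = torch_fidelity.calculate_metrics(
    input1='samples/generated_256',          # generated images (placeholder path)
    input2='data/imagenet_reference_256',    # reference images (placeholder path)
    cuda=True,
    isc=True,   # Inception Score
    fid=True,   # Frechet Inception Distance
)
print(metrics)
```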