Anytime Sampling for Autoregressive Models via Ordered Autoencoding
Authors: Yilun Xu, Yang Song, Sahaj Garg, Linyuan Gong, Rui Shu, Aditya Grover, Stefano Ermon
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. |
| Researcher Affiliation | Academia | Yilun Xu, Massachusetts Institute of Technology (ylxu@mit.edu); Yang Song, Stanford University (yangsong@cs.stanford.edu); Sahaj Garg, Stanford University (sahajg@cs.stanford.edu); Linyuan Gong, UC Berkeley (gonglinyuan@hotmail.com); Rui Shu, Stanford University (ruishu@cs.stanford.edu); Aditya Grover, UC Berkeley (aditya.grover1@gmail.com); Stefano Ermon, Stanford University (ermon@cs.stanford.edu) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model. Our code is released via the anonymous link https://anonymous.4open.science/r/3946e9c8-8f98-4836abc1-0f711244476d/ and included in the supplementary material as well. |
| Open Datasets | Yes | We evaluate the model performance on the MNIST, CIFAR-10 (Krizhevsky, 2009) and CelebA (Liu et al., 2014) datasets. We evaluate anytime autoregressive models on the VCTK dataset (Veaux et al., 2017). |
| Dataset Splits | No | The paper mentions selecting checkpoints with the smallest validation loss, implying a validation set was used, but it does not specify the method or percentages of the data split for training, validation, or testing. |
| Hardware Specification | Yes | All samples are produced on a single NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions tools like TTUR for FID scores but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For CelebA, the images are resized to 64×64. All pixel values are scaled to the range [0, 1]. The full code length and the codebook size are 16 and 126 for MNIST, 70 and 1000 for CIFAR-10, and 100 and 500 for CelebA respectively. For all the datasets, we use a 6-layer Transformer decoder with an embedding size of 512, latent size of 2048, and dropout rate of 0.1. We use 8 heads in multi-head self-attention layers. We pre-train the VQ-VAE models with full code lengths for 200 epochs. Then we train the VQ-VAE models with the new objective Eq. (6) for 200 more epochs. We use the Adam optimizer with learning rate 1.0e-3 for training. We train the autoregressive model for 50 epochs on both MNIST and CIFAR-10, and 100 epochs on CelebA. We use the Adam optimizer with a learning rate of 2.0e-3 for the Transformer decoder. The batch size is fixed to be 128 during all training processes. |
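
The experiment-setup row above fully specifies the Transformer prior, so a minimal PyTorch sketch with those hyperparameters (CIFAR-10 values: code length 70, codebook size 1000, embedding size 512, hidden size 2048, 6 layers, 8 heads, dropout 0.1, Adam with learning rate 2.0e-3, batch size 128) may help when checking a reproduction. This is an illustrative reconstruction, not the authors' released code: all variable names are assumptions, and the decoder-only model is built here from `nn.TransformerEncoder` with a causal mask rather than the authors' own module.

```python
# Illustrative sketch of the autoregressive prior described in the
# "Experiment Setup" row; names and structure are assumptions.
import torch
import torch.nn as nn

code_length, codebook_size = 70, 1000          # CIFAR-10 values reported above

token_emb = nn.Embedding(codebook_size, 512)   # embedding size 512
pos_emb = nn.Embedding(code_length, 512)       # learned positional embeddings (assumption)
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                   dim_feedforward=2048, dropout=0.1)
decoder = nn.TransformerEncoder(layer, num_layers=6)   # 6-layer decoder-only Transformer
head = nn.Linear(512, codebook_size)           # predicts the next discrete code

optimizer = torch.optim.Adam(
    list(token_emb.parameters()) + list(pos_emb.parameters())
    + list(decoder.parameters()) + list(head.parameters()),
    lr=2e-3)                                   # reported Transformer learning rate

codes = torch.randint(codebook_size, (128, code_length))      # batch size 128
x = token_emb(codes) + pos_emb(torch.arange(code_length))     # (B, T, 512)
causal = torch.triu(torch.full((code_length, code_length), float("-inf")), diagonal=1)
logits = head(decoder(x.transpose(0, 1), mask=causal))        # (T, B, codebook_size)
```

Under this reading, the anytime property quoted in the "Research Type" row corresponds to generating only a prefix of the code sequence (e.g., the first 60% to 80% of the `code_length` positions) and decoding that prefix with the VQ-VAE decoder.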