Anytime Sampling for Autoregressive Models via Ordered Autoencoding
Authors: Yilun Xu, Yang Song, Sahaj Garg, Linyuan Gong, Rui Shu, Aditya Grover, Stefano Ermon
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. |
| Researcher Affiliation | Academia | Yilun Xu, Massachusetts Institute of Technology (ylxu@mit.edu); Yang Song, Stanford University (yangsong@cs.stanford.edu); Sahaj Garg, Stanford University (sahajg@cs.stanford.edu); Linyuan Gong, UC Berkeley (gonglinyuan@hotmail.com); Rui Shu, Stanford University (ruishu@cs.stanford.edu); Aditya Grover, UC Berkeley (aditya.grover1@gmail.com); Stefano Ermon, Stanford University (ermon@cs.stanford.edu) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model. Our code is released via the anonymous link https://anonymous.4open.science/r/3946e9c8-8f98-4836abc1-0f711244476d/ and included in the supplementary material as well. |
| Open Datasets | Yes | We evaluate the model performance on the MNIST, CIFAR-10 (Krizhevsky, 2009) and CelebA (Liu et al., 2014) datasets. We evaluate anytime autoregressive models on the VCTK dataset (Veaux et al., 2017). |
| Dataset Splits | No | The paper mentions selecting checkpoints with the smallest validation loss, implying a validation set was used, but it does not specify the method or percentages of the data split for training, validation, or testing. |
| Hardware Specification | Yes | All samples are produced on a single NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions tools like TTUR for FID scores but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For CelebA, the images are resized to 64×64. All pixel values are scaled to the range [0, 1]. The full code length and the codebook size are 16 and 126 for MNIST, 70 and 1000 for CIFAR-10, and 100 and 500 for CelebA respectively. For all the datasets, we use a 6-layer Transformer decoder with an embedding size of 512, latent size of 2048, and dropout rate of 0.1. We use 8 heads in multi-head self-attention layers. We pre-train the VQ-VAE models with full code lengths for 200 epochs. Then we train the VQ-VAE models with the new objective Eq. (6) for 200 more epochs. We use the Adam optimizer with learning rate 1.0e-3 for training. We train the autoregressive model for 50 epochs on both MNIST and CIFAR-10, and 100 epochs on CelebA. We use the Adam optimizer with a learning rate of 2.0e-3 for the Transformer decoder. The batch size is fixed to be 128 during all training processes. |
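
The experiment-setup row above fully specifies the Transformer prior, so a minimal PyTorch sketch with those hyperparameters (CIFAR-10 values: code length 70, codebook size 1000, embedding size 512, hidden size 2048, 6 layers, 8 heads, dropout 0.1, Adam with learning rate 2.0e-3, batch size 128) may help when checking a reproduction. This is an illustrative reconstruction, not the authors' released code: all variable names are assumptions, and the decoder-only model is built here from `nn.TransformerEncoder` with a causal mask rather than the authors' own module.

```python
# Illustrative sketch of the autoregressive prior described in the
# "Experiment Setup" row; names and structure are assumptions.
import torch
import torch.nn as nn

code_length, codebook_size = 70, 1000          # CIFAR-10 values reported above

token_emb = nn.Embedding(codebook_size, 512)   # embedding size 512
pos_emb = nn.Embedding(code_length, 512)       # learned positional embeddings (assumption)
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                   dim_feedforward=2048, dropout=0.1)
decoder = nn.TransformerEncoder(layer, num_layers=6)   # 6-layer decoder-only Transformer
head = nn.Linear(512, codebook_size)           # predicts the next discrete code

optimizer = torch.optim.Adam(
    list(token_emb.parameters()) + list(pos_emb.parameters())
    + list(decoder.parameters()) + list(head.parameters()),
    lr=2e-3)                                   # reported Transformer learning rate

codes = torch.randint(codebook_size, (128, code_length))      # batch size 128
x = token_emb(codes) + pos_emb(torch.arange(code_length))     # (B, T, 512)
causal = torch.triu(torch.full((code_length, code_length), float("-inf")), diagonal=1)
logits = head(decoder(x.transpose(0, 1), mask=causal))        # (T, B, codebook_size)
```

Under this reading, the anytime property quoted in the "Research Type" row corresponds to generating only a prefix of the code sequence (e.g., the first 60% to 80% of the `code_length` positions) and decoding that prefix with the VQ-VAE decoder.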