Predictive Sampling with Forecasting Autoregressive Models
Authors: Auke Wiggers, Emiel Hoogeboom
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show considerable improvements over baselines in number of ARM inference calls and sampling speed. ... We demonstrate a considerable reduction in the number of forward passes, and consequently, sampling time, on binary MNIST, SVHN and CIFAR10. Additionally, we show on the SVHN, CIFAR10 and Imagenet32 datasets that predictive sampling can be used to speed up ancestral sampling from a discrete latent autoencoder, when an ARM is used to model the latent space. |
| Researcher Affiliation | Collaboration | 1 Qualcomm AI Research, Qualcomm Technologies Netherlands B.V. (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.). 2 University of Amsterdam, Netherlands. 3 Research done while completing an internship at Qualcomm AI Research. |
| Pseudocode | Yes | Algorithm 1 Predictive Sampling; Algorithm 2 ARM Fixed-Point Iteration (a sketch of the fixed-point sampling loop follows the table) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | The used datasets are Binary MNIST (Larochelle & Murray, 2011), SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky et al., 2009), and ImageNet32 (van den Oord et al., 2016b). |
| Dataset Splits | Yes | As validation data, we use the last 5000 images of the train split for MNIST and CIFAR10, we randomly select 8527 images from the train split for SVHN, and we randomly select 20000 images from the train split for Imagenet32. For all datasets, the remainder of the train split is used as training data. |
| Hardware Specification | Yes | Training took place on Nvidia Tesla V100 GPUs. To obtain sampling times, measurements were taken on a single Nvidia GTX 1080Ti GPU, with Nvidia driver 410.104, CUDA 10.0, and cuDNN v7.5.1. |
| Software Dependencies | Yes | All experiments were performed using PyTorch version 1.1.0 (Paszke et al., 2019). |
| Experiment Setup | Yes | For the forecasting modules, we choose a lightweight network architecture that forecasts T future timesteps. A triangular convolution is applied to h, the hidden representation of the ARM. This is followed by a 1×1 convolution with a number of output channels equal to the number of timesteps to forecast multiplied by the number of input categories. The number of forecasting modules T is 20 for binary MNIST and 1 or 5 for the other datasets (the exact number is specified in brackets in the results). ... For a full list of hyperparameters and data preprocessing steps, see Appendix A. (A sketch of the forecasting head follows the table.) |
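
The Pseudocode row refers to Algorithm 1 (Predictive Sampling) and Algorithm 2 (ARM Fixed-Point Iteration). The following is a minimal Python sketch of the fixed-point sampling loop, not the authors' exact algorithm: `arm_logits_fn` and `proposal_fn` are hypothetical stand-ins for the ARM's parallel (teacher-forced) pass and for the forecasting modules, and fixing the per-position Gumbel noise up front is what makes repeated re-evaluation a deterministic fixed-point problem.

```python
import torch

def predictive_sample(arm_logits_fn, proposal_fn, seq_len, num_classes):
    """Hedged sketch of sampling an ARM by fixed-point iteration.

    arm_logits_fn(x) -> [seq_len, num_classes] logits, where row i is
        conditioned only on x[:i] (a causal ARM evaluated in parallel).
    proposal_fn(x, i) -> LongTensor with guesses for positions i..seq_len-1
        (hypothetical stand-in for the paper's forecasting modules).
    """
    # Fix per-position Gumbel noise so re-evaluating a position always
    # yields the same sample given the same prefix.
    noise = -torch.log(-torch.log(torch.rand(seq_len, num_classes)))

    x = torch.zeros(seq_len, dtype=torch.long)
    i = 0                      # length of the verified prefix
    arm_calls = 0
    while i < seq_len:
        x[i:] = proposal_fn(x, i)          # guess the remaining positions
        logits = arm_logits_fn(x)          # one parallel ARM pass
        arm_calls += 1
        y = torch.argmax(logits + noise, dim=-1)

        # Accept guesses while they agree with the ARM sample; each such
        # position was conditioned on an already-verified prefix.
        while i < seq_len and x[i] == y[i]:
            i += 1
        # The first disagreeing position becomes valid once corrected,
        # because its conditioning prefix x[:i] is now fully verified.
        if i < seq_len:
            x[i] = y[i]
            i += 1
    return x, arm_calls
```

Each loop iteration costs one parallel ARM forward pass, and at least one position is verified per iteration, so the loop terminates after at most `seq_len` ARM calls; better proposals mean fewer calls, which is the source of the reported speedup.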
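The Dataset Splits row can be reproduced with standard torchvision loaders. The sketch below assumes torchvision's CIFAR10 and SVHN datasets and an arbitrary seed for the random SVHN selection; the paper does not state which seed was used.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets

# MNIST / CIFAR10: the last 5000 training images become the validation set.
train_full = datasets.CIFAR10("./data", train=True, download=True)
n = len(train_full)
train_set = Subset(train_full, range(n - 5000))
val_set = Subset(train_full, range(n - 5000, n))

# SVHN: 8527 randomly selected training images become the validation set
# (seed 0 is an assumption; the paper does not specify one).
svhn_full = datasets.SVHN("./data", split="train", download=True)
perm = np.random.default_rng(0).permutation(len(svhn_full)).tolist()
svhn_val = Subset(svhn_full, perm[:8527])
svhn_train = Subset(svhn_full, perm[8527:])
```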
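The Experiment Setup row describes the lightweight forecasting head: a triangular (causally masked) convolution over the ARM hidden state h, followed by a 1×1 convolution with T × K output channels, where T is the number of forecast timesteps and K the number of input categories. The sketch below assumes a PixelCNN-style mask and illustrative hyperparameters (hidden width, kernel size); it is not the authors' exact module.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """PixelCNN-style masked convolution: the output at (i, j) only sees
    pixels above and to the left (plus the center). Stands in for the
    paper's triangular convolution; the exact masking may differ."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(1, 1, kH, kW)
        mask[:, :, kH // 2, kW // 2 + 1:] = 0   # right of center
        mask[:, :, kH // 2 + 1:, :] = 0         # rows below center
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(
            x, self.weight * self.mask, self.bias,
            self.stride, self.padding, self.dilation, self.groups)


class ForecastingHead(nn.Module):
    """Masked conv over the ARM hidden state h, then a 1x1 conv producing
    T x K logits per spatial position (hyperparameters are illustrative)."""
    def __init__(self, hidden=256, num_timesteps=5, num_categories=256,
                 kernel_size=3):
        super().__init__()
        self.T, self.K = num_timesteps, num_categories
        self.conv = MaskedConv2d(hidden, hidden, kernel_size,
                                 padding=kernel_size // 2)
        self.out = nn.Conv2d(hidden, num_timesteps * num_categories, 1)

    def forward(self, h):                     # h: [B, hidden, H, W]
        z = torch.relu(self.conv(h))
        logits = self.out(z)                  # [B, T*K, H, W]
        B, _, H, W = logits.shape
        return logits.view(B, self.T, self.K, H, W)
```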