Parallel Multiscale Autoregressive Density Estimation

Authors: Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Yutian Chen, Dan Belov, Nando de Freitas

ICML 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Scott Reed <reedscot@google.com>. |
| Pseudocode | No | The paper describes the model architecture and sampling process in text and diagrams (Figure 2, Figure 3) but does not include explicit pseudocode or algorithm blocks. A hedged sketch of the sampling loop it describes is given after this table. |
| Open Source Code | No | The paper does not provide an explicit statement about an open-source code release or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate our model on ImageNet (Deng et al., 2009), Caltech-UCSD Birds (CUB) (Wah et al., 2011), the MPII Human Pose dataset (MPII) (Andriluka et al., 2014), the Microsoft Common Objects in Context dataset (MS-COCO) (Lin et al., 2014), and the Google Robot Pushing dataset (Finn et al., 2016). |
| Dataset Splits | Yes | There are 50,000 training sequences and a validation set with the same objects but different arm trajectories. One test set involves a subset of the objects seen during training and another involving novel objects, both captured on an arm and camera viewpoint not seen during training. |
| Hardware Specification | Yes | Table 4. Sampling speed of several models in seconds per frame on an Nvidia Quadro M4000 GPU. |
| Software Dependencies | No | The paper mentions using TensorFlow for image resizing (`tf.image.resize_images`) but does not specify its version or any other software dependencies with version numbers. An illustrative call is sketched after this table. |
| Experiment Setup | Yes | All models for ImageNet, CUB, MPII and MS-COCO were trained using RMSprop with hyperparameter ε = 1e-8, with batch size 128 for 200K steps. The learning rate was set initially to 1e-4 and decayed to 1e-5. A minimal optimizer configuration is sketched after this table. |