Variational Memory Addressing in Generative Models

Authors: Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo Jimenez Rezende

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To illustrate the advantages of this approach we incorporate it into a variational autoencoder and apply the resulting model to the task of generative few-shot learning. We demonstrate empirically that our model is able to identify and access the relevant memory contents even with hundreds of unseen Omniglot characters in memory. (A sketch of the addressing mechanism follows the table.)
Researcher Affiliation | Industry | Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo J. Rezende, {bornschein, amnih, danielzoran, danilor}@google.com, DeepMind, London, UK
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described; no repository link or explicit code-release statement is present.
Open Datasets | Yes | We first perform a series of experiments on the binarized MNIST dataset [26]. To apply the model to a more challenging dataset and to use it for generative few-shot learning, we train it on various versions of the Omniglot [27] dataset. The dataset contains 24,345 unlabeled examples in the training set and 8,070 examples in the test set from 1,623 different character classes.
Dataset Splits | Yes | The dataset contains 24,345 unlabeled examples in the training set and 8,070 examples in the test set from 1,623 different character classes. For few-shot learning we therefore start from the original dataset [27] and scale the 104×104 pixel examples with 4×4 max-pooling to 26×26 pixels (a preprocessing sketch follows the table). We here use the 45/5 split introduced in [18] because we are mostly interested in the quantitative behaviour of the memory component, and not so much in finding optimal regularization hyperparameters to maximize performance on small datasets.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts; it only mentions general terms like 'on a GPU'.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers).
Experiment Setup | Yes | We optimize the parameters with Adam [25] and report experiments with the best results from learning rates in {1e-4, 3e-4}. We use minibatches of size 32 and K=4 samples from the approximate posterior q(·|x) to compute the gradients, the KL estimates, and the log-likelihood bounds. It consists of 6 convolutional layers with 3×3 kernels and 48 or 64 feature maps each. Every second layer uses a stride of 2 to get an overall downsampling of 8×8. The convolutional pyramid is followed by a fully-connected MLP with 1 hidden layer and 2|z| output units. The embedding MLPs for p(a) and q(a|x) use the same convolutional architecture and map images x and memory content m_a into a 128-dimensional matching space for the similarity calculations. By constraining the model size (|M|=256, convolutions with 32 feature maps) and adding 3e-4 L2 weight decay to all parameters with the exception of M, we obtain a model with a test-set NLL of 103.6 nats.
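
The Research Type row quotes the paper's use of a memory-addressing latent inside a VAE. As a reading aid, here is a minimal PyTorch sketch of that idea, not the authors' implementation: the helper modules (`embed_query`, `embed_memory`, `encode_z`, `dec.log_prob`) are hypothetical, only a single posterior sample is drawn, the VIMCO gradient estimator the paper uses for the discrete address is omitted, and the prior over z is simplified to N(0, I) rather than being conditioned on the retrieved memory entry.

```python
import torch
import torch.nn.functional as F

def address_posterior(query_embed, mem_embed):
    """q(a|x): a softmax over similarity scores between the embedded query x
    and the embedded memory rows, both living in a shared matching space."""
    logits = query_embed @ mem_embed.t()        # (batch, |M|) dot-product similarities
    return torch.distributions.Categorical(logits=logits)

def elbo_one_sample(x, memory, enc, dec, prior_logits):
    """Single-sample bound on log p(x) = log sum_a p(a) E[p(x|z, m_a)].
    The paper estimates the discrete-address gradient with VIMCO and K samples;
    here one hard sample is drawn purely for illustration."""
    q_a = address_posterior(enc.embed_query(x), enc.embed_memory(memory))
    a = q_a.sample()                            # (batch,) sampled addresses
    m_a = memory[a]                             # (batch, ...) retrieved memory content

    mu, logvar = enc.encode_z(x, m_a)           # parameters of q(z | x, m_a)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    log_px = dec.log_prob(x, z, m_a)            # log p(x | z, m_a)
    kl_z = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)   # KL(q(z|.) || N(0, I)); the paper conditions the z-prior on m_a
    kl_a = q_a.log_prob(a) - F.log_softmax(prior_logits, dim=-1)[a]  # single-sample estimate of KL(q(a|x) || p(a))
    return (log_px - kl_z - kl_a).mean()
```

With a uniform prior over addresses, `prior_logits` in this sketch would simply be a zero vector of length |M|.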
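
The Dataset Splits row mentions downsampling the 104×104 Omniglot images to 26×26 with 4×4 max-pooling. A one-line sketch of that preprocessing step, assuming the characters are already loaded as a float tensor (tensor shape and function name are illustrative):

```python
import torch.nn.functional as F

def downsample_omniglot(images):
    """images: (N, 1, 104, 104) tensor of Omniglot characters in [0, 1].
    Returns a (N, 1, 26, 26) tensor via non-overlapping 4x4 max-pooling,
    matching the 104x104 -> 26x26 reduction described in the paper."""
    return F.max_pool2d(images, kernel_size=4, stride=4)
```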
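
The Experiment Setup row describes the convolutional encoder and the embedding MLPs. The sketch below is one plausible reading of that description in PyTorch, not the authors' code; the channel count (64 of the quoted 48/64), latent size, hidden width, padding, and ReLU nonlinearity are assumptions where the quoted text is silent.

```python
import torch.nn as nn

def conv_trunk(feat=64, in_ch=1):
    """Six 3x3 convolutions; stride 2 on every second layer gives 8x8 downsampling."""
    layers, ch = [], in_ch
    for i in range(6):
        stride = 2 if i % 2 == 1 else 1
        layers += [nn.Conv2d(ch, feat, kernel_size=3, stride=stride, padding=1), nn.ReLU()]
        ch = feat
    return nn.Sequential(*layers, nn.Flatten())

def z_encoder(z_dim=32, feat=64, hidden=256):
    """Conv pyramid followed by a one-hidden-layer MLP with 2|z| outputs (mean, log-variance)."""
    return nn.Sequential(conv_trunk(feat), nn.LazyLinear(hidden), nn.ReLU(),
                         nn.Linear(hidden, 2 * z_dim))

def matching_embedder(match_dim=128, feat=64, hidden=256):
    """Same convolutional architecture, mapping an image x or a memory entry m_a
    into the 128-dimensional matching space used for the address similarities."""
    return nn.Sequential(conv_trunk(feat), nn.LazyLinear(hidden), nn.ReLU(),
                         nn.Linear(hidden, match_dim))
```

Per the same row, such components would be trained with Adam (learning rate 1e-4 or 3e-4), minibatches of 32, K=4 posterior samples, and, for the size-constrained model, 3e-4 L2 weight decay on every parameter except the memory M.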