Variational Memory Addressing in Generative Models
Authors: Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo Jimenez Rezende
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate the advantages of this approach we incorporate it into a variational autoencoder and apply the resulting model to the task of generative few-shot learning. We demonstrate empirically that our model is able to identify and access the relevant memory contents even with hundreds of unseen Omniglot characters in memory. |
| Researcher Affiliation | Industry | Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo J. Rezende; {bornschein, amnih, danielzoran, danilor}@google.com; DeepMind, London, UK |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. No repository link or explicit code release statement is present. |
| Open Datasets | Yes | We first perform a series of experiments on the binarized MNIST dataset [26]. To apply the model to a more challenging dataset and to use it for generative few-shot learning, we train it on various versions of the Omniglot [27] dataset. The dataset contains 24,345 unlabeled examples in the training set and 8,070 examples in the test set from 1623 different character classes. |
| Dataset Splits | Yes | The dataset contains 24,345 unlabeled examples in the training set and 8,070 examples in the test set from 1623 different character classes. For few-shot learning we therefore start from the original dataset [27] and scale the 104×104 pixel examples with 4×4 max-pooling to 26×26 pixels. We here use the 45/5 split introduced in [18] because we are mostly interested in the quantitative behaviour of the memory component, and not so much in finding optimal regularization hyperparameters to maximize performance on small datasets. (A sketch of this pooling step follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts. It only mentions general terms like 'on a GPU'. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers). |
| Experiment Setup | Yes | We optimize the parameters with Adam [25] and report experiments with the best results from learning rates in {1e-4, 3e-4}. We use minibatches of size 32 and K=4 samples from the approximate posterior q(a, z|x) to compute the gradients, the KL estimates, and the log-likelihood bounds. It [the encoder] consists of 6 convolutional layers with 3×3 kernels and 48 or 64 feature maps each. Every second layer uses a stride of 2 to get an overall downsampling of 8×8. The convolutional pyramid is followed by a fully-connected MLP with 1 hidden layer and 2|z| output units. The embedding MLPs for p(a) and q(a|x) use the same convolutional architecture and map images x and memory content m_a into a 128-dimensional matching space for the similarity calculations. By constraining the model size (|M|=256, convolutions with 32 feature maps) and adding 3e-4 L2 weight decay to all parameters with the exception of M, we obtain a model with a test-set NLL of 103.6 nats. (Sketches of this encoder, the training configuration, and the memory addressing follow the table.) |
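
The 4×4 max-pooling preprocessing quoted under Dataset Splits can be illustrated with a minimal sketch. It assumes the 104×104 Omniglot images have already been loaded and binarized as a float tensor; PyTorch and the function name are illustrative choices, not something the paper specifies.

```python
# Hypothetical preprocessing sketch: 4x4 max-pooling that reduces
# 104x104 Omniglot images to 26x26, as described in the quoted text.
import torch
import torch.nn.functional as F

def downsample_omniglot(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 1, 104, 104) binary tensor -> (N, 1, 26, 26)."""
    return F.max_pool2d(images, kernel_size=4, stride=4)

batch = torch.rand(32, 1, 104, 104).round()   # placeholder binary images
small = downsample_omniglot(batch)
assert small.shape == (32, 1, 26, 26)
```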
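The encoder described under Experiment Setup (6 convolutional layers with 3×3 kernels, stride 2 on every second layer for an overall 8×8 downsampling, followed by a one-hidden-layer MLP with 2|z| outputs) can be sketched as below. The exact channel counts per layer, the hidden-layer width, |z|, and the padding scheme are assumptions not fixed by the quoted text; the Adam configuration at the end mirrors the quoted learning rates and minibatch size, but the full training objective is omitted.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Sketch of the quoted encoder: 6 conv layers with 3x3 kernels (48 or 64
    feature maps), stride 2 on every second layer (8x overall downsampling),
    then an MLP with one hidden layer and 2*|z| outputs, e.g. mean and
    log-variance of a Gaussian q(z|.). Widths and padding are assumptions."""
    def __init__(self, z_dim: int = 32, feat: int = 48, hidden: int = 256):
        super().__init__()
        layers, c_in = [], 1
        for i in range(6):
            stride = 2 if i % 2 == 1 else 1   # every second layer downsamples
            layers += [nn.Conv2d(c_in, feat, kernel_size=3, stride=stride, padding=1),
                       nn.ReLU()]
            c_in = feat
        self.conv = nn.Sequential(*layers)
        # With padding=1, a 26x26 input is 4x4 after the three stride-2 layers.
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat * 4 * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),     # 2|z| output units
        )

    def forward(self, x):                     # x: (B, 1, 26, 26)
        return self.mlp(self.conv(x))

# Quoted training configuration: Adam with learning rate in {1e-4, 3e-4}
# and minibatches of size 32 (the VMA objective itself is not shown here).
encoder = ConvEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
```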
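The setup also states that embedding MLPs for p(a) and q(a|x) map images x and memory contents m_a into a 128-dimensional matching space for similarity calculations. The following is a minimal sketch of such a soft addressing distribution; the dot-product similarity, the shared embedding network, and the class name are assumptions, since the quoted text does not specify the similarity function.

```python
import torch
import torch.nn as nn

class MemoryAddressing(nn.Module):
    """Sketch of a categorical addressing distribution q(a|x): embed the query
    image and every memory entry into a 128-d matching space and turn the
    pairwise similarities into logits over memory addresses."""
    def __init__(self, embed: nn.Module):
        super().__init__()
        self.embed = embed            # assumed to map (N, 1, 26, 26) -> (N, 128)

    def forward(self, x, memory):
        # x: (B, 1, 26, 26) queries; memory: (|M|, 1, 26, 26) stored exemplars
        hx = self.embed(x)            # (B, 128)
        hm = self.embed(memory)       # (|M|, 128)
        logits = hx @ hm.t()          # (B, |M|) similarity scores
        return torch.distributions.Categorical(logits=logits)
```

Sampling a ~ q(a|x) from this distribution would select a memory entry whose content conditions the rest of the generative model, matching the quoted description of identifying and accessing relevant memory contents during few-shot generation.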