One-Shot Generalization in Deep Generative Models

Authors: Danilo J. Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, Daan Wierstra

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples, having seen new examples just once, providing an important class of general-purpose models for one-shot machine learning." The paper also reports quantitative results, e.g. Table 1, test-set negative log-likelihood on MNIST.
Researcher Affiliation | Industry | Danilo J. Rezende* (danilor@google.com), Shakir Mohamed* (shakir@google.com), Ivo Danihelka (danihelka@google.com), Karol Gregor (karolg@google.com), Daan Wierstra (wierstra@google.com); Google DeepMind, London.
Pseudocode | No | The paper describes the model and inference using equations and textual descriptions, but does not include any explicit 'Pseudocode' or 'Algorithm' blocks (a hedged sketch of the sequential generative loop is given after this table).
Open Source Code | No | The paper does not include an unambiguous statement about releasing source code or a direct link to a code repository for the described methodology.
Open Datasets | Yes | The first experiment uses the binarized MNIST data set of Salakhutdinov & Murray (2008), which consists of 28×28 binary images with 50,000 training and 10,000 test images. The Omniglot data set (Lake et al., 2015) consists of 105×105 binary images across 1,628 classes with just 20 images per class. The Multi-PIE dataset (Gross et al., 2010) consists of 48×48 RGB face images from various viewpoints; the paper's simplification results in 93,130 training samples and 10,000 test samples. (See the data-loading sketch after this table.)
Dataset Splits | No | The paper specifies training and test set sizes and various train-test splits for different datasets (e.g., '50,000 training and 10,000 test images' for MNIST; '93,130 training samples and 10,000 test samples' for Multi-PIE), but does not explicitly mention a separate validation split or dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used to run the experiments, only general training parameters such as 'approximatively 800K iterations'.
Software Dependencies | No | The paper describes the models and architectures used (e.g., LSTM, CGRU), but does not name specific software dependencies or version numbers for programming languages, libraries, or frameworks.
Experiment Setup | Yes | In all models, 400 LSTM hidden units are used, with 12×12 kernels for the spatial transformer (whether used for recognition or generative attention). The latent variables z_t are 4-dimensional Gaussians, and the number of steps varies from 20 to 80. The hidden canvas has the same spatial dimensions as the images, with four channels. All models were trained for approximately 800K iterations with mini-batches of size 24. (These settings are collected in the configuration sketch after this table.)
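Since the paper gives no pseudocode (see the Pseudocode row), the following is a minimal, hedged sketch of the kind of sequential generative loop the table describes: an LSTM consumes a per-step latent z_t and additively updates a hidden canvas that is decoded into Bernoulli image parameters. The class name, the plain linear writer, and the 1×1 readout convolution are my own stand-ins; in particular, the linear writer replaces the paper's spatial-transformer attention.

```python
# Hedged sketch only, not the authors' code. A simple additive writer stands in
# for the paper's spatial-transformer attention; hyperparameters follow the
# "Experiment Setup" row (400 LSTM units, 4-dim latents, 4-channel canvas).
import torch
import torch.nn as nn

class SequentialGenerator(nn.Module):
    def __init__(self, z_dim=4, hidden=400, canvas_channels=4, steps=40, img_size=28):
        super().__init__()
        self.steps, self.z_dim = steps, z_dim
        self.canvas_shape = (canvas_channels, img_size, img_size)
        self.lstm = nn.LSTMCell(z_dim, hidden)                       # 400 hidden units
        self.write = nn.Linear(hidden, canvas_channels * img_size * img_size)
        self.readout = nn.Conv2d(canvas_channels, 1, kernel_size=1)  # canvas -> image logits

    def forward(self, batch_size):
        h = torch.zeros(batch_size, self.lstm.hidden_size)
        c = torch.zeros(batch_size, self.lstm.hidden_size)
        canvas = torch.zeros(batch_size, *self.canvas_shape)
        for _ in range(self.steps):
            z_t = torch.randn(batch_size, self.z_dim)                # prior sample at step t
            h, c = self.lstm(z_t, (h, c))
            canvas = canvas + self.write(h).view(batch_size, *self.canvas_shape)
        return torch.sigmoid(self.readout(canvas))                   # Bernoulli means

samples = SequentialGenerator()(batch_size=24)                       # mini-batch of 24 images
```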
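As a companion to the Open Datasets row, here is a hedged data-loading sketch. It thresholds the standard torchvision MNIST rather than downloading the fixed binarization of Salakhutdinov & Murray (2008) that the paper actually uses, so the resulting split sizes differ from the paper's 50,000/10,000.

```python
# Hedged stand-in for the paper's binarized MNIST: threshold the standard
# torchvision MNIST to 28x28 binary images (an approximation, not the paper's
# exact Salakhutdinov & Murray split).
from torchvision import datasets, transforms

binarize = transforms.Compose([
    transforms.ToTensor(),                           # [0, 1] grayscale, 1x28x28
    transforms.Lambda(lambda x: (x > 0.5).float()),  # fixed-threshold binarization
])

train = datasets.MNIST(root="./data", train=True, download=True, transform=binarize)
test = datasets.MNIST(root="./data", train=False, download=True, transform=binarize)
print(len(train), len(test))  # 60,000 / 10,000 here; the paper's split is 50,000 train
```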
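Finally, the values quoted in the Experiment Setup row can be collected into a single configuration object. The field names below are my own; only the numbers come from the paper.

```python
# Hedged sketch of the reported training configuration; field names are
# illustrative, values are taken from the "Experiment Setup" row above.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    lstm_hidden_units: int = 400        # "400 LSTM hidden units"
    attention_kernel: tuple = (12, 12)  # spatial-transformer kernels (read and write)
    latent_dim: int = 4                 # 4-dimensional Gaussian latents z_t
    num_steps: int = 40                 # varies from 20 to 80 across models
    canvas_channels: int = 4            # hidden canvas: image-sized, four channels
    batch_size: int = 24
    train_iterations: int = 800_000     # approximately 800K iterations

config = ExperimentConfig()
```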