f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Authors: Sebastian Nowozin, Botond Cseke, Ryota Tomioka

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now train generative neural samplers based on VDM on the MNIST and LSUN datasets. We evaluate the performance using the kernel density estimation (Parzen window) approach used in [10]. To this end, we sample 16k images from the model and fit a Parzen window estimator with an isotropic Gaussian kernel whose bandwidth is selected using three-fold cross validation. The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples). We show the results in Table 4, and some samples from our models in Figure 2. (A hedged sketch of this Parzen-window evaluation follows the table.)
Researcher Affiliation | Industry | Sebastian Nowozin, Botond Cseke, Ryota Tomioka, Machine Intelligence and Perception Group, Microsoft Research, {Sebastian.Nowozin, Botond.Cseke, ryoto}@microsoft.com
Pseudocode | Yes | Algorithm 1: Single-Step Gradient Method (a hedged sketch of this update follows the table)
Open Source Code | No | The paper mentions 'The original implementation [10] of GANs... Available at https://github.com/goodfeli/adversarial' but does not explicitly state that the code for the f-GAN methodology described in this paper is publicly available, nor does it provide a link to the authors' own implementation.
Open Datasets | Yes | MNIST Digits. We use the MNIST training data set (60,000 samples, 28-by-28 pixel images) to train the generator and variational function model proposed in [10] for various f-divergences. LSUN Natural Images. We use the large scale LSUN database [35] of natural images of different categories.
Dataset Splits | No | The paper mentions 'The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples)' and three-fold cross validation for kernel bandwidth selection, but it does not provide explicit training/validation/test splits (e.g., percentages or exact sample counts per split) for the primary model training.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam [17]) and activation functions (exponential linear unit [4]) but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | With z ∼ Uniform₁₀₀(−1, 1) as input, the generator model has two linear layers each followed by batch normalization and ReLU activation and a final linear layer followed by the sigmoid function. The variational function Vω(x) has three linear layers with exponential linear units [4] in between. The final activation is specific to each divergence and listed in Table 2. As in [27] we use Adam with a learning rate of α = 0.0002 and update weight β = 0.5. We use a batch size of 4096, sampled from the training set without replacement, and train each model for one hour. (A hedged sketch of this architecture and optimizer setup follows the table.)
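
The Parzen-window evaluation quoted under Research Type can be sketched as follows. This is a minimal illustration, assuming scikit-learn's KernelDensity and GridSearchCV; the array names model_samples and test_set, and the bandwidth grid, are placeholders rather than the authors' code.

```python
# Hedged sketch of the Parzen-window (KDE) evaluation described in the paper.
# `model_samples` (e.g. 16k generated images flattened to vectors) and
# `test_set` (MNIST test images) are assumed to be 2-D NumPy arrays.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def parzen_log_likelihood(model_samples, test_set):
    # Select the isotropic Gaussian kernel bandwidth by three-fold cross
    # validation, as in the quoted evaluation protocol; the grid of candidate
    # bandwidths below is purely illustrative.
    grid = GridSearchCV(
        KernelDensity(kernel="gaussian"),
        {"bandwidth": np.logspace(-1, 0, 10)},
        cv=3,
    )
    grid.fit(model_samples)
    kde = grid.best_estimator_
    # Average log-likelihood of the held-out test set under the fitted KDE.
    return kde.score_samples(test_set).mean()
```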
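Algorithm 1 (Single-Step Gradient Method) takes, on each minibatch, one gradient ascent step in the variational parameters ω and one gradient descent step in the generator parameters θ, both evaluated at the same point on the objective F(θ, ω) = E_x[g_f(Vω(x))] − E_z[f*(g_f(Vω(Gθ(z))))]. Below is a minimal PyTorch-style sketch, assuming gf is the divergence-specific output activation and f_star its Fenchel conjugate; all names are illustrative, and the paper's experiments use Adam rather than the plain gradient steps shown here.

```python
# Hedged sketch of a single-step f-GAN update (not the authors' code).
# G: generator network, V: variational network, gf: output activation g_f,
# f_star: Fenchel conjugate f* of the chosen f-divergence (all assumptions).
import torch

def single_step_update(G, V, gf, f_star, x_real, z, eta):
    # Minibatch estimate of F(theta, omega) = E[g_f(V(x))] - E[f*(g_f(V(G(z))))].
    F = gf(V(x_real)).mean() - f_star(gf(V(G(z)))).mean()

    v_params = list(V.parameters())
    g_params = list(G.parameters())
    # Gradients of F w.r.t. both parameter sets, taken at the same point,
    # as in the single-step scheme of Algorithm 1.
    grads = torch.autograd.grad(F, v_params + g_params)
    grad_v, grad_g = grads[:len(v_params)], grads[len(v_params):]

    with torch.no_grad():
        # Gradient ascent on the variational parameters omega (maximize F).
        for p, g in zip(v_params, grad_v):
            p.add_(eta * g)
        # Gradient descent on the generator parameters theta (minimize F).
        for p, g in zip(g_params, grad_g):
            p.sub_(eta * g)
```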
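The MNIST setup quoted under Experiment Setup can be sketched in PyTorch as below. The hidden layer width is not stated in the quote, so HIDDEN is a placeholder, and the divergence-specific output activation g_f (Table 2 of the paper) would be applied on top of the variational network.

```python
# Hedged sketch of the quoted MNIST architectures and optimizer settings.
import torch
import torch.nn as nn

Z_DIM, HIDDEN, X_DIM = 100, 1024, 28 * 28  # HIDDEN is a placeholder width

# Generator: two linear layers, each followed by batch norm and ReLU,
# then a final linear layer with a sigmoid output.
generator = nn.Sequential(
    nn.Linear(Z_DIM, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, X_DIM), nn.Sigmoid(),
)

# Variational function V_omega: three linear layers with ELUs in between;
# the divergence-specific output activation is applied outside this module.
variational = nn.Sequential(
    nn.Linear(X_DIM, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, 1),
)

# Adam with learning rate 0.0002 and beta1 = 0.5, matching the quoted
# "update weight β = 0.5"; the second moment decay is left at a common default.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_v = torch.optim.Adam(variational.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Noise input z ~ Uniform(-1, 1)^100 for a batch of 4096, as in the quote.
z = torch.rand(4096, Z_DIM) * 2 - 1
```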