f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Authors: Sebastian Nowozin, Botond Cseke, Ryota Tomioka
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now train generative neural samplers based on VDM on the MNIST and LSUN datasets. We evaluate the performance using the kernel density estimation (Parzen window) approach used in [10]. To this end, we sample 16k images from the model and fit a Parzen window estimator with an isotropic Gaussian kernel whose bandwidth is selected by three-fold cross validation. The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples). We show the results in Table 4, and some samples from our models in Figure 2. (A NumPy/SciPy sketch of this Parzen evaluation appears after the table.) |
| Researcher Affiliation | Industry | Sebastian Nowozin, Botond Cseke, Ryota Tomioka Machine Intelligence and Perception Group Microsoft Research {Sebastian.Nowozin, Botond.Cseke, ryoto}@microsoft.com |
| Pseudocode | Yes | Algorithm 1, Single-Step Gradient Method (a PyTorch sketch of this update appears after the table) |
| Open Source Code | No | The paper mentions the original GAN implementation [10] and its footnote 'Available at https://github.com/goodfeli/adversarial', but it does not explicitly state that code for the f-GAN methodology described in this paper is publicly available, nor does it provide a link to the authors' own implementation. |
| Open Datasets | Yes | MNIST Digits. We use the MNIST training data set (60,000 samples, 28-by-28 pixel images) to train the generator and variational function model proposed in [10] for various f-divergences. LSUN Natural Images. We use the large scale LSUN database [35] of natural images of different categories. |
| Dataset Splits | No | The paper mentions 'The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples)' and 'three fold cross validation' for kernel bandwidth selection. However, it does not provide specific training/validation/test dataset splits (e.g., percentages or exact sample counts for each split) for the primary model training. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions optimizers (Adam [17]) and activation functions (exponential linear unit [4]) but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | With z ∼ Uniform_100(−1, 1) as input, the generator model has two linear layers, each followed by batch normalization and ReLU activation, and a final linear layer followed by the sigmoid function. The variational function Vω(x) has three linear layers with exponential linear units [4] in between. The final activation is specific to each divergence and listed in Table 2. As in [27], we use Adam with a learning rate of α = 0.0002 and update weight β = 0.5. We use a batch size of 4096, sampled from the training set without replacement, and train each model for one hour. (A PyTorch sketch of this setup appears after the table.) |
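
The pseudocode row above refers to Algorithm 1 (Single-Step Gradient Method). As a reading aid, here is a minimal PyTorch sketch of our reading of that update: the variational objective F(θ, ω) is evaluated once per iteration, then ω takes a gradient ascent step and θ a gradient descent step from that same evaluation. The names `generator`, `variational`, `g_f` (divergence-specific output activation), `f_star` (convex conjugate f*), and the plain learning rate `eta` are illustrative placeholders rather than the paper's code; the reported experiments use Adam instead of plain gradient steps (see the setup sketch below).

```python
import torch

def single_step_update(x_real, generator, variational, g_f, f_star,
                       eta=1e-3, z_dim=100):
    """One iteration of a single-step gradient method (hypothetical sketch):
    evaluate F(theta, omega) once, then ascend in omega and descend in theta."""
    z = torch.rand(x_real.size(0), z_dim) * 2 - 1       # z ~ Uniform(-1, 1)
    t_real = g_f(variational(x_real))                   # variational output on real data
    t_fake = g_f(variational(generator(z)))             # variational output on generated data
    F = t_real.mean() - f_star(t_fake).mean()           # variational lower bound F(theta, omega)

    omega = list(variational.parameters())
    theta = list(generator.parameters())
    grad_omega = torch.autograd.grad(F, omega, retain_graph=True)
    grad_theta = torch.autograd.grad(F, theta)

    with torch.no_grad():
        for p, g in zip(omega, grad_omega):
            p.add_(eta * g)                             # gradient ascent in omega
        for p, g in zip(theta, grad_theta):
            p.sub_(eta * g)                             # gradient descent in theta
    return F.item()
```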
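
The experiment-setup row fixes the noise distribution, layer types, final activations, optimizer settings, and batch size, but not the hidden layer widths. A minimal PyTorch sketch under those constraints follows; the hidden width `HIDDEN = 240` and Adam's second-moment coefficient 0.999 are our assumptions (the row only gives β = 0.5), and the divergence-specific output activation from Table 2 is assumed to be applied outside the `variational` module.

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM = 100, 28 * 28      # z ~ Uniform_100(-1, 1); 28-by-28 MNIST images
HIDDEN = 240                     # assumed hidden width (not stated in the row)

# Generator: two linear layers, each followed by batch norm and ReLU,
# then a final linear layer with a sigmoid output.
generator = nn.Sequential(
    nn.Linear(Z_DIM, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, X_DIM), nn.Sigmoid(),
)

# Variational function V_omega(x): three linear layers with ELUs in between;
# the divergence-specific output activation (Table 2) is applied separately.
variational = nn.Sequential(
    nn.Linear(X_DIM, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, 1),
)

opt_theta = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_omega = torch.optim.Adam(variational.parameters(), lr=2e-4, betas=(0.5, 0.999))
BATCH_SIZE = 4096                # drawn from the training set without replacement
```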
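
The evaluation protocol quoted in the research-type row (a Parzen window estimator on 16k generated images, an isotropic Gaussian kernel whose bandwidth is chosen by three-fold cross validation, and the average log-likelihood on the 10k MNIST test images) can be summarised in a short NumPy/SciPy sketch. The candidate bandwidth grid, batch size, and function names below are our own choices, not taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

def parzen_ll(centres, x, sigma, batch=256):
    """Average log-likelihood of rows of `x` under an isotropic Gaussian
    Parzen window estimator with the given centres and bandwidth."""
    n, d = centres.shape
    log_norm = np.log(n) + d * np.log(sigma * np.sqrt(2.0 * np.pi))
    c2 = np.sum(centres ** 2, axis=1)
    lls = []
    for i in range(0, len(x), batch):
        xb = x[i:i + batch]
        # squared Euclidean distances between the batch and all centres
        sq = np.sum(xb ** 2, axis=1)[:, None] + c2[None, :] - 2.0 * xb @ centres.T
        lls.append(logsumexp(-0.5 * sq / sigma ** 2, axis=1) - log_norm)
    return float(np.concatenate(lls).mean())

def select_bandwidth(samples, candidates, n_folds=3, seed=0):
    """Pick the bandwidth with the best held-out log-likelihood (3-fold CV)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(samples)), n_folds)
    def cv_score(sigma):
        return np.mean([parzen_ll(samples[np.concatenate(folds[:k] + folds[k + 1:])],
                                  samples[folds[k]], sigma)
                        for k in range(n_folds)])
    return max(candidates, key=cv_score)

# Usage (shapes only): 16k flattened generated images and the 10k-image MNIST test set.
# sigma = select_bandwidth(generated, candidates=np.logspace(-1, 0, 10))
# test_ll = parzen_ll(generated, mnist_test, sigma)
```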