f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Authors: Sebastian Nowozin, Botond Cseke, Ryota Tomioka

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now train generative neural samplers based on VDM on the MNIST and LSUN datasets. We evaluate the performance using the kernel density estimation (Parzen window) approach used in [10]. To this end, we sample 16k images from the model and fit a Parzen window estimator with an isotropic Gaussian kernel whose bandwidth is selected using three-fold cross validation. The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples). We show the results in Table 4, and some samples from our models in Figure 2. (A hedged sketch of this Parzen-window evaluation follows the table.)
Researcher Affiliation | Industry | Sebastian Nowozin, Botond Cseke, Ryota Tomioka, Machine Intelligence and Perception Group, Microsoft Research, {Sebastian.Nowozin, Botond.Cseke, ryoto}@microsoft.com
Pseudocode | Yes | Algorithm 1: Single-Step Gradient Method (a hedged sketch of this update follows the table)
Open Source Code | No | The paper mentions 'The original implementation [10] of GANs... Available at https://github.com/goodfeli/adversarial' but does not explicitly state that the code for the f-GAN methodology described in this paper is publicly available, nor does it provide a link to the authors' own implementation.
Open Datasets | Yes | MNIST Digits. We use the MNIST training data set (60,000 samples, 28-by-28 pixel images) to train the generator and variational function model proposed in [10] for various f-divergences. LSUN Natural Images. We use the large scale LSUN database [35] of natural images of different categories.
Dataset Splits | No | The paper mentions 'The final density model is used to evaluate the average log-likelihood on the MNIST test set (10k samples)' and three-fold cross validation for kernel bandwidth selection, but it does not provide explicit training/validation/test splits (e.g., percentages or exact sample counts per split) for the primary model training.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam [17]) and activation functions (exponential linear unit [4]) but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | With z ∼ Uniform₁₀₀(−1, 1) as input, the generator model has two linear layers each followed by batch normalization and ReLU activation and a final linear layer followed by the sigmoid function. The variational function Vω(x) has three linear layers with exponential linear units [4] in between. The final activation is specific to each divergence and listed in Table 2. As in [27] we use Adam with a learning rate of α = 0.0002 and update weight β = 0.5. We use a batch size of 4096, sampled from the training set without replacement, and train each model for one hour. (A hedged sketch of this architecture and optimizer setup follows the table.)
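
The Parzen-window evaluation quoted under Research Type can be sketched as follows. This is a minimal illustration, assuming scikit-learn's KernelDensity and GridSearchCV; the array names model_samples and test_set, and the bandwidth grid, are placeholders rather than the authors' code.

```python
# Hedged sketch of the Parzen-window (KDE) evaluation described in the paper.
# `model_samples` (e.g. 16k generated images flattened to vectors) and
# `test_set` (MNIST test images) are assumed to be 2-D NumPy arrays.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def parzen_log_likelihood(model_samples, test_set):
    # Select the isotropic Gaussian kernel bandwidth by three-fold cross
    # validation, as in the quoted evaluation protocol; the grid of candidate
    # bandwidths below is purely illustrative.
    grid = GridSearchCV(
        KernelDensity(kernel="gaussian"),
        {"bandwidth": np.logspace(-1, 0, 10)},
        cv=3,
    )
    grid.fit(model_samples)
    kde = grid.best_estimator_
    # Average log-likelihood of the held-out test set under the fitted KDE.
    return kde.score_samples(test_set).mean()
```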
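Algorithm 1 (Single-Step Gradient Method) takes, on each minibatch, one gradient ascent step in the variational parameters ω and one gradient descent step in the generator parameters θ, both evaluated at the same point on the objective F(θ, ω) = E_x[g_f(Vω(x))] − E_z[f*(g_f(Vω(Gθ(z))))]. Below is a minimal PyTorch-style sketch, assuming gf is the divergence-specific output activation and f_star its Fenchel conjugate; all names are illustrative, and the paper's experiments use Adam rather than the plain gradient steps shown here.

```python
# Hedged sketch of a single-step f-GAN update (not the authors' code).
# G: generator network, V: variational network, gf: output activation g_f,
# f_star: Fenchel conjugate f* of the chosen f-divergence (all assumptions).
import torch

def single_step_update(G, V, gf, f_star, x_real, z, eta):
    # Minibatch estimate of F(theta, omega) = E[g_f(V(x))] - E[f*(g_f(V(G(z))))].
    F = gf(V(x_real)).mean() - f_star(gf(V(G(z)))).mean()

    v_params = list(V.parameters())
    g_params = list(G.parameters())
    # Gradients of F w.r.t. both parameter sets, taken at the same point,
    # as in the single-step scheme of Algorithm 1.
    grads = torch.autograd.grad(F, v_params + g_params)
    grad_v, grad_g = grads[:len(v_params)], grads[len(v_params):]

    with torch.no_grad():
        # Gradient ascent on the variational parameters omega (maximize F).
        for p, g in zip(v_params, grad_v):
            p.add_(eta * g)
        # Gradient descent on the generator parameters theta (minimize F).
        for p, g in zip(g_params, grad_g):
            p.sub_(eta * g)
```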
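The MNIST setup quoted under Experiment Setup can be sketched in PyTorch as below. The hidden layer width is not stated in the quote, so HIDDEN is a placeholder, and the divergence-specific output activation g_f (Table 2 of the paper) would be applied on top of the variational network.

```python
# Hedged sketch of the quoted MNIST architectures and optimizer settings.
import torch
import torch.nn as nn

Z_DIM, HIDDEN, X_DIM = 100, 1024, 28 * 28  # HIDDEN is a placeholder width

# Generator: two linear layers, each followed by batch norm and ReLU,
# then a final linear layer with a sigmoid output.
generator = nn.Sequential(
    nn.Linear(Z_DIM, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.BatchNorm1d(HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, X_DIM), nn.Sigmoid(),
)

# Variational function V_omega: three linear layers with ELUs in between;
# the divergence-specific output activation is applied outside this module.
variational = nn.Sequential(
    nn.Linear(X_DIM, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ELU(),
    nn.Linear(HIDDEN, 1),
)

# Adam with learning rate 0.0002 and beta1 = 0.5, matching the quoted
# "update weight β = 0.5"; the second moment decay is left at a common default.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_v = torch.optim.Adam(variational.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Noise input z ~ Uniform(-1, 1)^100 for a batch of 4096, as in the quote.
z = torch.rand(4096, Z_DIM) * 2 - 1
```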