Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

Authors: Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

NeurIPS 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present a series of experimental results illustrating the soundness of the proposed approach, Variational Walkback (VW), on the MNIST, CIFAR-10, SVHN and CelebA datasets, demonstrating superior samples compared to earlier attempts to learn a transition operator." |
| Researcher Affiliation | Academia | Anirudh Goyal (MILA, Université de Montréal, anirudhgoyal9119@gmail.com); Nan Rosemary Ke (MILA, École Polytechnique de Montréal, rosemary.nan.ke@gmail.com); Surya Ganguli (Stanford University, sganguli@stanford.edu); Yoshua Bengio (MILA, Université de Montréal, yoshua.umontreal@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Variational Walkback(θ) |
| Open Source Code | Yes | Source code: http://github.com/anirudh9119/walkback_nips17 |
| Open Datasets | Yes | "VW is evaluated on four datasets: MNIST, CIFAR-10 (Krizhevsky and Hinton, 2009), SVHN (Netzer et al., 2011) and CelebA (Liu et al., 2015)." |
| Dataset Splits | No | The paper mentions "monitoring L on a validation set" in Algorithm 1, but does not provide specific dataset splits (percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions "the Theano framework (Al-Rfou et al., 2016)" but does not provide a version number for it or for other software dependencies. |
| Experiment Setup | Yes | "The generative process starts by sampling a state $s_K$ from a broad Gaussian $p^*(s_K)$, whose variance is initially equal to the total data variance $\sigma^2_{\max}$ (but can be later adapted to match the final samples from the inference trajectories). Then we sample from $p_{T_{\max}}(s_{K-1} \mid s_K)$, where $T_{\max}$ is a high enough temperature so that the resultant injected noise can move the state across the whole domain of the data. ... Then we successively cool the temperature as we sample previous states $s_{t-1}$ according to $p_T(s_{t-1} \mid s_t)$, with $T$ reduced by a factor of 2 at each step, followed by $n$ steps at temperature 1. This cooling protocol requires the number of steps to be $K = \log_2 T_{\max} + n$ (Eq. 1) in order to go from $T = T_{\max}$ to $T = 1$ in $K$ steps. We choose $K$ from a random distribution. ..." Algorithm 1 requires $N_1 > 1$, the number of initial temperature-1 steps of the $q$ trajectory (or ending a $p$ trajectory); sets $p^*$ to be a Gaussian with the mean and variance of the data; sets $T_{\max} \leftarrow \sigma^2_{\max}/\sigma^2$; samples $n$ as a uniform integer between 0 and $N_1$; sets $K \leftarrow \lceil \log_2 T_{\max} \rceil + n$; and samples $x \sim \text{data}$ (or equivalently a minibatch, to parallelize computation, processing each element independently). |
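
To make the quoted cooling protocol concrete, here is a minimal Python sketch of the annealed generative run. It is an illustration under stated assumptions, not the authors' Theano implementation: `transition_sample`, `data_mean`, `sigma2_max`, and `N1` are hypothetical names, and the transition operator's base variance $\sigma^2$ is taken to be 1 so that $T_{\max} = \sigma^2_{\max}$.

```python
import math
import numpy as np

def temperature_schedule(T_max, n):
    """Cooling schedule from the quoted setup: halve the temperature at each
    step until it reaches 1, then take n extra steps at temperature 1, for
    K = ceil(log2(T_max)) + n steps in total."""
    temps = []
    T = float(T_max)
    while T > 1.0:
        temps.append(T)
        T /= 2.0
    temps.extend([1.0] * n)  # the n final temperature-1 steps
    return temps

def generate(transition_sample, data_mean, sigma2_max, N1, rng):
    """Annealed generative run following the quoted Algorithm 1 steps.

    transition_sample(s, T) is a hypothetical stand-in for drawing
    s_{t-1} ~ p_T(s_{t-1} | s_t) from the learned transition operator."""
    T_max = sigma2_max / 1.0          # assumes unit base variance sigma^2 = 1
    n = int(rng.integers(0, N1 + 1))  # n ~ Uniform{0, ..., N1}
    # Start from the broad Gaussian p*(s_K) with the data's mean and variance.
    s = data_mean + math.sqrt(sigma2_max) * rng.standard_normal(data_mean.shape)
    for T in temperature_schedule(T_max, n):
        s = transition_sample(s, T)   # one cooling step: s_{t-1} ~ p_T(. | s_t)
    return s

# Example with a dummy operator that just adds temperature-scaled noise.
rng = np.random.default_rng(0)
dummy_op = lambda s, T: s + math.sqrt(T) * 0.1 * rng.standard_normal(s.shape)
sample = generate(dummy_op, data_mean=np.zeros(784), sigma2_max=16.0, N1=4, rng=rng)
```

Halving $T$ at each step is exactly what yields Eq. 1: each step removes one factor of 2, so $\lceil \log_2 T_{\max} \rceil$ steps bring $T_{\max}$ down to 1, after which the $n$ temperature-1 steps run.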