Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

Authors: Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

NeurIPS 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present a series of experimental results illustrating the soundness of the proposed approach, Variational Walkback (VW), on the MNIST, CIFAR-10, SVHN and CelebA datasets, demonstrating superior samples compared to earlier attempts to learn a transition operator." |
| Researcher Affiliation | Academia | Anirudh Goyal (MILA, Université de Montréal, anirudhgoyal9119@gmail.com); Nan Rosemary Ke (MILA, École Polytechnique de Montréal, rosemary.nan.ke@gmail.com); Surya Ganguli (Stanford University, sganguli@stanford.edu); Yoshua Bengio (MILA, Université de Montréal, yoshua.umontreal@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Variational Walkback(θ) |
| Open Source Code | Yes | Source code: http://github.com/anirudh9119/walkback_nips17 |
| Open Datasets | Yes | "VW is evaluated on four datasets: MNIST, CIFAR-10 (Krizhevsky and Hinton, 2009), SVHN (Netzer et al., 2011) and CelebA (Liu et al., 2015)." |
| Dataset Splits | No | The paper mentions "monitoring L on a validation set" in Algorithm 1, but does not provide specific dataset splits (percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions "the Theano framework (Al-Rfou et al., 2016)" but does not provide a version number for it or for other software dependencies. |
| Experiment Setup | Yes | "The generative process starts by sampling a state $s_K$ from a broad Gaussian $p^*(s_K)$, whose variance is initially equal to the total data variance $\sigma^2_{\max}$ (but can be later adapted to match the final samples from the inference trajectories). Then we sample from $p_{T_{\max}}(s_{K-1} \mid s_K)$, where $T_{\max}$ is a high enough temperature so that the resultant injected noise can move the state across the whole domain of the data. ... Then we successively cool the temperature as we sample previous states $s_{t-1}$ according to $p_T(s_{t-1} \mid s_t)$, with $T$ reduced by a factor of 2 at each step, followed by $n$ steps at temperature 1. This cooling protocol requires the number of steps to be $K = \log_2 T_{\max} + n$ (Eq. 1) in order to go from $T = T_{\max}$ to $T = 1$ in $K$ steps. We choose $K$ from a random distribution. ..." Algorithm 1 requires $N_1 > 1$, the number of initial temperature-1 steps of the $q$ trajectory (or ending a $p$ trajectory); sets $p^*$ to be a Gaussian with the mean and variance of the data; sets $T_{\max} \leftarrow \sigma^2_{\max}/\sigma^2$; samples $n$ as a uniform integer between 0 and $N_1$; sets $K \leftarrow \lceil \log_2 T_{\max} \rceil + n$; and samples $x \sim \text{data}$ (or equivalently a minibatch, to parallelize computation, processing each element independently). |
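
To make the quoted cooling protocol concrete, here is a minimal Python sketch of the annealed generative run. It is an illustration under stated assumptions, not the authors' Theano implementation: `transition_sample`, `data_mean`, `sigma2_max`, and `N1` are hypothetical names, and the transition operator's base variance $\sigma^2$ is taken to be 1 so that $T_{\max} = \sigma^2_{\max}$.

```python
import math
import numpy as np

def temperature_schedule(T_max, n):
    """Cooling schedule from the quoted setup: halve the temperature at each
    step until it reaches 1, then take n extra steps at temperature 1, for
    K = ceil(log2(T_max)) + n steps in total."""
    temps = []
    T = float(T_max)
    while T > 1.0:
        temps.append(T)
        T /= 2.0
    temps.extend([1.0] * n)  # the n final temperature-1 steps
    return temps

def generate(transition_sample, data_mean, sigma2_max, N1, rng):
    """Annealed generative run following the quoted Algorithm 1 steps.

    transition_sample(s, T) is a hypothetical stand-in for drawing
    s_{t-1} ~ p_T(s_{t-1} | s_t) from the learned transition operator."""
    T_max = sigma2_max / 1.0          # assumes unit base variance sigma^2 = 1
    n = int(rng.integers(0, N1 + 1))  # n ~ Uniform{0, ..., N1}
    # Start from the broad Gaussian p*(s_K) with the data's mean and variance.
    s = data_mean + math.sqrt(sigma2_max) * rng.standard_normal(data_mean.shape)
    for T in temperature_schedule(T_max, n):
        s = transition_sample(s, T)   # one cooling step: s_{t-1} ~ p_T(. | s_t)
    return s

# Example with a dummy operator that just adds temperature-scaled noise.
rng = np.random.default_rng(0)
dummy_op = lambda s, T: s + math.sqrt(T) * 0.1 * rng.standard_normal(s.shape)
sample = generate(dummy_op, data_mean=np.zeros(784), sigma2_max=16.0, N1=4, rng=rng)
```

Halving $T$ at each step is exactly what yields Eq. 1: each step removes one factor of 2, so $\lceil \log_2 T_{\max} \rceil$ steps bring $T_{\max}$ down to 1, after which the $n$ temperature-1 steps run.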