Filtering Variational Objectives
Authors: Chris J. Maddison, John Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we sought to: (a) compare models trained with ELBO, IWAE, and FIVO bounds in terms of final test log-likelihoods, (b) explore the effect of the resampling gradient terms on FIVO, (c) investigate how the lack of sharpness affects FIVO, and (d) consider how models trained with FIVO use the stochastic state. To explore these questions, we trained variational recurrent neural networks (VRNN) [39] with the ELBO, IWAE, and FIVO bounds using TensorFlow [40] on two benchmark sequential modeling tasks: natural speech waveforms and polyphonic music. These datasets are known to be difficult to model without stochastic latent states [41]. (The ELBO and IWAE estimators are sketched after the table.) |
| Researcher Affiliation | Collaboration | ¹DeepMind, ²Google Brain, ³University of Oxford |
| Pseudocode | Yes | Algorithm 1: Simulating L̂^FIVO_N(x_{1:T}, p, q) (a runnable sketch of this estimator follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement or link for the availability of its source code. |
| Open Datasets | Yes | We evaluated VRNNs trained with the ELBO, IWAE, and FIVO bounds on 4 polyphonic music datasets: the Nottingham folk tunes, the JSB chorales, the MuseData library of classical piano and orchestral music, and the Piano-midi.de MIDI archive [42]. The TIMIT dataset is a standard benchmark for sequential models that contains 6300 utterances with an average duration of 3.1 seconds spoken by 630 different speakers. |
| Dataset Splits | Yes | Each dataset is split into standard train, valid, and test sets and is represented as a sequence of 88-dimensional binary vectors denoting the notes active at the current timestep. The 6300 utterances are divided into a training set of size 4620 and a test set of size 1680. We further divided the training set into a validation set of size 231 and a training set of size 4389, with the splits exactly as in [41]. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper mentions 'TensorFlow [40]' but does not specify a version number for it or for any other software dependencies. |
| Experiment Setup | Yes | For FIVO we resampled when the ESS of the particles dropped below N/2. For FIVO and IWAE we used a batch size of 4, and for the ELBO we used batch sizes of 4N to match computational budgets (resampling is O(N) with the alias method). We used 64 units for the RNN hidden state and the latent state in all polyphonic music models except the JSB chorales models, which used 32 units. The RNN is a single-layer LSTM and the conditionals are parameterized by fully connected neural networks with one hidden layer of the same size as the LSTM hidden layer. We used the residual parameterization [41] for the variational posterior. |
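
For context on item (a) of the experiment description, the ELBO and IWAE bounds admit simple Monte Carlo estimators from the same batch of log importance weights. The sketch below is ours, not the paper's: the function name and the synthetic weights are illustrative assumptions.

```python
import numpy as np

def elbo_and_iwae(log_w):
    """Monte Carlo bound estimates from N log importance weights.

    log_w[i] = log p(x, z_i) - log q(z_i | x) with z_i ~ q(z | x).
    The ELBO averages the log-weights; IWAE takes a log-mean-exp,
    which is tighter by Jensen's inequality and approaches log p(x)
    as N grows.
    """
    log_w = np.asarray(log_w, dtype=float)
    elbo = log_w.mean()
    iwae = np.logaddexp.reduce(log_w) - np.log(len(log_w))
    return elbo, iwae

# Example with synthetic weights: the IWAE estimate always dominates the ELBO.
rng = np.random.default_rng(0)
print(elbo_and_iwae(rng.normal(size=4)))
```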
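The pseudocode row refers to Algorithm 1, which simulates the FIVO bound with a particle filter. The following is a minimal NumPy sketch of such an estimator under an assumed toy linear-Gaussian state-space model with a bootstrap (transition) proposal; the model, constants, and names are ours, not the paper's VRNN. It applies the ESS < N/2 adaptive-resampling rule from the experiment setup, using plain multinomial resampling rather than the alias method mentioned in the paper.

```python
import numpy as np

def ess(log_w):
    """Effective sample size of normalized log-weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def fivo_log_estimate(x, N=4, rng=None):
    """Particle-filter estimate of log p(x_1:T), i.e. the FIVO bound.

    Assumed toy model (not the paper's VRNN):
        z_t ~ N(0.9 * z_{t-1}, 1),  x_t ~ N(z_t, 1),
    with the transition prior as proposal, so the incremental weight
    alpha_t^i is the observation likelihood p(x_t | z_t^i).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, 1.0, size=N)         # initial particles
    log_w = np.full(N, -np.log(N))           # normalized log-weights
    log_p_hat = 0.0                          # running FIVO estimate
    for t in range(len(x)):
        if t > 0:
            z = rng.normal(0.9 * z, 1.0)     # propose from the transition
        log_alpha = -0.5 * (x[t] - z) ** 2 - 0.5 * np.log(2.0 * np.pi)
        log_w = log_w + log_alpha
        step = np.logaddexp.reduce(log_w)    # log sum_i w_{t-1}^i alpha_t^i
        log_p_hat += step
        log_w -= step                        # renormalize the weights
        if ess(log_w) < N / 2:               # adaptive resampling trigger
            probs = np.exp(log_w)
            probs /= probs.sum()
            z = z[rng.choice(N, size=N, p=probs)]
            log_w = np.full(N, -np.log(N))
    return log_p_hat

# Example: estimate the log-likelihood of a synthetic sequence.
x = np.random.default_rng(1).normal(size=50)
print(fivo_log_estimate(x, N=4))
```

Summing log-increments rather than multiplying likelihoods keeps the estimate numerically stable over long sequences, which is why the sketch works entirely in log space.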