Filtering Variational Objectives
Authors: Chris J. Maddison, John Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we sought to: (a) compare models trained with ELBO, IWAE, and FIVO bounds in terms of final test log-likelihoods, (b) explore the effect of the resampling gradient terms on FIVO, (c) investigate how the lack of sharpness affects FIVO, and (d) consider how models trained with FIVO use the stochastic state. To explore these questions, we trained variational recurrent neural networks (VRNN) [39] with the ELBO, IWAE, and FIVO bounds using TensorFlow [40] on two benchmark sequential modeling tasks: natural speech waveforms and polyphonic music. These datasets are known to be difficult to model without stochastic latent states [41]. (The ELBO and IWAE estimators are sketched after the table.) |
| Researcher Affiliation | Collaboration | ¹DeepMind, ²Google Brain, ³University of Oxford |
| Pseudocode | Yes | Algorithm 1: Simulating L̂^FIVO_N(x_{1:T}, p, q) (a runnable sketch of this estimator follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement or link for the availability of its source code. |
| Open Datasets | Yes | We evaluated VRNNs trained with the ELBO, IWAE, and FIVO bounds on 4 polyphonic music datasets: the Nottingham folk tunes, the JSB chorales, the MuseData library of classical piano and orchestral music, and the Piano-midi.de MIDI archive [42]. The TIMIT dataset is a standard benchmark for sequential models that contains 6300 utterances with an average duration of 3.1 seconds spoken by 630 different speakers. |
| Dataset Splits | Yes | Each dataset is split into standard train, valid, and test sets and is represented as a sequence of 88-dimensional binary vectors denoting the notes active at the current timestep. The 6300 utterances are divided into a training set of size 4620 and a test set of size 1680. We further divided the training set into a validation set of size 231 and a training set of size 4389, with the splits exactly as in [41]. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory, etc.). |
| Software Dependencies | No | The paper mentions 'TensorFlow [40]' but does not specify a version number for it or for any other software dependencies. |
| Experiment Setup | Yes | For FIVO we resampled when the ESS of the particles dropped below N/2. For FIVO and IWAE we used a batch size of 4, and for the ELBO we used batch sizes of 4N to match computational budgets (resampling is O(N) with the alias method). We used 64 units for the RNN hidden state and the latent state in all polyphonic music models except the JSB chorales models, which used 32 units. The RNN is a single-layer LSTM and the conditionals are parameterized by fully connected neural networks with one hidden layer of the same size as the LSTM hidden layer. We used the residual parameterization [41] for the variational posterior. |
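
For context on item (a) of the experiment description, the ELBO and IWAE bounds admit simple Monte Carlo estimators from the same batch of log importance weights. The sketch below is ours, not the paper's: the function name and the synthetic weights are illustrative assumptions.

```python
import numpy as np

def elbo_and_iwae(log_w):
    """Monte Carlo bound estimates from N log importance weights.

    log_w[i] = log p(x, z_i) - log q(z_i | x) with z_i ~ q(z | x).
    The ELBO averages the log-weights; IWAE takes a log-mean-exp,
    which is tighter by Jensen's inequality and approaches log p(x)
    as N grows.
    """
    log_w = np.asarray(log_w, dtype=float)
    elbo = log_w.mean()
    iwae = np.logaddexp.reduce(log_w) - np.log(len(log_w))
    return elbo, iwae

# Example with synthetic weights: the IWAE estimate always dominates the ELBO.
rng = np.random.default_rng(0)
print(elbo_and_iwae(rng.normal(size=4)))
```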
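The pseudocode row refers to Algorithm 1, which simulates the FIVO bound with a particle filter. The following is a minimal NumPy sketch of such an estimator under an assumed toy linear-Gaussian state-space model with a bootstrap (transition) proposal; the model, constants, and names are ours, not the paper's VRNN. It applies the ESS < N/2 adaptive-resampling rule from the experiment setup, using plain multinomial resampling rather than the alias method mentioned in the paper.

```python
import numpy as np

def ess(log_w):
    """Effective sample size of normalized log-weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def fivo_log_estimate(x, N=4, rng=None):
    """Particle-filter estimate of log p(x_1:T), i.e. the FIVO bound.

    Assumed toy model (not the paper's VRNN):
        z_t ~ N(0.9 * z_{t-1}, 1),  x_t ~ N(z_t, 1),
    with the transition prior as proposal, so the incremental weight
    alpha_t^i is the observation likelihood p(x_t | z_t^i).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, 1.0, size=N)         # initial particles
    log_w = np.full(N, -np.log(N))           # normalized log-weights
    log_p_hat = 0.0                          # running FIVO estimate
    for t in range(len(x)):
        if t > 0:
            z = rng.normal(0.9 * z, 1.0)     # propose from the transition
        log_alpha = -0.5 * (x[t] - z) ** 2 - 0.5 * np.log(2.0 * np.pi)
        log_w = log_w + log_alpha
        step = np.logaddexp.reduce(log_w)    # log sum_i w_{t-1}^i alpha_t^i
        log_p_hat += step
        log_w -= step                        # renormalize the weights
        if ess(log_w) < N / 2:               # adaptive resampling trigger
            probs = np.exp(log_w)
            probs /= probs.sum()
            z = z[rng.choice(N, size=N, p=probs)]
            log_w = np.full(N, -np.log(N))
    return log_p_hat

# Example: estimate the log-likelihood of a synthetic sequence.
x = np.random.default_rng(1).normal(size=50)
print(fivo_log_estimate(x, N=4))
```

Summing log-increments rather than multiplying likelihoods keeps the estimate numerically stable over long sequences, which is why the sketch works entirely in log space.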