Tighter Variational Bounds are Not Necessarily Better
Authors: Tom Rainforth, Adam Kosiorek, Tuan Anh Le, Chris Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted autoencoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Oxford 2Department of Engineering, University of Oxford 3Department of Computer Science, University of British Columbia. |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual descriptions, but no explicit pseudocode blocks or sections labeled 'Algorithm' were found. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We now use our new estimators to train deep generative models for the MNIST digits dataset (LeCun et al., 1998). |
| Dataset Splits | No | The paper mentions using the MNIST dataset and evaluating on the test set but does not provide specific details on the training, validation, and test splits (e.g., percentages, sample counts, or explicit split methods). |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation or experiments. |
| Experiment Setup | Yes | For this, we duplicated the architecture and training schedule outlined in Burda et al. (2016). In particular, all networks were trained and evaluated using their stochastic binarization. For all methods we set a budget of T = 64 weights in the target estimate for each datapoint in the minibatch. Here we have considered the middle value for each of the parameters, namely K = M = 8 for PIWAE and MIWAE, and β = 0.5 for CIWAE. |
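The estimators named in the table above can be sketched from their definitions: IWAE averages K importance weights inside the log, MIWAE averages M independent K-sample IWAE estimates (using the T = MK = 64 weight budget from the setup row), and CIWAE takes a convex combination of the ELBO and IWAE bounds with weight β. The following is an illustrative NumPy sketch reconstructed from those definitions, not the authors' code; the function names are ours.

```python
import numpy as np

def iwae_bound(log_w):
    """IWAE bound: log of the mean of K importance weights,
    computed stably via log-sum-exp.
    log_w[..., k] = log p(x, z_k) - log q(z_k | x)."""
    K = log_w.shape[-1]
    m = log_w.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(log_w - m).sum(axis=-1, keepdims=True) / K)).squeeze(-1)

def miwae_bound(log_w, M, K):
    """MIWAE bound: average of M independent K-sample IWAE estimates,
    using a total budget of T = M * K weights."""
    groups = np.reshape(log_w, (M, K))
    return iwae_bound(groups).mean()

def ciwae_bound(log_w, beta):
    """CIWAE bound: convex combination beta * ELBO + (1 - beta) * IWAE."""
    elbo = log_w.mean()  # ELBO is the mean of the log weights
    return beta * elbo + (1 - beta) * iwae_bound(log_w)
```

With M = K = 1 (or β = 1) these reduce to the standard ELBO, and with M = 1, β = 0 they reduce to the plain IWAE bound, matching the paper's statement that each estimator includes IWAE as a special case.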