Iterative Amortized Inference

Authors: Joe Marino, Yisong Yue, Stephan Mandt

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed an empirical evaluation of iterative inference models on both image and text data. For images, we used MNIST (LeCun et al., 1998), Omniglot (Lake et al., 2013), Street View House Numbers (SVHN) (Netzer et al., 2011), and CIFAR-10 (Krizhevsky & Hinton, 2009). For text, we used RCV1 (Lewis et al., 2004)... Table 1 contains estimated marginal log-likelihood performance on MNIST and CIFAR-10. Table 2 contains estimated perplexity on RCV1.
Researcher Affiliation | Collaboration | California Institute of Technology (Caltech), Pasadena, CA, USA; Disney Research, Los Angeles, CA, USA.
Pseudocode | Yes | Figure 2 displays a computation graph of the inference procedure, and Algorithm 1 in Appendix B describes the procedure in detail.
Open Source Code | Yes | Accompanying code can be found on GitHub at joelouismarino/iterative_inference.
Open Datasets | Yes | For images, we used MNIST (LeCun et al., 1998), Omniglot (Lake et al., 2013), Street View House Numbers (SVHN) (Netzer et al., 2011), and CIFAR-10 (Krizhevsky & Hinton, 2009). For text, we used RCV1 (Lewis et al., 2004).
Dataset Splits | Yes | In Figure 4, we plot the average ELBO on the MNIST validation set during inference, comparing iterative inference models with conventional optimizers. Details are in Appendix C.2.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions techniques like 'layer normalization' (Ba et al., 2016) and implies use of common ML frameworks given the context, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | Additional experiment details, including model architectures, can be found in Appendix C. We trained models by encoding approximate posterior gradients (∇_λ L) or errors (ε_x, ε_z), with or without the data (x), for 2, 5, 10, and 16 inference iterations. ...we trained standard and iterative inference models on MNIST using 1, 5, 10, and 20 approximate posterior samples.
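The experiment setup above centers on inference models that encode the approximate posterior gradients ∇_λ L (or errors) and output updates to the variational parameters over a fixed number of inference iterations. Below is a minimal sketch of that loop for a Gaussian approximate posterior with a Bernoulli likelihood; the network sizes, the additive update rule, and the names (decoder, update_net, iterative_inference) are illustrative assumptions rather than the paper's exact architecture, which is given in Appendix C.

```python
# Minimal sketch of iterative amortized inference (not the authors' implementation).
# A learned update network repeatedly encodes the gradient of the ELBO with respect
# to the variational parameters and refines those parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, obs_dim, hidden_dim = 16, 784, 256  # illustrative sizes

# Generative model: maps latent samples z to Bernoulli logits over observations.
decoder = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                        nn.Linear(hidden_dim, obs_dim))

# Iterative inference model: encodes the approximate posterior gradients together
# with the current variational parameters and outputs additive updates.
update_net = nn.Sequential(nn.Linear(4 * latent_dim, hidden_dim), nn.ReLU(),
                           nn.Linear(hidden_dim, 2 * latent_dim))

def elbo(x, mu, logvar):
    # Single-sample reparameterized ELBO estimate with a standard normal prior.
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    log_px = -F.binary_cross_entropy_with_logits(
        decoder(z), x, reduction='none').sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(-1)
    return log_px - kl

def iterative_inference(x, n_iterations=5):
    # Initialize the variational parameters (here at zero) and refine them.
    mu = torch.zeros(x.size(0), latent_dim, requires_grad=True)
    logvar = torch.zeros(x.size(0), latent_dim, requires_grad=True)
    for _ in range(n_iterations):
        bound = elbo(x, mu, logvar).sum()
        # Gradient of the ELBO with respect to the variational parameters;
        # create_graph=True allows training through the inference iterations.
        grad_mu, grad_logvar = torch.autograd.grad(
            bound, [mu, logvar], create_graph=True)
        delta = update_net(torch.cat([grad_mu, grad_logvar, mu, logvar], dim=-1))
        mu = mu + delta[:, :latent_dim]
        logvar = logvar + delta[:, latent_dim:]
    return mu, logvar

# Example: refine the approximate posterior for a small batch of inputs.
x = torch.rand(8, obs_dim)
mu, logvar = iterative_inference(x, n_iterations=5)
```

In this sketch the update network is trained by backpropagating through the inference iterations, which is what distinguishes iterative inference models from standard amortized inference, where a single direct encoding of x produces the variational parameters.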