reproducibilityindex.ai

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Authors: Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. (Abstract); We first compare the performance of SUMO when used as a replacement to IWAE with the same expected cost on density modeling tasks. (Section 5)
Researcher Affiliation	Collaboration	Yucen Luo Tsinghua University luoyc15@mails.tsinghua.edu.cn; Alex Beatson Princeton University abeatson@cs.princeton.edu; Mohammad Norouzi Google Research mnorouzi@google.com; Jun Zhu Tsinghua University dcszj@tsinghua.edu.cn; David Duvenaud University of Toronto duvenaud@cs.toronto.edu; Ryan P. Adams Princeton University rpa@princeton.edu; Ricky T. Q. Chen University of Toronto rtqichen@cs.toronto.edu
Pseudocode	Yes	Algorithm 1 Computing SUMO, an unbiased estimator of log p(x).
Open Source Code	No	The paper does not provide any explicit statement about open-sourcing the code, nor does it include a link to a code repository.
Open Datasets	Yes	We make use of two benchmark datasets: dynamically binarized MNIST (LeCun et al., 1998) and binarized OMNIGLOT (Lake et al., 2015).
Dataset Splits	Yes	The learning rate is reduced by factor 0.8 if the validation likelihood does not improve for 50 epochs. (Appendix A.8); We report the performance of models with early stopping if no improvements have been observed for 300 epochs on the validation set. (Appendix A.8)
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions optimizers like AMSGrad, RMSprop, and Adam, but does not specify version numbers for any key software libraries (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup	Yes	In density modeling experiments, all the models are trained using a batch size of 100 and the AMSGrad optimizer (Reddi et al., 2018) with parameters lr = 0.001, β1 = 0.9, β2 = 0.999 and ϵ = 10^-4. (Appendix A.8); We set the gradient norm to 5000 for encoder and {20, 40, 60} for decoder in SUMO. For IWAE, the gradient norm is fixed to 10 in all the experiments. (Appendix A.8)
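
The schedule quoted in the Dataset Splits and Experiment Setup rows (learning rate reduced by a factor of 0.8 after 50 epochs without validation improvement, early stopping after 300 such epochs) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the names `TrainConfig` and `PlateauTracker` are hypothetical, the exact patience semantics (using `>=`, and restarting the decay counter after each decay) are assumptions, and the ϵ value assumes the exponent in the quoted text is negative.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters quoted from Appendix A.8 (names are hypothetical)."""
    batch_size: int = 100
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-4               # assumption: exponent read as -4
    lr_decay_factor: float = 0.8
    lr_patience: int = 50           # epochs without improvement before decay
    early_stop_patience: int = 300  # epochs without improvement before stopping


class PlateauTracker:
    """Tracks validation log-likelihood, decays the learning rate on plateaus,
    and signals when early stopping should trigger."""

    def __init__(self, cfg: TrainConfig):
        self.cfg = cfg
        self.lr = cfg.lr
        self.best = float("-inf")
        self.epochs_since_best = 0
        self.epochs_since_decay = 0

    def step(self, val_loglik: float) -> bool:
        """Record one epoch's validation log-likelihood; return True to stop."""
        if val_loglik > self.best:
            # Improvement: remember it and reset both counters.
            self.best = val_loglik
            self.epochs_since_best = 0
            self.epochs_since_decay = 0
        else:
            self.epochs_since_best += 1
            self.epochs_since_decay += 1
            if self.epochs_since_decay >= self.cfg.lr_patience:
                # Plateau: shrink the learning rate and restart the decay counter.
                self.lr *= self.cfg.lr_decay_factor
                self.epochs_since_decay = 0
        return self.epochs_since_best >= self.cfg.early_stop_patience
```

In a training loop, `step` would be called once per epoch with the validation log-likelihood, and `tracker.lr` copied into the optimizer. The gradient-norm settings from the table are not modeled here.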