SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
Authors: Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. (Abstract); We first compare the performance of SUMO when used as a replacement to IWAE with the same expected cost on density modeling tasks. (Section 5) |
| Researcher Affiliation | Collaboration | Yucen Luo Tsinghua University luoyc15@mails.tsinghua.edu.cn; Alex Beatson Princeton University abeatson@cs.princeton.edu; Mohammad Norouzi Google Research mnorouzi@google.com; Jun Zhu Tsinghua University dcszj@tsinghua.edu.cn; David Duvenaud University of Toronto duvenaud@cs.toronto.edu; Ryan P. Adams Princeton University rpa@princeton.edu; Ricky T. Q. Chen University of Toronto rtqichen@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1 Computing SUMO, an unbiased estimator of log p(x). A hedged sketch of such an estimator appears after the table. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code, nor does it include a link to a code repository. |
| Open Datasets | Yes | We make use of two benchmark datasets: dynamically binarized MNIST (LeCun et al., 1998) and binarized OMNIGLOT (Lake et al., 2015). |
| Dataset Splits | Yes | The learning rate is reduced by factor 0.8 if the validation likelihood does not improve for 50 epochs. (Appendix A.8); We report the performance of models with early stopping if no improvements have been observed for 300 epochs on the validation set. (Appendix A.8) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers like AMSGrad, RMSprop, and Adam, but does not specify version numbers for any key software libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | In density modeling experiments, all the models are trained using a batch size of 100 and the AMSGrad optimizer (Reddi et al., 2018) with parameters lr = 0.001, β1 = 0.9, β2 = 0.999 and ϵ = 10^-4. (Appendix A.8); We set the gradient norm to 5000 for encoder and {20, 40, 60} for decoder in SUMO. For IWAE, the gradient norm is fixed to 10 in all the experiments. (Appendix A.8) A hedged sketch mapping these settings, together with the validation schedule from the Dataset Splits row, onto a PyTorch training loop also appears after the table. |
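
The Pseudocode row references Algorithm 1, which computes SUMO as a randomly truncated, reweighted series of IWAE bounds. The following is a minimal PyTorch sketch of that idea, not the authors' released code: it assumes a factorized, reparameterizable posterior `q_dist` whose `log_prob` is per-dimension, a user-supplied `log_joint(x, z)` returning log p(x, z) for each latent sample, and the illustrative tail distribution P(K ≥ k) = 1/k; the compute-budgeted truncation distribution used in the paper's experiments may differ, and `max_terms` is an added safeguard.

```python
import torch

def sumo_estimate(log_joint, q_dist, x, max_terms=1000):
    # Random truncation level K with P(K >= k) = 1/k (so P(K = k) = 1/(k(k+1))),
    # sampled by inverse CDF: K = floor(1/u) for u ~ Uniform(0, 1).
    u = torch.rand(()).clamp_min(1e-12)
    K = min(int(torch.floor(1.0 / u).item()), max_terms)   # cap is a safeguard, not from the paper

    # Draw K + 1 latent samples and form log importance weights log w_i.
    z = q_dist.rsample((K + 1,))                            # (K+1, latent_dim)
    log_w = log_joint(x, z) - q_dist.log_prob(z).sum(-1)    # (K+1,)

    # IWAE_k = log-mean-exp of the first k weights, for k = 1 .. K+1.
    ks = torch.arange(1, K + 2, dtype=log_w.dtype)
    iwae = torch.logcumsumexp(log_w, dim=0) - torch.log(ks)

    # Russian-roulette reweighting: SUMO = IWAE_1 + sum_k Delta_k / P(K >= k),
    # where Delta_k = IWAE_{k+1} - IWAE_k and 1 / P(K >= k) = k for this choice of tail.
    deltas = iwae[1:] - iwae[:-1]                           # Delta_1 .. Delta_K
    return iwae[0] + (deltas * ks[:-1]).sum()
```

Because the reweighted increments telescope in expectation to lim_k IWAE_k, the estimate is unbiased for log p(x) whenever that limit exists and the interchange of expectation and summation is valid.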
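
The Experiment Setup and Dataset Splits rows together describe the optimizer, gradient clipping, learning-rate schedule, and early-stopping rule. Below is a minimal sketch of how those reported settings could be wired up in PyTorch, assuming `encoder` and `decoder` are `torch.nn.Module`s, `train_loader` yields batches of 100 examples, and `estimate_log_px(x)` and `val_log_likelihood()` are hypothetical helpers for the training objective and the per-epoch validation score; this is an illustration of the stated hyperparameters, not the authors' training code.

```python
import torch

def train(encoder, decoder, train_loader, estimate_log_px, val_log_likelihood,
          max_epochs=10000, early_stop_patience=300):
    params = list(encoder.parameters()) + list(decoder.parameters())
    # AMSGrad with lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-4 (Appendix A.8).
    optimizer = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999),
                                 eps=1e-4, amsgrad=True)
    # Reduce LR by a factor of 0.8 if the validation likelihood stalls for 50 epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.8, patience=50)

    best_val, epochs_since_best = float("-inf"), 0
    for epoch in range(max_epochs):
        for x in train_loader:
            loss = -estimate_log_px(x)          # maximize the log-likelihood estimate
            optimizer.zero_grad()
            loss.backward()
            # Separate gradient-norm clipping for encoder and decoder (SUMO settings);
            # 20 is just one of the reported decoder values {20, 40, 60}.
            torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=5000)
            torch.nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=20)
            optimizer.step()

        val_ll = val_log_likelihood()
        scheduler.step(val_ll)
        # Early stopping after 300 epochs without validation improvement.
        if val_ll > best_val:
            best_val, epochs_since_best = val_ll, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= early_stop_patience:
                break
```

AMSGrad is obtained here via `torch.optim.Adam(..., amsgrad=True)`; the loop structure and the `max_epochs` default are assumptions, since the report only quotes the hyperparameters and the stopping rules.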