Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
Authors: Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. (Abstract); We first compare the performance of SUMO when used as a replacement to IWAE with the same expected cost on density modeling tasks. (Section 5) |
| Researcher Affiliation | Collaboration | Yucen Luo Tsinghua University EMAIL; Alex Beatson Princeton University EMAIL; Mohammad Norouzi Google Research EMAIL; Jun Zhu Tsinghua University EMAIL; David Duvenaud University of Toronto EMAIL; Ryan P. Adams Princeton University EMAIL; Ricky T. Q. Chen University of Toronto EMAIL |
| Pseudocode | Yes | Algorithm 1: Computing SUMO, an unbiased estimator of log p(x). |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code, nor does it include a link to a code repository. |
| Open Datasets | Yes | We make use of two benchmark datasets: dynamically binarized MNIST (Le Cun et al., 1998) and binarized OMNIGLOT (Lake et al., 2015). |
| Dataset Splits | Yes | The learning rate is reduced by factor 0.8 if the validation likelihood does not improve for 50 epochs. (Appendix A.8); We report the performance of models with early stopping if no improvements have been observed for 300 epochs on the validation set. (Appendix A.8) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers like AMSGrad, RMSprop, and Adam, but does not specify version numbers for any key software libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | In density modeling experiments, all the models are trained using a batch size of 100 and the AMSGrad optimizer (Reddi et al., 2018) with parameters lr = 0.001, β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁴. (Appendix A.8); We set the gradient norm to 5000 for encoder and {20, 40, 60} for decoder in SUMO. For IWAE, the gradient norm is fixed to 10 in all the experiments. (Appendix A.8) |
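To make the estimator named under Pseudocode more concrete, the following is a minimal Python sketch of the general idea: start from the biased bound IWAE_m and add a randomly truncated, reweighted telescoping sum of differences between successive IWAE bounds. It is not a transcription of the paper's Algorithm 1. The names `logmeanexp`, `sumo_estimate`, and the callback `sample_log_weight` are hypothetical, and the geometric distribution for the truncation level K is an assumption chosen for simplicity; in general, heavier tails for K lower the variance at the price of more expected compute.

```python
import math
import random
from typing import Callable, List


def logmeanexp(xs: List[float]) -> float:
    """Numerically stable log(mean(exp(xs)))."""
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs) / len(xs))


def sumo_estimate(
    sample_log_weight: Callable[[random.Random], float],
    m: int,
    rho: float,
    rng: random.Random,
) -> float:
    """Sketch of a SUMO-style Russian-roulette estimate of log p(x).

    sample_log_weight(rng) is a hypothetical callback returning
    log p(x, z) - log q(z | x) for a fresh draw z ~ q(z | x).
    K is geometric on {1, 2, ...} with P(K >= k) = rho ** (k - 1)
    (an assumption; the paper's choice of distribution may differ).
    """
    if m < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need m >= 1 and 0 <= rho < 1")
    # Draw the truncation level K independently of the importance samples.
    K = 1
    while rng.random() < rho:
        K += 1
    # Draw m + K log importance weights; IWAE_j averages the first j of them.
    log_w = [sample_log_weight(rng) for _ in range(m + K)]
    iwae = [logmeanexp(log_w[:j]) for j in range(1, m + K + 1)]  # iwae[j - 1] = IWAE_j
    estimate = iwae[m - 1]  # start from the biased lower bound IWAE_m
    for k in range(1, K + 1):
        delta = iwae[m + k - 1] - iwae[m + k - 2]  # IWAE_{m+k} - IWAE_{m+k-1}
        estimate += delta / rho ** (k - 1)  # reweight by 1 / P(K >= k)
    return estimate


if __name__ == "__main__":
    # Toy check: if q(z | x) were the exact posterior, every log weight
    # would equal log p(x), so every IWAE bound (and the estimate) is exact.
    rng = random.Random(0)
    print(sumo_estimate(lambda r: -3.5, m=1, rho=0.5, rng=rng))  # -3.5
```

Because K is drawn independently of the samples, each reweighted difference has the same expectation as the unweighted difference, so the corrections telescope in expectation toward the limit of E[IWAE_n]. The quadratic prefix recomputation in `iwae` is kept for readability; a running log-sum-exp would make it linear.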
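The validation-based schedule quoted under Dataset Splits (reduce the learning rate by a factor of 0.8 after 50 epochs without improvement; stop after 300 such epochs) can be sketched as a small controller. `PlateauController` is a hypothetical helper, not code from the paper. Resetting the learning-rate patience counter after each reduction is an assumption, similar to how PyTorch's `ReduceLROnPlateau` resets its bad-epoch count.

```python
import math
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Hypothetical helper mirroring the schedule quoted from Appendix A.8."""
    lr: float = 1e-3           # initial learning rate from the quoted setup
    factor: float = 0.8        # multiplicative LR decay on a plateau
    lr_patience: int = 50      # epochs without improvement before decaying
    stop_patience: int = 300   # epochs without improvement before stopping
    best: float = -math.inf    # best validation log-likelihood seen so far
    since_best: int = 0        # epochs since the last improvement
    since_decay: int = 0       # epochs since the last improvement or LR decay

    def step(self, val_log_likelihood: float) -> bool:
        """Record one epoch's validation log-likelihood; return True to stop."""
        if val_log_likelihood > self.best:
            self.best = val_log_likelihood
            self.since_best = 0
            self.since_decay = 0
        else:
            self.since_best += 1
            self.since_decay += 1
            if self.since_decay >= self.lr_patience:
                self.lr *= self.factor
                self.since_decay = 0  # assumption: patience restarts after a decay
        return self.since_best >= self.stop_patience
```

In a training loop, `step` would be called once per epoch with the validation log-likelihood, `lr` read back into the optimizer, and training halted when it returns `True`.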
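The optimizer settings quoted under Experiment Setup can be illustrated with a dependency-free sketch of an AMSGrad update and gradient-norm clipping. Reading "gradient norm" as a clipping threshold on the gradient's L2 norm is an assumption, as are the names `clip_by_global_norm` and `AMSGrad`. The update follows the AMSGrad form of Reddi et al. (2018) without bias correction, plus an ε term for numerical stability; library implementations such as PyTorch's `torch.optim.Adam(..., amsgrad=True)` may differ in details like bias correction.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale grads so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


class AMSGrad:
    """AMSGrad on a flat list of parameters (no bias correction)."""

    def __init__(self, n: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-4) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n      # first-moment (mean) estimate
        self.v = [0.0] * n      # second-moment estimate
        self.v_hat = [0.0] * n  # running max of v: the AMSGrad modification

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Update params in place and return them."""
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            self.v_hat[i] = max(self.v_hat[i], self.v[i])
            params[i] -= self.lr * self.m[i] / (math.sqrt(self.v_hat[i]) + self.eps)
        return params


if __name__ == "__main__":
    # Toy run: minimise f(theta) = theta ** 2 starting from theta = 1.0,
    # clipping with 20, one of the quoted decoder thresholds.
    opt = AMSGrad(n=1)
    theta = [1.0]
    for _ in range(100):
        opt.step(theta, clip_by_global_norm([2.0 * theta[0]], max_norm=20.0))
    print(theta)
```

Under this reading, the encoder and decoder would keep separate optimizer states and be clipped separately (5000 for the encoder and one of {20, 40, 60} for the decoder in SUMO; 10 for IWAE).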