Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Variational Russian Roulette for Deep Bayesian Nonparametrics
Authors: Kai Xu, Akash Srivastava, Charles Sutton
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the RRS-IBP method, we compare it with two previous amortized inference approaches for IBP models: MF-IBP (Chatzis, 2014) and S-IBP (Singh et al., 2017). Following Singh et al. (2017), a multiplier greater than 1 is put on the KL term for ν during training for structured variational methods, to encourage adhering to the IBP prior. See Appendix E for other training details. Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl. |
| Researcher Affiliation | Collaboration | ¹School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; ²MIT-IBM Watson AI Lab, Cambridge, MA, United States; ³Google AI, Mountain View, CA, United States; ⁴Alan Turing Institute, London, United Kingdom. |
| Pseudocode | Yes | Algorithm 1 Sampling the truncation level τ during variational optimization, with lazy parameter initialization. |
| Open Source Code | Yes | Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl. |
| Open Datasets | Yes | Now we compare the inference methods on benchmark image data sets, namely the MNIST dataset of handwritten digits (LeCun et al., 1998) and Fashion-MNIST, a dataset of images of products (Xiao et al., 2017). |
| Dataset Splits | No | The paper mentions 'training' and 'testing' splits (e.g., 'We sample 2,400 images for training and 400 held-out images for testing.'), but does not specify a separate 'validation' split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions techniques like 'Kumaraswamy reparameterization' and 'Concrete reparameterization' but does not specify software names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | For MF-IBP and S-IBP, the truncation level is set to 9, greater than the number of true features in the data. [...] We use a deep decoder, where the prior is p(A) = N(A; 0, 1), and a Bernoulli observation distribution. Both the encoder and the decoder are two layer neural networks with 500 hidden units and ReLU activation function. [...] for that simpler data, we use hidden layer of size 50 and a Gaussian observation distribution. |
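
The Experiment Setup row describes a decoder with 500 hidden units, ReLU activations, and a Bernoulli observation distribution. The sketch below illustrates one way such a decoder could be laid out. It is not the authors' implementation (their code is in Julia at the repository linked above). It assumes that "two layer" means one hidden layer followed by an output layer, that images are flattened to 784 pixels, and that the latent vector `z` stands in for whatever feature representation the model passes to the decoder. All function names and the weight initialization are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)  # assumed initialization, not from the paper
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map: one output per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(xs):
    return [max(0.0, v) for v in xs]


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def make_decoder(latent_dim, hidden_dim=500, data_dim=784, seed=0):
    """Hypothetical decoder: latent_dim -> hidden_dim (ReLU) -> data_dim."""
    rng = random.Random(seed)
    hidden_layer = init_layer(latent_dim, hidden_dim, rng)
    output_layer = init_layer(hidden_dim, data_dim, rng)
    return hidden_layer, output_layer


def decode(decoder, z):
    """Map a latent vector z to per-pixel Bernoulli means."""
    hidden_layer, output_layer = decoder
    h = relu(dense(z, hidden_layer))
    return [sigmoid(v) for v in dense(h, output_layer)]


def bernoulli_log_likelihood(x, probs, eps=1e-7):
    """Log-likelihood of binary pixels x under independent Bernoulli means."""
    total = 0.0
    for xi, p in zip(x, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += xi * math.log(p) + (1 - xi) * math.log(1.0 - p)
    return total
```

For the simpler synthetic data mentioned in the same row, the same sketch would use `hidden_dim=50`, with the sigmoid output and Bernoulli likelihood swapped for a Gaussian observation model.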