Variational Russian Roulette for Deep Bayesian Nonparametrics

Authors: Kai Xu, Akash Srivastava, Charles Sutton

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the RRS-IBP method, we compare it with two previous amortized inference approaches for IBP models: MF-IBP (Chatzis, 2014) and S-IBP (Singh et al., 2017). Following Singh et al. (2017), a multiplier greater than 1 is put on the KL term for ν during training for structured variational methods, to encourage adhering to the IBP prior. See Appendix E for other training details. Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl."
Researcher Affiliation | Collaboration | "(1) School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; (2) MIT-IBM Watson AI Lab, Cambridge, MA, United States; (3) Google AI, Mountain View, CA, United States; (4) Alan Turing Institute, London, United Kingdom."
Pseudocode | Yes | "Algorithm 1: Sampling the truncation level τ during variational optimization, with lazy parameter initialization." (A hedged sketch of this style of truncation sampling appears after the table.)
Open Source Code | Yes | "Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl."
Open Datasets | Yes | "Now we compare the inference methods on benchmark image data sets, namely the MNIST dataset of handwritten digits (LeCun et al., 1998) and Fashion-MNIST, a dataset of images of products (Xiao et al., 2017)."
Dataset Splits | No | The paper mentions training and testing splits (e.g., "We sample 2,400 images for training and 400 held-out images for testing.") but does not specify a separate validation split with percentages or counts.
Hardware Specification | No | The paper does not report hardware details such as GPU/CPU models, memory, or other specifications of the machines used to run the experiments.
Software Dependencies | No | The paper mentions techniques such as the Kumaraswamy and Concrete reparameterizations but does not list software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9"). (A sketch of the Kumaraswamy inverse-CDF sampler appears after the table.)
Experiment Setup | Yes | "For MF-IBP and S-IBP, the truncation level is set to 9, greater than the number of true features in the data. [...] We use a deep decoder, where the prior is p(A) = N(A; 0, 1), and a Bernoulli observation distribution. Both the encoder and the decoder are two-layer neural networks with 500 hidden units and ReLU activation functions. [...] for that simpler data, we use a hidden layer of size 50 and a Gaussian observation distribution." (A sketch of these layer sizes appears after the table.)
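
The Pseudocode row points to Algorithm 1, which draws the truncation level τ during variational optimization and initializes parameters lazily. The excerpt does not give the algorithm body, so the following is only a minimal Python sketch of a generic Russian-roulette-style truncation sampler under assumed names (`sample_truncation`, `p_continue`, `init_param`); it is not the paper's exact procedure or its reweighting scheme.

```python
import numpy as np

def sample_truncation(lazy_params, init_param, p_continue=0.8, tau_min=1, rng=None):
    """Sketch of Russian-roulette truncation sampling with lazy initialization.

    Starting from `tau_min`, a coin flip with probability `p_continue` decides
    whether to extend the truncation by one more feature; variational
    parameters for newly reached features are created lazily.
    """
    rng = np.random.default_rng() if rng is None else rng
    tau = tau_min
    while rng.random() < p_continue:
        tau += 1
    # Lazy initialization: only allocate parameters for features actually reached.
    while len(lazy_params) < tau:
        lazy_params.append(init_param())
    # Survival probabilities P(truncation >= k), the quantities Russian-roulette
    # estimators use to reweight a truncated sum into an unbiased estimate of
    # the infinite one.
    ks = np.arange(1, tau + 1)
    survival = p_continue ** np.maximum(0, ks - tau_min)
    return tau, survival

# Example: grow a list of per-feature parameter vectors on demand.
params = []
tau, weights = sample_truncation(params, init_param=lambda: np.zeros(2))
print(tau, len(params), weights)
```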
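
The Software Dependencies row mentions the Kumaraswamy reparameterization without further detail. For reference, the inverse-CDF sampler below is the standard, framework-agnostic way to draw reparameterized Kumaraswamy samples; the function name and NumPy usage are illustrative and not taken from the paper's Julia code.

```python
import numpy as np

def kumaraswamy_sample(a, b, size=None, rng=None):
    """Reparameterized draw from Kumaraswamy(a, b) on (0, 1).

    Uses the closed-form inverse CDF: with u ~ Uniform(0, 1),
    x = (1 - (1 - u)**(1/b))**(1/a) has CDF F(x) = 1 - (1 - x**a)**b,
    so the sample is a deterministic function of (a, b) given the noise u.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

# Example: Kumaraswamy(2, 1) has density 2x on (0, 1), so draws lean toward 1.
print(kumaraswamy_sample(2.0, 1.0, size=3))
```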
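
The Experiment Setup row quotes only layer sizes. The PyTorch sketch below instantiates modules with those shapes (500-unit ReLU hidden layers, Bernoulli logits at the output). Reading "two-layer" as two hidden layers, the flattened 28x28 input, the latent size of 9, and the single encoder output head are assumptions; the IBP-specific variational factors and the smaller 50-unit Gaussian-likelihood variant are omitted.

```python
import torch.nn as nn

# Assumed sizes: flattened 28x28 binarized images and a latent size of 9
# (the truncation quoted for MF-IBP/S-IBP); the excerpt does not state the
# latent dimensionality explicitly.
X_DIM, H_DIM, Z_DIM = 28 * 28, 500, 9

# Encoder: image -> variational parameters (a single head here, standing in
# for whatever per-feature factors the chosen IBP posterior actually uses).
encoder = nn.Sequential(
    nn.Linear(X_DIM, H_DIM), nn.ReLU(),
    nn.Linear(H_DIM, H_DIM), nn.ReLU(),
    nn.Linear(H_DIM, Z_DIM),
)

# Decoder: latent features -> Bernoulli logits over pixels, matching the
# "Bernoulli observation distribution" in the quote.
decoder = nn.Sequential(
    nn.Linear(Z_DIM, H_DIM), nn.ReLU(),
    nn.Linear(H_DIM, H_DIM), nn.ReLU(),
    nn.Linear(H_DIM, X_DIM),
)

print(encoder, decoder)
```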