Variational Russian Roulette for Deep Bayesian Nonparametrics
Authors: Kai Xu, Akash Srivastava, Charles Sutton
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the RRS-IBP method, we compare it with two previous amortized inference approaches for IBP models: MF-IBP (Chatzis, 2014) and S-IBP (Singh et al., 2017). Following Singh et al. (2017), a multiplier greater than 1 is put on the KL term for ν during training for structured variational methods, to encourage adhering to the IBP prior. See Appendix E for other training details. Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl. (A hedged sketch of this KL weighting appears after the table.) |
| Researcher Affiliation | Collaboration | (1) School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; (2) MIT-IBM Watson AI Lab, Cambridge, MA, United States; (3) Google AI, Mountain View, CA, United States; (4) Alan Turing Institute, London, United Kingdom. |
| Pseudocode | Yes | Algorithm 1: Sampling the truncation level τ during variational optimization, with lazy parameter initialization. (A hedged sketch of this truncation sampling with lazy initialization appears after the table.) |
| Open Source Code | Yes | Source code of the implementation of our method is also available at https://github.com/xukai92/RAVE.jl. |
| Open Datasets | Yes | Now we compare the inference methods on benchmark image data sets, namely the MNIST dataset of handwritten digits (LeCun et al., 1998) and Fashion-MNIST, a dataset of images of products (Xiao et al., 2017). |
| Dataset Splits | No | The paper mentions 'training' and 'testing' splits (e.g., 'We sample 2,400 images for training and 400 held-out images for testing.'), but does not specify a separate 'validation' split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions techniques like 'Kumaraswamy reparameterization' and 'Concrete reparameterization' but does not specify software names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | For MF-IBP and S-IBP, the truncation level is set to 9, greater than the number of true features in the data. [...] We use a deep decoder, where the prior is p(A) = N(A; 0, 1), and a Bernoulli observation distribution. Both the encoder and the decoder are two-layer neural networks with 500 hidden units and ReLU activation function. [...] for that simpler data, we use a hidden layer of size 50 and a Gaussian observation distribution. (A hedged sketch of this encoder/decoder setup appears after the table.) |
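
The Research Type row quotes the paper's note that, for the structured variational methods, a multiplier greater than 1 is placed on the KL term for ν during training. Below is a minimal Python sketch of such a weighted objective; the multiplier value and the scalar stand-ins are illustrative assumptions, not values taken from the paper.

```python
import torch

# Toy stand-ins for per-minibatch loss components (illustrative values only).
recon_loss = torch.tensor(120.0)  # negative Bernoulli log-likelihood from the decoder
kl_nu = torch.tensor(3.0)         # KL term for the stick-breaking variables nu
kl_other = torch.tensor(8.0)      # KL terms for the remaining latent variables

# Up-weight the KL term for nu to encourage adherence to the IBP prior.
kl_multiplier = 5.0               # assumed value; the paper only states it is > 1
loss = recon_loss + kl_multiplier * kl_nu + kl_other
print(float(loss))
```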
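The Pseudocode row quotes only the caption of Algorithm 1. The Python sketch below illustrates the two ideas named in that caption, a randomly sampled truncation level and lazily initialized per-feature variational parameters; the geometric-style stopping rule, the continuation probability `q`, and the class and function names are assumptions for illustration, not a reproduction of the authors' RAVE.jl implementation.

```python
import torch
import torch.nn as nn

class LazyFeatureParams(nn.Module):
    """Per-feature variational parameters that are created only when a sampled
    truncation level first reaches that feature index (lazy initialization)."""
    def __init__(self, dim=2):
        super().__init__()
        self.dim = dim
        self.params = nn.ParameterList()

    def ensure(self, tau):
        # Lazily add parameter vectors for features up to index tau.
        while len(self.params) < tau:
            self.params.append(nn.Parameter(0.01 * torch.randn(self.dim)))
        return [self.params[i] for i in range(tau)]


def sample_truncation(q=0.8, tau_min=1, tau_max=100):
    """Sample a random truncation level with a geometric-style stopping rule:
    keep adding one more feature while a coin with continuation probability q
    comes up heads (assumed rule; not the paper's exact sampling distribution)."""
    tau = tau_min
    while tau < tau_max and torch.rand(()).item() < q:
        tau += 1
    return tau


params = LazyFeatureParams()
tau = sample_truncation()        # random truncation level for this optimization step
active = params.ensure(tau)      # variational parameters for the first tau features
print(tau, len(active))
```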
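The Experiment Setup row describes two-layer encoder and decoder networks with 500 hidden units, ReLU activations, and a Bernoulli observation distribution. A minimal PyTorch sketch consistent with that description follows; the flattened MNIST input size, the maximum number of latent features `K`, the encoder's output parameterization, and the reading of "two-layer" as a single 500-unit hidden layer are all assumptions.

```python
import torch
import torch.nn as nn

DATA_DIM = 784   # flattened 28x28 images (assumed input representation)
HIDDEN = 500     # hidden width quoted in the setup
K = 100          # maximum number of latent features; purely illustrative

# Encoder: maps an image to per-feature variational parameters, e.g. a Bernoulli
# logit for each Z_k plus a Gaussian mean and log-variance for each A_k.
encoder = nn.Sequential(
    nn.Linear(DATA_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 3 * K),
)

# Decoder: maps a masked latent code (Z * A) back to Bernoulli means over pixels.
decoder = nn.Sequential(
    nn.Linear(K, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, DATA_DIM), nn.Sigmoid(),
)

x = torch.rand(8, DATA_DIM)        # dummy minibatch of 8 "images"
probs = decoder(torch.zeros(8, K)) # Bernoulli means in (0, 1), shape (8, DATA_DIM)
```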