Capsule Routing via Variational Bayes

Authors: Fabio De Sousa Ribeiro, Georgios Leontidis, Stefanos Kollias

AAAI 2020, pp. 3749-3756

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We outperform the state-of-the-art on smallNORB using 50% fewer capsules than previously reported, achieve competitive performances on CIFAR-10, Fashion-MNIST, SVHN, and demonstrate significant improvement in MNIST to affNIST generalisation over previous works. (Section 4: Experiments)
Researcher Affiliation | Academia | Fabio De Sousa Ribeiro, Georgios Leontidis, Stefanos Kollias, Machine Learning Group, School of Computer Science, University of Lincoln, UK. {fdesousaribeiro, gleontidis, skollias}@lincoln.ac.uk
Pseudocode | Yes | Algorithm 1: Variational Bayes Capsule Routing (an illustrative routing sketch is given below the table)
Open Source Code | Yes | https://github.com/fabio-deep/Variational-Capsule-Routing
Open Datasets | Yes | The main comparative results are reported in Table 1, using smallNORB (LeCun et al. 2004), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011) and CIFAR-10 (Krizhevsky, Hinton, and others 2009).
Dataset Splits | Yes | A 20% validation split of the training set was used to tune hyperparameters. During training, validation used the portion of test data containing the same viewpoints as in training, and generalisation to novel viewpoints was measured after matching performance on the familiar ones. (see the split sketch below the table)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not list software dependencies with explicit version numbers (e.g., 'PyTorch 1.9' or 'CUDA 11.1').
Experiment Setup | Yes | In all cases, we use the diagonal parameterisation in Eq. (11), 3 VB routing iterations and batch size 32. All hyperparameters were tuned using validation sets, then models were retrained with the full training set until convergence before testing. Our best model {64, 8, 16, 16, 5} was trained for 350 epochs using Adam, the L_NLL loss, and a 3e-3 initial learning rate with exponential decay. (see the training sketch below the table)
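
To give a concrete picture of what Algorithm 1 does, below is a minimal NumPy sketch of variational Bayes routing with a diagonal Gaussian parameterisation, in the spirit of the paper's approach: lower-capsule votes are clustered into higher capsules by alternating conjugate Dirichlet / Normal-Gamma posterior updates with responsibility updates, and responsibilities are weighted by lower-capsule activations. The function name `vb_routing`, the prior values, and the initialisation are illustrative assumptions rather than the paper's implementation; the released repository linked above is the authoritative version.

```python
import numpy as np
from scipy.special import digamma, logsumexp

def vb_routing(votes, activations, n_iters=3,
               alpha0=1.0, beta0=1.0, a0=1.0, b0=1.0):
    """Illustrative VB routing over diagonal Gaussian capsule clusters.

    votes:       (N, K, D) array, vote of lower capsule i for higher capsule j.
    activations: (N,) array of lower-capsule activations in [0, 1].
    Returns responsibilities (N, K) and posterior cluster means (K, D).
    """
    N, K, D = votes.shape
    m0 = votes.mean(axis=0)                                  # (K, D) prior means from votes
    # Start from uniform responsibilities, weighted by lower-capsule activations.
    r = np.full((N, K), 1.0 / K) * activations[:, None]

    for _ in range(n_iters):
        # ---- M-step: update conjugate (Dirichlet / Normal-Gamma) posteriors ----
        Nk = r.sum(axis=0) + 1e-8                                        # (K,)
        xbar = np.einsum('nk,nkd->kd', r, votes) / Nk[:, None]          # (K, D)
        Sk = np.einsum('nk,nkd->kd', r, (votes - xbar) ** 2) / Nk[:, None]

        alpha = alpha0 + Nk                                              # Dirichlet counts
        beta = beta0 + Nk
        m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]           # posterior means
        a = a0 + 0.5 * Nk                                                # Gamma shape
        b = b0 + 0.5 * (Nk[:, None] * Sk
                        + (beta0 * Nk / (beta0 + Nk))[:, None] * (xbar - m0) ** 2)

        # ---- E-step: expected log-likelihood of each vote under each cluster ----
        E_log_pi = digamma(alpha) - digamma(alpha.sum())                 # (K,)
        E_log_lam = digamma(a)[:, None] - np.log(b)                      # (K, D)
        E_quad = (a[:, None] / b) * (votes - m) ** 2 + 1.0 / beta[:, None]
        log_rho = (E_log_pi
                   + 0.5 * E_log_lam.sum(axis=1)
                   - 0.5 * D * np.log(2 * np.pi)
                   - 0.5 * E_quad.sum(axis=2))                           # (N, K)
        r = np.exp(log_rho - logsumexp(log_rho, axis=1, keepdims=True))
        r = r * activations[:, None]        # weight responsibilities by activations

    return r, m

# Example usage with random inputs (shapes are arbitrary).
votes = np.random.randn(32, 10, 16)         # 32 lower capsules, 10 higher capsules, 16-D poses
acts = np.random.rand(32)
resp, means = vb_routing(votes, acts, n_iters=3)
```

Note that the paper's Algorithm 1 additionally computes the activations of the higher-level capsules from the fitted posteriors, which is omitted here for brevity.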
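
The 20% validation split itself is straightforward to reproduce. Below is a small sketch assuming PyTorch, with a placeholder tensor dataset standing in for the real training data; the smallNORB familiar/novel viewpoint partitioning is dataset-specific and not reproduced here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder stand-in for the real training set; sizes and shapes are illustrative only.
full_train = TensorDataset(torch.randn(1000, 2, 96, 96), torch.randint(0, 5, (1000,)))

# Hold out 20% of the training set for hyperparameter tuning.
n_val = int(0.2 * len(full_train))
train_set, val_set = random_split(
    full_train, [len(full_train) - n_val, n_val],
    generator=torch.Generator().manual_seed(0))     # fixed seed for a reproducible split

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
```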
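
The reported optimisation settings (Adam, 3e-3 initial learning rate with exponential decay, 350 epochs, batch size 32) translate into the following PyTorch sketch. The placeholder network, the NLL stand-in for the paper's L_NLL loss, the decay factor gamma=0.96, and stepping the scheduler once per epoch are all assumptions not stated above; `train_loader` is reused from the split sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder network standing in for the capsule model with widths {64, 8, 16, 16, 5};
# only the optimisation settings below mirror the reported setup.
model = nn.Sequential(nn.Flatten(), nn.Linear(2 * 96 * 96, 64), nn.ReLU(), nn.Linear(64, 5))

optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)       # 3e-3 initial learning rate
# Exponential learning-rate decay; gamma=0.96 is an assumed value, not reported above.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

for epoch in range(350):                                        # 350 epochs, as reported
    for x, y in train_loader:                                   # batch size 32, as reported
        optimizer.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        loss = F.nll_loss(log_probs, y)                         # stand-in for the L_NLL loss
        loss.backward()
        optimizer.step()
    scheduler.step()                                            # decay once per epoch (assumption)
```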