Capsule Routing via Variational Bayes
Authors: Fabio De Sousa Ribeiro, Georgios Leontidis, Stefanos Kollias
AAAI 2020, pp. 3749–3756 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We outperform the state-of-the-art on smallNORB using 50% fewer capsules than previously reported, achieve competitive performances on CIFAR-10, Fashion-MNIST, SVHN, and demonstrate significant improvement in MNIST to affNIST generalisation over previous works. (Section 4: Experiments) |
| Researcher Affiliation | Academia | Fabio De Sousa Ribeiro, Georgios Leontidis, Stefanos Kollias Machine Learning Group School of Computer Science, University of Lincoln, UK {fdesousaribeiro, gleontidis, skollias}@lincoln.ac.uk |
| Pseudocode | Yes | Algorithm 1 Variational Bayes Capsule Routing |
| Open Source Code | Yes | 1https://github.com/fabio-deep/Variational-Capsule-Routing |
| Open Datasets | Yes | The main comparative results are reported in Table 1, using smallNORB (LeCun et al. 2004), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011) and CIFAR-10 (Krizhevsky, Hinton, and others 2009). |
| Dataset Splits | Yes | A 20% validation split of the training set was used to tune hyperparameters. During training, we validated using the portion of test data containing the same viewpoints as in training and measured the generalisation to novel viewpoints after matching the performance on familiar ones. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names with explicit version numbers like 'PyTorch 1.9' or 'CUDA 11.1'). |
| Experiment Setup | Yes | In all cases, we use the diagonal parameterisation in Eq. (11), 3 VB routing iterations and batch size 32. All hyperparameters were tuned using validation sets, then models were retrained with the full training set until convergence before testing. Our best model {64, 8, 16, 16, 5} was trained for 350 epochs using Adam, the L_NLL loss, and a 3e-3 initial learning rate with exponential decay. (See the sketch after this table.) |
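Below is a minimal, hypothetical sketch of the reported training setup, assuming a PyTorch workflow (the released repository is PyTorch-based): Adam with a 3e-3 initial learning rate and exponential decay, batch size 32, 350 epochs, and a 20% validation split of the training set. The dataset stand-in, placeholder model, loss, and decay factor are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of the training configuration described in the table above.
# Placeholders: Fashion-MNIST stands in for smallNORB, and a linear classifier
# stands in for the VB-routing capsule network (which uses 3 routing iterations
# inside its capsule layers).
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

train_full = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=transforms.ToTensor())

# 20% validation split of the training set, as reported in the paper.
n_val = int(0.2 * len(train_full))
train_set, val_set = random_split(train_full, [len(train_full) - n_val, n_val])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# Placeholder model; the actual network is a VB-routing capsule architecture.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

optimizer = optim.Adam(model.parameters(), lr=3e-3)
# Exponential learning-rate decay; the decay factor (gamma) is an assumption.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)
criterion = nn.CrossEntropyLoss()  # stand-in for the paper's L_NLL loss

for epoch in range(350):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```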