Capsules with Inverted Dot-Product Attention Routing
Authors: Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves comparable performance to state-of-the-art convolutional neural networks (CNNs), but with far fewer parameters, on CIFAR-10 (95.14% test accuracy) and CIFAR-100 (78.02% test accuracy). We also introduce a challenging task to recognize single and multiple overlapping objects simultaneously. Sections 5 and 6 are titled 'EXPERIMENTS ON CIFAR-10 AND CIFAR-100' and 'EXPERIMENTS ON DIVERSE MULTIMNIST' respectively, detailing empirical evaluations. |
| Researcher Affiliation | Collaboration | 1Apple Inc., 2Carnegie Mellon University |
| Pseudocode | Yes | Procedure 1 (Inverted Dot-Product Attention Routing) returns the updated poses of the capsules in layer L+1, given the poses in layers L and L+1 and the weights between layers L and L+1. Procedure 2 (Inference) returns class logits given input images and the model parameters. (An illustrative routing sketch appears after this table.) |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/apple/ml-capsules-inverted-attention-routing. An alternative implementation is available at: https://github.com/yaohungt/Capsules-Inverted-Attention-Routing/blob/master/README.md. |
| Open Datasets | Yes | The CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009) consist of small 32 × 32 real-world color images, with 50,000 for training and 10,000 for evaluation. To this end, we construct the Diverse MultiMNIST dataset, which is extended from MNIST (LeCun et al., 1998). (A minimal data-loading sketch appears after this table.) |
| Dataset Splits | No | The paper specifies '50,000 for training and 10,000 for evaluation' for CIFAR-10/100, and '10,000 test images' for Diverse MultiMNIST, but does not explicitly define a separate validation split or a three-way split with counts/percentages for all datasets. |
| Hardware Specification | No | The paper states 'All the model is trained on a 8-GPU machine with batch size 128.' and 'For fairness, the numbers are benchmarked using the same 8-GPU machine with batch size 128.', but does not specify the type or model of the GPUs (e.g., NVIDIA A100, Tesla V100). |
| Software Dependencies | No | The paper mentions optimizers like 'Adam (Kingma & Ba, 2014)' and 'stochastic gradient descent' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., PyTorch, TensorFlow, or specific Adam implementation versions). |
| Experiment Setup | Yes | For the optimizers, we use stochastic gradient descent with learning rate 0.1... We use Adam... with learning rate 0.001... We decrease the learning rate by a factor of 10 at epochs 150 and 250, and there are 350 epochs in total. All models are trained on an 8-GPU machine with batch size 128. During training, we first pad four zero-value pixels to each image and randomly crop it to 32 × 32. Then, we horizontally flip the image with probability 0.5. Detailed model specifications are provided in Tables 2-12. (A training-setup sketch appears after this table.) |
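
The pseudocode summarized in the 'Pseudocode' row maps onto a short routine. Below is a minimal PyTorch-style sketch of one routing step, assuming the procedure as described there: votes are computed from lower-layer poses, agreements are dot products between higher-layer poses and votes, routing coefficients are a softmax over the higher-level capsules, and updated poses are a layer-normalized weighted sum of votes. The tensor layout, the zero initialization of the higher-layer poses, and the function name `inverted_dot_product_routing` are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def inverted_dot_product_routing(pose_l, weight, num_iters=2):
    """Sketch of inverted dot-product attention routing (illustrative, not the authors' code).

    pose_l : (B, N_l, D_l)         poses of capsules in layer L
    weight : (N_l, N_h, D_l, D_h)  transformation weights between layers L and L+1
    returns: (B, N_h, D_h)         poses of capsules in layer L+1
    """
    B = pose_l.shape[0]
    N_h, D_h = weight.shape[1], weight.shape[3]

    # Votes: v_ij = W_ij p_i  ->  (B, N_l, N_h, D_h)
    votes = torch.einsum('bid,ijdh->bijh', pose_l, weight)

    # Initialize higher-level poses at zero, which makes the first routing pass uniform.
    pose_h = votes.new_zeros(B, N_h, D_h)

    for _ in range(num_iters):
        # Agreement: dot product between higher-level poses (queries) and votes.
        agreement = torch.einsum('bjh,bijh->bij', pose_h, votes)  # (B, N_l, N_h)

        # Routing coefficients: softmax over the higher-level capsules.
        routing = F.softmax(agreement, dim=-1)                    # (B, N_l, N_h)

        # Pose update: weighted sum of votes followed by layer normalization.
        pose_h = torch.einsum('bij,bijh->bjh', routing, votes)
        pose_h = F.layer_norm(pose_h, (D_h,))

    return pose_h
```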
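
The dataset counts in the 'Open Datasets' row (50,000 training and 10,000 evaluation images of size 32 × 32) match the standard CIFAR splits. A minimal loading sketch is below; the use of torchvision, the data path, and the plain ToTensor transform are assumptions for illustration, since the paper does not state its data-loading code.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CIFAR-10: 50,000 training and 10,000 test images of size 32 x 32 (CIFAR-100 is analogous).
train_set = datasets.CIFAR10(root='./data', train=True, download=True,
                             transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root='./data', train=False, download=True,
                            transform=transforms.ToTensor())

# Batch size 128 matches the training setup reported in the paper.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```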
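
The optimizer, learning-rate schedule, and augmentation in the 'Experiment Setup' row translate into the sketch below. Only the learning rates (SGD with 0.1, or Adam with 0.001), the 10x decay at epochs 150 and 250 out of 350, batch size 128, 4-pixel zero padding with 32 × 32 random crop, and horizontal flip with probability 0.5 come from the paper; the momentum value, the stand-in model, and the loop body are placeholders.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms

# Augmentation as reported: pad 4 zero-value pixels, randomly crop to 32 x 32, flip with p=0.5.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # default padding fill is zero-valued pixels
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

model = torch.nn.Linear(3 * 32 * 32, 10)    # placeholder; the capsule model is specified in Tables 2-12

# SGD with learning rate 0.1 (the paper also reports Adam with learning rate 0.001).
# The momentum value here is an assumption, not stated in the quoted setup.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)

# Decrease the learning rate by a factor of 10 at epochs 150 and 250; 350 epochs in total.
scheduler = MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(350):
    # ... one training epoch over batches of size 128 ...
    scheduler.step()
```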