Matrix capsules with EM routing

Authors: Geoffrey E Hinton, Sara Sabour, Nicholas Frosst

ICLR 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "On the smallNORB benchmark, capsules reduce the number of test errors by 45% compared to the state-of-the-art. Capsules also show far more resistance to white box adversarial attacks than our baseline convolutional neural network." |
| Researcher Affiliation | Industry | Geoffrey Hinton, Sara Sabour, Nicholas Frosst; Google Brain, Toronto, Canada; {geoffhinton, sasabour, frosst}@google.com |
| Pseudocode | Yes | Procedure 1: the routing algorithm returns the activation and pose of the capsules in layer L+1 given the activations and votes of the capsules in layer L (see the routing sketch below the table). |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code, nor a link to a code repository for the described methodology. |
| Open Datasets | Yes | "The smallNORB dataset (LeCun et al. (2004)) has gray-level stereo images of 5 classes of toys: airplanes, cars, trucks, humans and animals." |
| Dataset Splits | No | The paper explicitly defines training and test sets (e.g., "5 physical instances of a class are selected for the training data and the other 5 for the test data.") and describes how data is processed during training and testing, but it does not specify a separate validation split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or other machine specifications used for the experiments. |
| Software Dependencies | No | The paper mentions TensorFlow in the acknowledgments but does not provide version numbers for the software dependencies or libraries used. |
| Experiment Setup | Yes | "We downsample smallNORB to 48×48 pixels and normalize each image to have zero mean and unit variance. During training, we randomly crop 32×32 patches and add random brightness and contrast to the cropped images. During test, we crop a 32×32 patch from the center of the image and achieve 1.8% test error on smallNORB. [...] The model starts with a 5×5 convolutional layer with 32 channels (A=32) and a stride of 2 with a ReLU non-linearity. All the other layers are capsule layers starting with the primary capsule layer. [...] We use spread loss to directly maximize the gap between the activation of the target class (a_t) and the activation of the other classes. By starting with a small margin of 0.2 and linearly increasing it during training to 0.9, we avoid dead capsules in the earlier layers." (A sketch of the spread loss and margin schedule also follows below.) |
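
For readers who want to follow Procedure 1 concretely, here is a minimal NumPy sketch of the EM routing loop between two capsule layers. Several details are simplifying assumptions rather than the paper's implementation: the inverse temperature `lam` is fixed (the paper anneals it across iterations), `beta_a` and `beta_u` are scalars (the paper learns them per capsule type), and the votes are passed in directly rather than produced by learned transformation matrices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def em_routing(a_in, votes, n_iters=3, lam=1.0, beta_a=0.0, beta_u=0.0, eps=1e-9):
    """EM routing in the spirit of Procedure 1.

    a_in:  (I,)       activations of the I capsules in layer L
    votes: (I, J, H)  vote of input capsule i for output capsule j (H = 16 pose dims)
    Returns (a_out, poses): activations (J,) and poses (J, H) of layer L+1.
    """
    I, J, H = votes.shape
    R = np.full((I, J), 1.0 / J)               # uniform initial assignment probabilities
    for _ in range(n_iters):
        # ---- M-step: refit each output capsule's Gaussian and activation
        Ra = R * a_in[:, None]                  # weight assignments by input activations
        Ra_sum = Ra.sum(axis=0) + eps           # (J,)
        mu = (Ra[:, :, None] * votes).sum(axis=0) / Ra_sum[:, None]                 # (J, H)
        var = (Ra[:, :, None] * (votes - mu) ** 2).sum(axis=0) / Ra_sum[:, None] + eps
        cost = (beta_u + 0.5 * np.log(var)) * Ra_sum[:, None]   # note log(sigma) = 0.5*log(var)
        a_out = sigmoid(lam * (beta_a - cost.sum(axis=1)))      # (J,)
        # ---- E-step: recompute R_ij from Gaussian likelihoods of the votes
        log_p = -0.5 * (((votes - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=2)  # (I, J)
        log_ap = np.log(a_out + eps) + log_p
        R = np.exp(log_ap - log_ap.max(axis=1, keepdims=True))
        R /= R.sum(axis=1, keepdims=True)       # normalize over output capsules j
    return a_out, mu

# toy check: 8 input capsules voting for 4 output capsules with 16-dim poses
rng = np.random.default_rng(0)
a, p = em_routing(rng.random(8), rng.normal(size=(8, 4, 16)))
print(a.shape, p.shape)  # (4,) (4, 16)
```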
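
Under the same caveat, here is a small sketch of the spread loss with the linearly increasing margin described in the setup. The loss and the 0.2 → 0.9 schedule follow the quoted text; `total_steps` and the toy activations are illustrative assumptions, not values from the paper.

```python
import numpy as np

def spread_loss(activations, target, margin):
    """Spread loss: squared penalty for any class whose activation comes
    within `margin` of the target class activation a_t.
    activations: (num_classes,), target: int, margin grows from 0.2 to 0.9."""
    a_t = activations[target]
    gaps = np.maximum(0.0, margin - (a_t - activations))
    gaps[target] = 0.0                  # the target class itself is not penalized
    return np.sum(gaps ** 2)

def margin_at(step, total_steps, m0=0.2, m1=0.9):
    """Linear margin schedule from m0 to m1 over training, as in the paper."""
    return m0 + (m1 - m0) * min(1.0, step / total_steps)

acts = np.array([0.1, 0.7, 0.3, 0.5, 0.2])   # toy class activations
print(spread_loss(acts, target=1, margin=margin_at(5000, 20000)))
```

Starting with a small margin means early in training only classes nearly tied with the target are pushed apart, which is what the authors credit with avoiding dead capsules in the earlier layers; by the end of training the margin of 0.9 enforces a wide activation gap.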