Matrix capsules with EM routing
Authors: Geoffrey E. Hinton, Sara Sabour, Nicholas Frosst
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the small NORB benchmark, capsules reduce the number of test errors by 45% compared to the state-of-the-art. Capsules also show far more resistance to white box adversarial attacks than our baseline convolutional neural network. |
| Researcher Affiliation | Industry | Geoffrey Hinton, Sara Sabour, Nicholas Frosst; Google Brain, Toronto, Canada; {geoffhinton, sasabour, frosst}@google.com |
| Pseudocode | Yes | Procedure 1: Routing algorithm returns activation and pose of the capsules in layer L + 1 given the activations and votes of capsules in layer L. (A minimal NumPy sketch of this EM-routing procedure follows the table.) |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | The smallNORB dataset (LeCun et al. (2004)) has gray-level stereo images of 5 classes of toys: airplanes, cars, trucks, humans and animals. |
| Dataset Splits | No | The paper explicitly defines training and test sets (e.g., “5 physical instances of a class are selected for the training data and the other 5 for the test data.”) and describes how data is processed during training and testing, but it does not specify a separate “validation” dataset split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions “TensorFlow” in the acknowledgments but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | We downsample smallNORB to 48×48 pixels and normalize each image to have zero mean and unit variance. During training, we randomly crop 32×32 patches and add random brightness and contrast to the cropped images. During test, we crop a 32×32 patch from the center of the image and achieve 1.8% test error on smallNORB. [...] The model starts with a 5×5 convolutional layer with 32 channels (A=32) and a stride of 2 with a ReLU non-linearity. All the other layers are capsule layers starting with the primary capsule layer. [...] We use spread loss to directly maximize the gap between the activation of the target class (a_t) and the activation of the other classes. By starting with a small margin of 0.2 and linearly increasing it during training to 0.9, we avoid dead capsules in the earlier layers. (NumPy sketches of the preprocessing pipeline and the spread loss also follow the table.) |
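
The quoted Procedure 1 alternates an M-step, which fits a Gaussian over the incoming votes of each layer-L+1 capsule, with an E-step, which re-weights the assignment probabilities R by those Gaussians. Below is a minimal NumPy sketch of that loop under stated assumptions: votes are pre-flattened to shape (I, J, 16) for 4×4 pose matrices, and the learned/scheduled parameters beta_u, beta_a, and lambda from the paper are fixed placeholder values here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def em_routing(a_in, votes, n_iters=3, beta_u=1.0, beta_a=1.0, lam=1.0, eps=1e-9):
    """a_in: (I,) activations of layer-L capsules.
    votes: (I, J, H) votes from I input capsules for J output capsules,
    with H = 16 pose dimensions (flattened 4x4 pose matrix)."""
    I, J, H = votes.shape
    R = np.full((I, J), 1.0 / J)                 # uniform initial assignments
    for _ in range(n_iters):
        # ---- M-step: fit a Gaussian per output capsule j ----
        Ra = R * a_in[:, None]                   # weight assignments by input activation
        Ra_sum = Ra.sum(axis=0) + eps            # (J,)
        mu = (Ra[:, :, None] * votes).sum(axis=0) / Ra_sum[:, None]          # (J, H)
        var = (Ra[:, :, None] * (votes - mu) ** 2).sum(axis=0) / Ra_sum[:, None]
        cost = (beta_u + 0.5 * np.log(var + eps)) * Ra_sum[:, None]          # (J, H)
        a_out = sigmoid(lam * (beta_a - cost.sum(axis=1)))                   # (J,)
        # ---- E-step: recompute assignment probabilities R_ij ----
        log_p = -0.5 * (np.log(2 * np.pi * (var + eps))
                        + (votes - mu) ** 2 / (var + eps)).sum(axis=2)       # (I, J)
        p = a_out[None, :] * np.exp(log_p)
        R = p / (p.sum(axis=1, keepdims=True) + eps)
    return a_out, mu

# Toy usage: 8 input capsules voting for 4 output capsules with 4x4 poses.
activations, votes = np.random.rand(8), np.random.randn(8, 4, 16)
out_act, out_pose = em_routing(activations, votes)
```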
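The preprocessing quoted in the Experiment Setup row amounts to a random 32×32 crop with brightness and contrast jitter at training time, and a center 32×32 crop at test time. A sketch, assuming per-image zero-mean / unit-variance normalization has already been applied; the jitter ranges are illustrative assumptions, not values given in the paper:

```python
import numpy as np

def preprocess_train(img48):
    """img48: (48, 48) float image, already normalized per the quoted setup."""
    # random 32x32 crop
    y, x = np.random.randint(0, 48 - 32 + 1, size=2)
    patch = img48[y:y + 32, x:x + 32].copy()
    # random brightness (additive) and contrast (multiplicative) jitter
    patch += np.random.uniform(-0.3, 0.3)   # brightness range is an assumption
    patch *= np.random.uniform(0.8, 1.2)    # contrast range is an assumption
    return patch

def preprocess_test(img48):
    # center 32x32 crop, as in the quoted test-time setup
    off = (48 - 32) // 2
    return img48[off:off + 32, off:off + 32]
```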
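The spread loss quoted above penalizes each wrong class i whose activation a_i comes within a margin m of the target activation a_t: L = sum over i ≠ t of max(0, m − (a_t − a_i))². A sketch with the stated linear margin ramp from 0.2 to 0.9; the length of the ramp in training steps is an assumption:

```python
import numpy as np

def spread_loss(activations, target, margin):
    """activations: (num_classes,) output-capsule activations; target: int class id."""
    a_t = activations[target]
    gaps = np.maximum(0.0, margin - (a_t - activations))
    gaps[target] = 0.0               # the target class does not penalize itself
    return np.sum(gaps ** 2)

def margin_at(step, total_steps, m_min=0.2, m_max=0.9):
    # linear ramp from 0.2 to 0.9 over training, as stated in the quoted setup
    frac = min(step / float(total_steps), 1.0)
    return m_min + frac * (m_max - m_min)
```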