Dynamic Routing Between Capsules
Authors: Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. Training is performed on 28×28 MNIST (LeCun et al. [1998]) images that have been shifted by up to 2 pixels in each direction with zero padding. |
| Researcher Affiliation | Industry | Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton; Google Brain, Toronto; {sasabour, frosst, geoffhinton}@google.com |
| Pseudocode | Yes | Procedure 1 Routing algorithm. 1: procedure ROUTING(û_{j\|i}, r, l) 2: for all capsule i in layer l and capsule j in layer (l+1): b_ij ← 0. 3: for r iterations do 4: for all capsule i in layer l: c_i ← softmax(b_i) ▷ softmax computes Eq. 3 5: for all capsule j in layer (l+1): s_j ← Σ_i c_ij û_{j\|i} 6: for all capsule j in layer (l+1): v_j ← squash(s_j) ▷ squash computes Eq. 1 7: for all capsule i in layer l and capsule j in layer (l+1): b_ij ← b_ij + û_{j\|i} · v_j 8: return v_j (A Python sketch of this routing loop follows the table.) |
| Open Source Code | No | The paper states 'Our implementation is in TensorFlow (Abadi et al. [2016])', but it does not provide a link to the source code for their CapsNet implementation or an explicit statement of its availability. |
| Open Datasets | Yes | Training is performed on 28×28 MNIST (LeCun et al. [1998]) images that have been shifted by up to 2 pixels in each direction with zero padding. |
| Dataset Splits | Yes | The dataset has 60K and 10K images for training and testing respectively. We also searched for the right decay step on the 10K validation set. |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al. [2016])' and 'Adam optimizer (Kingma and Ba [2014])' but does not specify version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | Conv1 has 256 9×9 convolution kernels with a stride of 1 and ReLU activation. We use the Adam optimizer (Kingma and Ba [2014]) with its TensorFlow default parameters, including the exponentially decaying learning rate, to minimize the sum of the margin losses in Eq. 4. m+ = 0.9 and m− = 0.1. ... We use λ = 0.5. We scale down this reconstruction loss by 0.0005... We suggest 3 iterations of routing for all experiments. The batch size at each training step is 128. (A sketch of the margin loss and scaled reconstruction term also follows the table.) |
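
The routing procedure quoted in the Pseudocode row can be written as a short loop. The following is a minimal NumPy sketch, not the authors' TensorFlow implementation; the shape convention `(num_in, num_out, dim_out)` for the prediction vectors û_{j|i} and the names `u_hat`, `routing`, and `squash` are assumptions made for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Eq. 1: scale vector length into [0, 1) while preserving direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing(u_hat, num_iterations=3):
    """Procedure 1: u_hat has shape (num_in, num_out, dim_out), i.e. one
    prediction vector from each lower-level capsule i for each capsule j."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits b_ij <- 0
    for _ in range(num_iterations):                 # paper suggests 3 iterations
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # Eq. 3: softmax over capsules j
        s = np.einsum('ij,ijd->jd', c, u_hat)       # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                               # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)   # agreement: b_ij += u_hat_{j|i} . v_j
    return v

# Usage with illustrative sizes (e.g. 1152 primary capsules routing to 10 digit capsules):
v = routing(np.random.randn(1152, 10, 16), num_iterations=3)
```

Because the coupling coefficients `c` are a softmax over the output capsules j, each lower-level capsule distributes its vote across the layer above, and the dot-product update strengthens couplings to capsules whose output agrees with the prediction.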
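The loss described in the Experiment Setup row (margin loss of Eq. 4 plus a reconstruction term scaled by 0.0005) can be sketched as below. This is a NumPy illustration using the quoted hyperparameters m+ = 0.9, m− = 0.1, λ = 0.5; the function and argument names (`margin_loss`, `v_lengths`, `reconstruction_sse`) are chosen for clarity and are not taken from the paper's code.

```python
import numpy as np

def margin_loss(v_lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Eq. 4. v_lengths: (batch, num_classes) capsule output lengths ||v_k||;
    targets: one-hot labels T_k."""
    present = targets * np.maximum(0.0, m_plus - v_lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_lengths - m_minus) ** 2
    return np.sum(present + absent, axis=1)      # sum of margin losses over classes

def total_loss(v_lengths, targets, reconstruction_sse):
    """reconstruction_sse: per-example sum of squared differences between the
    decoder reconstruction and the input pixels (computed elsewhere)."""
    # The reconstruction term is scaled by 0.0005 so it does not dominate the margin loss.
    return margin_loss(v_lengths, targets) + 0.0005 * reconstruction_sse
```

In the paper this total loss is minimized with the Adam optimizer at TensorFlow default parameters and a batch size of 128 per training step.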