Dynamic Routing Between Capsules

Authors: Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

NeurIPS 2017

Each entry below gives a reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. Training is performed on 28×28 MNIST (LeCun et al. [1998]) images that have been shifted by up to 2 pixels in each direction with zero padding.
Researcher Affiliation: Industry. Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton; Google Brain, Toronto; {sasabour, frosst, geoffhinton}@google.com
Pseudocode: Yes. Procedure 1, Routing algorithm:
  procedure ROUTING(û_{j|i}, r, l)
    for all capsule i in layer l and capsule j in layer (l+1): b_ij ← 0
    for r iterations do
      for all capsule i in layer l: c_i ← softmax(b_i)    (softmax computes Eq. 3)
      for all capsule j in layer (l+1): s_j ← Σ_i c_ij û_{j|i}
      for all capsule j in layer (l+1): v_j ← squash(s_j)    (squash computes Eq. 1)
      for all capsule i in layer l and capsule j in layer (l+1): b_ij ← b_ij + û_{j|i} · v_j
    return v_j
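
The paper's implementation is in TensorFlow; the block below is only a minimal NumPy sketch of Procedure 1 for reference. The array shapes, the `routing` function name, and the assumption that the prediction vectors û_{j|i} are already computed (in the paper they come from learned transformation matrices) are illustrative choices, not the authors' code.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Eq. 1: v = (|s|^2 / (1 + |s|^2)) * (s / |s|); short vectors shrink toward 0,
    # long vectors approach unit length.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis=-1):
    # Eq. 3: coupling coefficients c_ij = exp(b_ij) / sum_k exp(b_ik).
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing(u_hat, r=3):
    """Dynamic routing sketch. u_hat: prediction vectors of shape
    (num_in_caps, num_out_caps, out_dim); returns v of shape (num_out_caps, out_dim)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # b_ij <- 0
    for _ in range(r):
        c = softmax(b, axis=1)                     # c_i <- softmax(b_i) over output capsules j
        s = np.einsum('ij,ijd->jd', c, u_hat)      # s_j <- sum_i c_ij * u_hat_{j|i}
        v = squash(s)                              # v_j <- squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # b_ij <- b_ij + u_hat_{j|i} . v_j
    return v

# Example: 6 input capsules routed to 10 output capsules of dimension 16.
v = routing(np.random.randn(6, 10, 16), r=3)
print(v.shape)  # (10, 16)
```
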
Open Source Code: No. The paper states 'Our implementation is in TensorFlow (Abadi et al. [2016])', but it does not provide a link to the authors' CapsNet implementation or an explicit statement of its availability.
Open Datasets: Yes. Training is performed on 28×28 MNIST (LeCun et al. [1998]) images that have been shifted by up to 2 pixels in each direction with zero padding.
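
A minimal sketch of the described augmentation follows. The paper only states that images are shifted by up to 2 pixels in each direction with zero padding; sampling the shift uniformly and the `random_shift` helper name are assumptions.

```python
import numpy as np

def random_shift(image, max_shift=2):
    """Shift a 28x28 image by up to `max_shift` pixels in each direction,
    filling the uncovered border with zeros (zero padding)."""
    dy, dx = np.random.randint(-max_shift, max_shift + 1, size=2)  # assumed uniform sampling
    padded = np.pad(image, max_shift, mode='constant', constant_values=0)
    top, left = max_shift + dy, max_shift + dx
    return padded[top:top + 28, left:left + 28]

# Example: augment one dummy MNIST-sized image.
shifted = random_shift(np.random.rand(28, 28))
print(shifted.shape)  # (28, 28)
```
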
Dataset Splits: Yes. The dataset has 60K and 10K images for training and testing respectively. We also searched for the right decay step on the 10K validation set.
Hardware Specification: No. No specific hardware details (such as GPU or CPU models) used for running the experiments are mentioned in the paper.
Software Dependencies: No. The paper mentions TensorFlow (Abadi et al. [2016]) and the Adam optimizer (Kingma and Ba [2014]) but does not specify version numbers for TensorFlow or any other software dependency.
Experiment Setup: Yes. Conv1 has 256, 9×9 convolution kernels with a stride of 1 and ReLU activation. We use the Adam optimizer (Kingma and Ba [2014]) with its TensorFlow default parameters, including the exponentially decaying learning rate, to minimize the sum of the margin losses in Eq. 4. m+ = 0.9 and m− = 0.1. ... We use λ = 0.5. We scale down this reconstruction loss by 0.0005... We suggest 3 iterations of routing for all experiments. The batch size at each training step is 128.
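
For reference, a minimal NumPy sketch of the loss described above: the margin loss of Eq. 4, L_k = T_k max(0, m⁺ − ||v_k||)² + λ(1 − T_k) max(0, ||v_k|| − m⁻)², with m⁺ = 0.9, m⁻ = 0.1, λ = 0.5, plus the reconstruction term scaled down by 0.0005. The function names and the per-example formulation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss of Eq. 4 for one example.
    v_lengths: (num_classes,) lengths ||v_k|| of the digit-capsule outputs.
    targets:   (num_classes,) one-hot vector; T_k = 1 iff digit k is present."""
    present = targets * np.maximum(0.0, m_pos - v_lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_lengths - m_neg) ** 2
    return np.sum(present + absent)

def total_loss(v_lengths, targets, image, reconstruction, recon_weight=0.0005):
    # Reconstruction loss is the sum of squared differences, scaled by 0.0005
    # so it does not dominate the margin loss during training.
    recon = np.sum((image - reconstruction) ** 2)
    return margin_loss(v_lengths, targets) + recon_weight * recon

# Example with dummy values for a 10-class problem and one 28x28 image.
lengths = np.random.rand(10)
one_hot = np.eye(10)[3]
img = np.random.rand(28, 28)
print(total_loss(lengths, one_hot, img, np.zeros_like(img)))
```
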