Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

Authors: Mason McGill, Pietro Perona

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose three approaches to training these networks, test them on small image datasets synthesized from MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009), and quantify the accuracy/efficiency trade-off that occurs when the network parameters are tuned to yield more aggressive early classification policies. We compare approaches to dynamic routing by training 153 networks to classify small images, varying the policy-learning strategy, regularization strategy, optimization strategy, architecture, cost of computation, and details of the task. The results of these experiments are reported in Fig. 5-10.
Researcher Affiliation | Academia | Mason McGill and Pietro Perona, California Institute of Technology, Pasadena, California, USA.
Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available via GitLab.
Open Datasets | Yes | we train networks to classify images from a small-image dataset synthesized from MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009)
Dataset Splits | No | The paper mentions training iterations, mini-batch size, and the use of validation images, but does not provide specific train/validation/test dataset split percentages or counts.
Hardware Specification | No | The paper discusses computational cost and efficiency but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions various techniques like batch normalization and Xavier initialization, but does not list any specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | In all of our experiments, we use a mini-batch size, n_ex, of 128, and run 80,000 training iterations. We perform stochastic gradient descent with initial learning rate 0.1/n_ex and momentum 0.9. The learning rate decays continuously with a half-life of 10,000 iterations. [...] τ is initialized to 1.0 for actor networks and 0.1 for critic networks, and decays with a half-life of 10,000 iterations. k_dec = 0.01, k_ure = 0.001, and k_L2 = 1×10^-4.
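
The quoted experiment setup can be sketched in plain Python as follows. This is an illustrative rendering of the stated hyperparameters and half-life decay schedule only, not the authors' released code; names such as half_life_decay, K_DEC, K_URE, and K_L2 are ours.

```python
# Illustrative sketch of the quoted training schedule (assumed rendering;
# identifiers below are ours, not taken from the authors' GitLab code).

N_EX = 128             # mini-batch size n_ex
N_ITERS = 80_000       # total training iterations
MOMENTUM = 0.9         # SGD momentum
LR_INIT = 0.1 / N_EX   # initial learning rate, 0.1 / n_ex
HALF_LIFE = 10_000     # iterations over which decayed quantities halve

# Regularization coefficients as quoted; the loss terms they weight are
# defined in the paper, not here.
K_DEC, K_URE, K_L2 = 0.01, 0.001, 1e-4


def half_life_decay(initial: float, iteration: int,
                    half_life: int = HALF_LIFE) -> float:
    """Continuous exponential decay with the given half-life."""
    return initial * 0.5 ** (iteration / half_life)


for it in range(N_ITERS):
    lr = half_life_decay(LR_INIT, it)       # decaying learning rate
    tau_actor = half_life_decay(1.0, it)    # temperature for actor networks
    tau_critic = half_life_decay(0.1, it)   # temperature for critic networks
    # ... the SGD-with-momentum parameter update would go here ...
```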