Routing Networks: Adaptive Selection of Non-Linear Functions for Multi-Task Learning

Authors: Clemens Rosenbaum, Tim Klinger, Matthew Riemer

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model against cross-stitch networks and shared-layer baselines on multi-task settings of the MNIST, mini-imagenet, and CIFAR-100 datasets. Our experiments demonstrate a significant improvement in accuracy, with sharper convergence.
Researcher Affiliation | Collaboration | Clemens Rosenbaum, College of Information and Computer Sciences, University of Massachusetts Amherst, 140 Governors Dr., Amherst, MA 01003, cgbr@cs.umass.edu; Tim Klinger & Matthew Riemer, IBM Research AI, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, {tklinger,mdriemer}@us.ibm.com
Pseudocode | Yes | Algorithm 1: Routing Algorithm; Algorithm 2: Router-Trainer (Training of a Routing Network); Algorithm 3: Weighted Policy Learner
Open Source Code | No | All dataset splits and the code will be released with the publication of this paper.
Open Datasets | Yes | We experiment with three datasets: multi-task versions of MNIST (MNIST-MTL) (LeCun et al., 1998), Mini-Imagenet (MIN-MTL) (Vinyals et al., 2016) as introduced by Ravi & Larochelle (2017), and CIFAR-100 (CIFAR-MTL) (Krizhevsky, 2009), where we treat the 20 superclasses as tasks.
Dataset Splits | No | The paper provides training and testing split sizes in Table 1 and within the text for each dataset, but it does not specify a separate validation split with quantitative details.
Hardware Specification | No | The paper mentions 'training time on a stable compute cluster' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using SGD and Adam optimizers but does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | We use ρ = 0.0 (no collaboration reward) for CIFAR-MTL and MIN-MTL and ρ = 0.3 for MNIST-MTL. The learning rate is initialized to 10^-2 and annealed by dividing by 10 every 20 epochs. We tried both regular SGD as well as Adam (Kingma & Ba, 2014), but chose SGD as it resulted in marginally better performance. The Simple Conv Net has batch normalization layers but we use no dropout.
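
The Pseudocode row above names the paper's Algorithm 1 (Routing Algorithm), in which a router chooses one function block to apply at each routing step, conditioned on the task. As a point of reference, here is a minimal sketch of that per-layer selection, assuming a PyTorch-style interface; the block shapes, the tabular per-task router, and the sampling scheme are illustrative assumptions, not the authors' released implementation (in the paper the router is trained with reinforcement learning, e.g. the Weighted Policy Learner of Algorithm 3, which is not shown here).

```python
import torch
import torch.nn as nn


class RoutedLayer(nn.Module):
    """One routing step: a task-conditioned router picks one candidate block."""

    def __init__(self, num_modules: int, dim: int, num_tasks: int):
        super().__init__()
        # Candidate function blocks the router can choose between.
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_modules)
        )
        # A very simple router: one learnable score per (task, module) pair.
        self.router_logits = nn.Parameter(torch.zeros(num_tasks, num_modules))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Sample a block from the router's policy for this task; in the paper
        # this policy is learned with RL, not by backpropagating through the choice.
        probs = torch.softmax(self.router_logits[task_id], dim=-1)
        choice = int(torch.multinomial(probs, num_samples=1))
        return self.blocks[choice](x)


class RoutingNetwork(nn.Module):
    """Stack a few routed layers; the same idea applies to convolutional blocks."""

    def __init__(self, depth: int = 3, num_modules: int = 4, dim: int = 64, num_tasks: int = 20):
        super().__init__()
        self.layers = nn.ModuleList(
            RoutedLayer(num_modules, dim, num_tasks) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x, task_id)
        return x


# Usage: route a batch of 8 feature vectors for task 5.
net = RoutingNetwork()
out = net(torch.randn(8, 64), task_id=5)
```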
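
The Open Datasets row notes that CIFAR-MTL treats the 20 CIFAR-100 superclasses as tasks. The sketch below shows one plausible way to regroup the data accordingly; `fine_to_coarse` is a stand-in for the fine-label-to-superclass mapping stored in the raw CIFAR-100 `coarse_labels` field (torchvision's CIFAR100 exposes only fine labels), and the 0-4 relabelling within each task is an assumption, not a detail taken from the paper.

```python
from collections import defaultdict


def build_cifar_mtl_tasks(samples, fine_to_coarse):
    """Group (image, fine_label) pairs into 20 five-way tasks by superclass.

    `fine_to_coarse` maps each of the 100 fine labels to its superclass id.
    Returns {task_id: list of (image, label_within_task)} dictionaries.
    """
    # Fix an ordering of the five fine labels inside each superclass so they
    # can be relabelled 0..4 within their task.
    fines_per_task = defaultdict(list)
    for fine in sorted(fine_to_coarse):
        fines_per_task[fine_to_coarse[fine]].append(fine)
    within_task_label = {
        fine: fines_per_task[coarse].index(fine)
        for fine, coarse in fine_to_coarse.items()
    }

    tasks = defaultdict(list)
    for image, fine in samples:
        tasks[fine_to_coarse[fine]].append((image, within_task_label[fine]))
    return dict(tasks)
```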
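
The Experiment Setup row quotes the optimization details: SGD with a learning rate of 10^-2, divided by 10 every 20 epochs. The snippet below wires those numbers into a standard PyTorch training skeleton; the stand-in model, dummy batch, and epoch count are placeholders, and the collaboration reward ρ is not modelled here.

```python
import torch

# Stand-in model; the paper's routed convolutional net is not reproduced here.
model = torch.nn.Linear(64, 20)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# StepLR divides the learning rate by 10 every 20 epochs, matching the quoted
# "annealed by dividing by 10 every 20 epochs".
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # Placeholder batch; real training would iterate over the per-task datasets.
    x, y = torch.randn(32, 64), torch.randint(0, 20, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # anneal once per epoch
```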