Diversity and Depth in Per-Example Routing Models

Authors: Prajit Ramachandran, Quoc V. Le

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we find that adding architectural diversity to routing models significantly improves performance, cutting the error rates of a strong baseline by 35% on an Omniglot setup. However, when scaling up routing depth, we find that modern routing techniques struggle with optimization.
Researcher Affiliation | Industry | Prajit Ramachandran, Google Brain, prajit@google.com; Quoc V. Le, Google Brain, qvl@google.com
Pseudocode | No | The paper provides mathematical formulas and descriptions of processes, but no formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to repositories or mention code in supplementary materials.
Open Datasets | Yes | Next, we benchmark routing models with architectural diversity on an Omniglot (Lake et al., 2015) multi-task learning setup.
Dataset Splits | Yes | We follow Liang et al. (2018) by defining a 50%/20%/30% training/validation/test split and using a fixed random subset of 20 alphabets.
Hardware Specification | No | The paper only states 'on a single GPU' without providing specific details such as the GPU model, CPU, or memory, which are necessary for hardware reproducibility.
Software Dependencies | No | The paper mentions an optimizer (Adam), a normalization technique (Group Norm), and an activation function (ReLU), but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch, or library versions).
Experiment Setup | Yes | k is annealed from 7 to 2 over the layers. We found the k-annealing technique crucial to prevent overfitting. The Adam optimizer (Kingma & Ba, 2014) is used, and the expert-balancing loss for noisy top-k routing is annealed from 0.1 to 0 over the course of training.
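The Dataset Splits row above quotes a 50%/20%/30% training/validation/test split over a fixed random subset of 20 Omniglot alphabets, following Liang et al. (2018). The sketch below is only an illustrative reconstruction of such a split: the seed values, function names, and the choice to shuffle and split at the example level within each alphabet are assumptions, not details given in the paper.

```python
import random


def choose_alphabets(all_alphabets, num_tasks=20, seed=0):
    # Fixed random subset of 20 alphabets, one task per alphabet.
    # The seed value is illustrative, not taken from the paper.
    rng = random.Random(seed)
    return rng.sample(sorted(all_alphabets), num_tasks)


def split_task(examples, seed=0):
    # 50%/20%/30% train/validation/test split within a single alphabet.
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train, n_val = int(0.5 * n), int(0.2 * n)
    return (examples[:n_train],
            examples[n_train:n_train + n_val],
            examples[n_train + n_val:])
```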
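The Experiment Setup row describes noisy top-k routing with k annealed from 7 to 2 over the layers and an expert-balancing loss coefficient annealed from 0.1 to 0 during training. The following is a minimal sketch of those pieces, assuming the Shazeer et al. (2017)-style noisy top-k gate and balancing loss that this line of work builds on; the linear annealing schedules and all function names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def noisy_top_k_gate(x, w_gate, w_noise, k):
    # Noisy top-k gating: per-example logits plus input-dependent noise,
    # keep only the k largest, softmax over the survivors.
    clean_logits = x @ w_gate
    noise_std = F.softplus(x @ w_noise)
    logits = clean_logits + torch.randn_like(clean_logits) * noise_std
    top_vals, top_idx = logits.topk(k, dim=-1)
    gates = torch.zeros_like(logits)
    gates.scatter_(-1, top_idx, F.softmax(top_vals, dim=-1))
    return gates  # [batch, num_experts], nonzero for only k experts per example


def k_for_layer(layer_idx, num_layers, k_first=7, k_last=2):
    # "k is annealed from 7 to 2 over the layers"; linear interpolation
    # is an assumption, as the paper does not spell out the schedule shape.
    frac = layer_idx / max(num_layers - 1, 1)
    return int(round(k_first + (k_last - k_first) * frac))


def balancing_coefficient(step, total_steps, start=0.1, end=0.0):
    # Expert-balancing loss coefficient annealed from 0.1 to 0 over training.
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac


def expert_balancing_loss(gates, coeff):
    # Squared coefficient of variation of per-expert importance
    # (Shazeer et al., 2017), scaled by the annealed coefficient.
    importance = gates.sum(dim=0)
    cv_sq = importance.var() / (importance.mean() ** 2 + 1e-10)
    return coeff * cv_sq
```

In use, each routed layer would call `noisy_top_k_gate` with its own `k_for_layer(...)` value, and the training loop would add `expert_balancing_loss(gates, balancing_coefficient(step, total_steps))` to the task loss.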