Diversity and Depth in Per-Example Routing Models
Authors: Prajit Ramachandran, Quoc V. Le
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we find that adding architectural diversity to routing models significantly improves performance, cutting the error rates of a strong baseline by 35% on an Omniglot setup. However, when scaling up routing depth, we find that modern routing techniques struggle with optimization. |
| Researcher Affiliation | Industry | Prajit Ramachandran Google Brain prajit@google.com Quoc V. Le Google Brain qvl@google.com |
| Pseudocode | No | The paper provides mathematical formulas and descriptions of processes, but no formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to repositories or mention code in supplementary materials. |
| Open Datasets | Yes | Next, we benchmark routing models with architectural diversity on an Omniglot (Lake et al., 2015) multi-task learning setup. |
| Dataset Splits | Yes | We follow Liang et al. (2018) by defining a 50%/20%/30% training/validation/test split and using a fixed random subset of 20 alphabets. |
| Hardware Specification | No | The paper only states 'on a single GPU' without providing specific details like the GPU model, CPU, or memory, which are necessary for hardware reproducibility. |
| Software Dependencies | No | The paper mentions the Adam optimizer and techniques such as Group Norm and ReLU activations, but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch, or library versions). |
| Experiment Setup | Yes | k is annealed from 7 to 2 over the layers. We found the k-annealing technique crucial to prevent overfitting. The Adam optimizer (Kingma & Ba, 2014) is used, and the expert-balancing for noisy top-k loss is annealed from 0.1 to 0 over the course of training. |
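The "Dataset Splits" row quotes a 50%/20%/30% training/validation/test split over a fixed random subset of 20 Omniglot alphabets. A minimal sketch of one way such a split could be constructed is below; the function name, the use of a seeded `random.Random`, and the choice to split each alphabet's items at the class level are all assumptions, since the exact partitioning procedure of Liang et al. (2018) is not given in the excerpt.

```python
import random

def split_omniglot(alphabets, num_alphabets=20, seed=0):
    """Hypothetical 50/20/30 per-alphabet split over a fixed random subset of alphabets.

    `alphabets` maps alphabet name -> list of items (e.g. character classes);
    what counts as an "item" is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed -> the same subset every run
    chosen = rng.sample(sorted(alphabets), num_alphabets)
    splits = {}
    for name in chosen:
        items = sorted(alphabets[name])
        rng.shuffle(items)
        n_train = int(0.5 * len(items))
        n_val = int(0.2 * len(items))
        splits[name] = {
            "train": items[:n_train],
            "valid": items[n_train:n_train + n_val],
            "test": items[n_train + n_val:],
        }
    return splits
```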
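The "Experiment Setup" row quotes two annealing schedules: the routing fan-out k is annealed from 7 to 2 over the layers, and the expert-balancing loss coefficient is annealed from 0.1 to 0 over training. A minimal sketch of both schedules is below, assuming linear interpolation (with rounding for k); the paper excerpt does not state the exact schedule shape, and the function names are illustrative.

```python
def k_for_layer(layer_idx, num_layers, k_start=7, k_end=2):
    """Routing fan-out k for a given layer, linearly interpolated across depth (assumed linear)."""
    frac = layer_idx / max(num_layers - 1, 1)
    return round(k_start + frac * (k_end - k_start))

def balance_coeff(step, total_steps, start=0.1, end=0.0):
    """Expert-balancing loss weight, linearly interpolated across training steps (assumed linear)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + frac * (end - start)

# Example: an 8-layer routing model trained for 10k steps.
ks = [k_for_layer(i, 8) for i in range(8)]  # [7, 6, 6, 5, 4, 3, 3, 2] under this linear schedule
coef_midway = balance_coeff(5000, 10000)    # 0.05, halfway between 0.1 and 0
```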