Multi-Head Adapter Routing for Cross-Task Generalization
Authors: Lucas Page-Caccia, Edoardo Maria Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MHR and a series of competitive baselines for few-shot task adaptation on the T0 task suite [Sanh et al., 2022] and Super-Natural Instructions [Super NI; Wang et al., 2022a]. Based on our results, we report that MHR outperforms Poly and single adapter baselines. Our experimental evaluation aims to answer three research questions: 1) Does the expressivity of the routing function matter? 2) Why do routing-based PEFT methods yield superior performance? 3) Is routing useful during both multi-task pre-training and few-shot adaptation? |
| Researcher Affiliation | Collaboration | Microsoft Research, McGill University, Mila, University of Edinburgh, Université de Montréal, University of Copenhagen |
| Pseudocode | No | The paper describes its methods using mathematical formulas and text, but does not include any structured pseudocode or algorithm blocks (see the hedged sketch of multi-head routing below the table). |
| Open Source Code | No | The paper references a GitHub link (https://github.com/r-three/t-few) for a baseline (T-Few) used in its experiments, but provides no link to, or statement about, open-source code for its own methods (MHR, MHR-z, MHR-µ). |
| Open Datasets | Yes | We test our methods on the T0 Sanh et al. [2022] evaluation suite, following the same setup as Liu et al. [2022], and Super NI Wang et al. [2022a], a large-scale dataset with more than 1,600 training tasks. |
| Dataset Splits | Yes | We report the median and standard deviation of the best validation accuracy for each test task across 3 seeds, when evaluated every 50 training epochs. For every method, we perform early stopping on the validation set. Tasks were chosen at random, with the requirement that at least 300 examples were available, and were equally split into 100 training, 100 validation and 100 test examples. (A sketch of this split and reporting protocol appears below the table.) |
| Hardware Specification | Yes | We note that all experiments were run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions models such as T5 and T0 and PEFT methods such as LoRA and (IA)³, but does not specify version numbers for any software dependencies used in the implementation (e.g., PyTorch or other library versions). |
| Experiment Setup | Yes | We report the median and standard deviation of the best validation accuracy for each test task across 3 seeds, when evaluated every 50 training epochs. Tasks were chosen at random, with the requirement that at least 300 examples were available, and were equally split into 100 training, 100 validation and 100 test examples. |
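
Since the paper presents MHR only through formulas and prose, the following is a minimal PyTorch sketch of multi-head routing over a shared inventory of LoRA adapters, reconstructed from the paper's textual description. The class name, shapes, initialization, and the averaged mixing of the B factors are illustrative assumptions, not the authors' implementation; setting `n_heads = 1` recovers single-head, Poly-style routing.

```python
import torch
import torch.nn as nn

class MultiHeadRoutedLoRA(nn.Module):
    """Hedged sketch: per-task, per-head routing over a bank of LoRA adapters."""

    def __init__(self, d_in, d_out, n_skills=8, n_heads=4, rank=4, n_tasks=10):
        super().__init__()
        assert d_in % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_in // n_heads
        # Shared adapter inventory: skill s owns LoRA factors A_s (d_in x r), B_s (r x d_out).
        self.A = nn.Parameter(torch.randn(n_skills, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        # Per-task routing logits: one distribution over skills per head (the "multi-head" part).
        self.route = nn.Parameter(torch.zeros(n_tasks, n_heads, n_skills))

    def forward(self, x, task_id):
        # x: (batch, d_in); returns the low-rank update added to the frozen layer's output.
        w = torch.softmax(self.route[task_id], dim=-1)            # (heads, skills)
        A = self.A.view(self.A.size(0), self.n_heads, self.d_head, -1)
        A_mix = torch.einsum("hs,shdr->hdr", w, A)                # head-wise mix of A factors
        A_mix = A_mix.reshape(x.size(-1), -1)                     # (d_in, rank)
        # Assumption: B factors mixed with the head-averaged routing weights.
        B_mix = torch.einsum("s,srd->rd", w.mean(dim=0), self.B)  # (rank, d_out)
        return x @ A_mix @ B_mix

# Usage: compute the adapter update for a batch routed under task 3.
layer = MultiHeadRoutedLoRA(d_in=64, d_out=64)
delta = layer(torch.randn(2, 64), task_id=3)
```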
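
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows is concrete enough to pin down in code. Below is a hedged sketch of that protocol, the 100/100/100 split for tasks with at least 300 examples plus the median/std report across seeds; the function names and the example accuracies are hypothetical.

```python
import random
import statistics

def split_task(examples, seed=0):
    # Tasks were kept only if at least 300 examples were available,
    # then split evenly into 100 train / 100 validation / 100 test.
    assert len(examples) >= 300
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    return pool[:100], pool[100:200], pool[200:300]

def report(best_val_accs):
    # Per test task, the paper reports the median and standard deviation of
    # the best validation accuracy across seeds (3 in the paper), with the
    # validation set checked every 50 training epochs for early stopping.
    return statistics.median(best_val_accs), statistics.stdev(best_val_accs)

# Hypothetical best-validation accuracies from 3 seeds:
print(report([71.2, 69.8, 72.5]))  # -> (71.2, ~1.35)
```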