Multi-Head Adapter Routing for Cross-Task Generalization
Authors: Lucas Page-Caccia, Edoardo Maria Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MHR and a series of competitive baselines for few-shot task adaptation on the T0 task suite [Sanh et al., 2022] and Super-Natural Instructions [Super NI; Wang et al., 2022a]. Based on our results, we report that MHR outperforms Poly and single adapter baselines. Our experimental evaluation aims to answer three research questions: 1) Does the expressivity of the routing function matter? 2) Why do routing-based PEFT methods yield superior performance? 3) Is routing useful during both multi-task pre-training and few-shot adaptation? |
| Researcher Affiliation | Collaboration | Microsoft Research, McGill University, Mila, University of Edinburgh, Université de Montréal, University of Copenhagen |
| Pseudocode | No | The paper describes its methods using mathematical formulas and text, but does not include any structured pseudocode or algorithm blocks (see the hedged sketch of multi-head routing below the table). |
| Open Source Code | No | The paper references a GitHub link (https://github.com/r-three/t-few) for a baseline (T-Few) used in its experiments, but provides no link to, or statement about, open-source code for its own methods (MHR, MHR-z, MHR-µ). |
| Open Datasets | Yes | We test our methods on the T0 Sanh et al. [2022] evaluation suite, following the same setup as Liu et al. [2022], and Super NI Wang et al. [2022a], a large-scale dataset with more than 1,600 training tasks. |
| Dataset Splits | Yes | We report the median and standard deviation of the best validation accuracy for each test task across 3 seeds, when evaluated every 50 training epochs. For every method, we perform early stopping on the validation set. Tasks were chosen at random, with the requirement that at least 300 examples were available, and were equally split into 100 training, 100 validation and 100 test examples. (A sketch of this split and reporting protocol appears below the table.) |
| Hardware Specification | Yes | We note that all experiments were run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions models such as T5 and T0 and PEFT methods such as LoRA and (IA)³, but does not specify version numbers for any software dependencies used in the implementation (e.g., PyTorch or other library versions). |
| Experiment Setup | Yes | We report the median and standard deviation of the best validation accuracy for each test task across 3 seeds, when evaluated every 50 training epochs. Tasks were chosen at random, with the requirement that at least 300 examples were available, and were equally split into 100 training, 100 validation and 100 test examples. |
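
Since the paper presents MHR only through formulas and prose, the following is a minimal PyTorch sketch of multi-head routing over a shared inventory of LoRA adapters, reconstructed from the paper's textual description. The class name, shapes, initialization, and the averaged mixing of the B factors are illustrative assumptions, not the authors' implementation; setting `n_heads = 1` recovers single-head, Poly-style routing.

```python
import torch
import torch.nn as nn

class MultiHeadRoutedLoRA(nn.Module):
    """Hedged sketch: per-task, per-head routing over a bank of LoRA adapters."""

    def __init__(self, d_in, d_out, n_skills=8, n_heads=4, rank=4, n_tasks=10):
        super().__init__()
        assert d_in % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_in // n_heads
        # Shared adapter inventory: skill s owns LoRA factors A_s (d_in x r), B_s (r x d_out).
        self.A = nn.Parameter(torch.randn(n_skills, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        # Per-task routing logits: one distribution over skills per head (the "multi-head" part).
        self.route = nn.Parameter(torch.zeros(n_tasks, n_heads, n_skills))

    def forward(self, x, task_id):
        # x: (batch, d_in); returns the low-rank update added to the frozen layer's output.
        w = torch.softmax(self.route[task_id], dim=-1)            # (heads, skills)
        A = self.A.view(self.A.size(0), self.n_heads, self.d_head, -1)
        A_mix = torch.einsum("hs,shdr->hdr", w, A)                # head-wise mix of A factors
        A_mix = A_mix.reshape(x.size(-1), -1)                     # (d_in, rank)
        # Assumption: B factors mixed with the head-averaged routing weights.
        B_mix = torch.einsum("s,srd->rd", w.mean(dim=0), self.B)  # (rank, d_out)
        return x @ A_mix @ B_mix

# Usage: compute the adapter update for a batch routed under task 3.
layer = MultiHeadRoutedLoRA(d_in=64, d_out=64)
delta = layer(torch.randn(2, 64), task_id=3)
```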
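
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows is concrete enough to pin down in code. Below is a hedged sketch of that protocol, the 100/100/100 split for tasks with at least 300 examples plus the median/std report across seeds; the function names and the example accuracies are hypothetical.

```python
import random
import statistics

def split_task(examples, seed=0):
    # Tasks were kept only if at least 300 examples were available,
    # then split evenly into 100 train / 100 validation / 100 test.
    assert len(examples) >= 300
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    return pool[:100], pool[100:200], pool[200:300]

def report(best_val_accs):
    # Per test task, the paper reports the median and standard deviation of
    # the best validation accuracy across seeds (3 in the paper), with the
    # validation set checked every 50 training epochs for early stopping.
    return statistics.median(best_val_accs), statistics.stdev(best_val_accs)

# Hypothetical best-validation accuracies from 3 seeds:
print(report([71.2, 69.8, 72.5]))  # -> (71.2, ~1.35)
```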