Towards Modular LLMs by Building and Reusing a Library of LoRAs

Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, Alessandro Sordoni

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. Thus, we make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training. |
| Researcher Affiliation | Collaboration | 1 Microsoft Research, 2 Mila Quebec AI Institute, 3 Université de Montréal, 4 University of Copenhagen, 5 University of Edinburgh, 6 HEC Montréal, 7 Canada CIFAR AI Chair. |
| Pseudocode | Yes (sketches below) | Algorithm 1 Model-Based Clustering (MBC)... Algorithm 2 Arrow Routing |
| Open Source Code | No | The paper states: "We acknowledge the support of Matheus Pereira for maintaining and optimizing the code, as well as for preparing the code release." However, it does not provide concrete access, such as a specific repository link, nor does it explicitly state that the code was publicly available at the time of publication. |
| Open Datasets | Yes | We train expert modules on 256 tasks from the original Flan v2 dataset (Longpre et al., 2023). |
| Dataset Splits | Yes (split sketch below) | We threshold the number of training examples to 10,000 examples per task and reserve 1,000 for validation. |
| Hardware Specification | No | The paper mentions the LLMs used (Phi-2 and Mistral 7B) but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for training or inference. |
| Software Dependencies | No | The paper mentions the use of specific LLM models (Phi-2, Mistral) and parameter-efficient fine-tuning methods like LoRA. While it references the PEFT library in its bibliography, it does not provide specific version numbers for Python, PyTorch, CUDA, or other key software components used in the experiments. |
| Experiment Setup | Yes (configuration sketch below) | Unless stated otherwise, for all our multi-task training and single-task adaptation scenarios, we use LoRA rank of 4, dropout of 0.05 and learning rate of 1e-4. Unless specified, we set the number of clusters for MBC to 10, resulting in the best upstream validation loss and downstream performance for Phi-2, as demonstrated in Fig. 4. |
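For readers who want the gist of the two algorithms named in the Pseudocode row, here is a minimal PyTorch sketch, assuming the standard LoRA parameterization ΔW = B·A and a single adapted linear layer. Names such as `task_loras`, `num_clusters`, and `top_k` are illustrative; this is not the authors' released implementation.

```python
# Hedged sketch of the two algorithms referenced in the "Pseudocode" row.
import torch
from sklearn.cluster import KMeans


def mbc_cluster_tasks(task_loras: dict[str, list[torch.Tensor]], num_clusters: int = 10):
    """Model-Based Clustering: group tasks by similarity of their per-task LoRA weights.

    task_loras maps a task name to that task's LoRA tensors (same base model, same rank).
    """
    names = list(task_loras.keys())
    # One vector per task: concatenate all LoRA parameters and L2-normalize, so
    # k-means on Euclidean distance behaves like cosine-similarity clustering.
    vecs = torch.stack([
        torch.nn.functional.normalize(
            torch.cat([p.flatten() for p in task_loras[t]]), dim=0)
        for t in names
    ])
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(vecs.float().numpy())
    clusters = [[] for _ in range(num_clusters)]
    for name, label in zip(names, labels):
        clusters[label].append(name)
    return clusters  # next step: train one LoRA per cluster on the union of its tasks' data


def arrow_prototypes(lora_As: list[torch.Tensor], lora_Bs: list[torch.Tensor]):
    """Arrow routing, step 1: one prototype direction per expert.

    Each expert's prototype is the top right singular vector of its update
    Delta W = B @ A, i.e. the input direction that adapter amplifies most.
    """
    protos = []
    for A, B in zip(lora_As, lora_Bs):   # A: (r, d_in), B: (d_out, r)
        delta_w = B @ A                  # (d_out, d_in)
        _, _, Vh = torch.linalg.svd(delta_w, full_matrices=False)
        protos.append(Vh[0])             # top right singular vector, shape (d_in,)
    return torch.stack(protos)           # (num_experts, d_in)


def arrow_route(x: torch.Tensor, protos: torch.Tensor, top_k: int = 4):
    """Arrow routing, step 2: per-token expert selection without a trained router."""
    scores = (x @ protos.T).abs()        # abs: the sign of a singular vector is arbitrary
    top_scores, top_idx = scores.topk(top_k, dim=-1)
    weights = torch.softmax(top_scores, dim=-1)
    return top_idx, weights              # mix the selected experts' outputs with these weights
```

In the paper, Arrow scores are computed independently at every adapted layer; the sketch above shows a single layer for brevity.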
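The Dataset Splits row quotes a per-task cap of 10,000 training examples with 1,000 reserved for validation. A minimal sketch with Hugging Face `datasets`, assuming each of the 256 Flan v2 tasks arrives as its own `Dataset` object (how tasks are loaded and ordered is not specified in the excerpt):

```python
# Hedged sketch of the per-task split described in the "Dataset Splits" row.
from datasets import Dataset

MAX_TRAIN, NUM_VAL = 10_000, 1_000


def split_task(task_ds: Dataset, seed: int = 1234) -> tuple[Dataset, Dataset]:
    """Hold out up to 1,000 validation examples, then cap training at 10,000."""
    task_ds = task_ds.shuffle(seed=seed)
    n_val = min(NUM_VAL, len(task_ds) // 2)  # guard for very small tasks (assumption)
    valid = task_ds.select(range(n_val))
    train = task_ds.select(range(n_val, min(len(task_ds), n_val + MAX_TRAIN)))
    return train, valid
```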
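The Experiment Setup row reports LoRA rank 4, dropout 0.05, learning rate 1e-4, and 10 MBC clusters. Below is a hedged configuration sketch using Hugging Face `peft` and `transformers` (the paper does not pin software versions); `lora_alpha`, `target_modules`, the output path, batch size, and epoch count are placeholders, not reported values.

```python
# Hedged sketch of the reported hyperparameters (LoRA rank 4, dropout 0.05, lr 1e-4).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    r=4,                                  # reported LoRA rank
    lora_dropout=0.05,                    # reported dropout
    lora_alpha=16,                        # assumption: not stated in the quoted setup
    target_modules=["q_proj", "v_proj"],  # assumption: projection names vary by model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="expert_phi2_task",        # illustrative path
    learning_rate=1e-4,                   # reported learning rate
    per_device_train_batch_size=8,        # assumption: batch size is not quoted
    num_train_epochs=1,                   # assumption
)
```

The reported 10 MBC clusters correspond to `num_clusters=10` in the clustering sketch above.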