Towards Modular LLMs by Building and Reusing a Library of LoRAs
Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, Alessandro Sordoni
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. Thus, we make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, ²Mila Quebec AI Institute, ³Université de Montréal, ⁴University of Copenhagen, ⁵University of Edinburgh, ⁶HEC Montréal, ⁷Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1 Model-Based Clustering (MBC)... Algorithm 2 Arrow Routing (hedged sketches of both algorithms appear after the table) |
| Open Source Code | No | The paper states: "We acknowledge the support of Matheus Pereira for maintaining and optimizing the code, as well as for preparing the code release." However, it does not provide concrete access, such as a specific repository link, nor does it explicitly state that the code is publicly available at the time of publication. |
| Open Datasets | Yes | We train expert modules on 256 tasks from the original Flan v2 dataset (Longpre et al., 2023). |
| Dataset Splits | Yes | We threshold the number of training examples to 10,000 examples per task and reserve 1,000 for validation. |
| Hardware Specification | No | The paper mentions the LLMs used (Phi-2 and Mistral 7B) but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for training or inference. |
| Software Dependencies | No | The paper mentions the use of specific LLM models (Phi-2, Mistral) and parameter-efficient fine-tuning methods like LoRA. While it references the PEFT library in its bibliography, it does not provide specific version numbers for Python, PyTorch, CUDA, or other key software components used in the experiments. |
| Experiment Setup | Yes | Unless stated otherwise, for all our multi-task training and single-task adaptation scenarios, we use LoRA rank of 4, dropout of 0.05 and learning rate of 1e-4. Unless specified, we set the number of clusters for MBC to 10, resulting in the best upstream validation loss and downstream performance for Phi-2, as demonstrated in Fig. 4. (A hedged configuration sketch based on these values appears after the table.) |
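
The Pseudocode row above refers to Algorithm 1 (Model-Based Clustering) and Algorithm 2 (Arrow Routing). The snippet below is a minimal sketch of both ideas, assuming each task is represented by its flattened per-task LoRA weights and that each expert's routing direction is the top right singular vector of its low-rank update ΔW = B·A; all function and variable names are illustrative and are not taken from the authors' code.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def mbc_cluster_tasks(task_lora_weights, n_clusters=10):
    """Model-Based Clustering (sketch): group tasks whose privately trained
    LoRA parameter vectors point in similar directions; one adapter per
    cluster is then trained on the union of its tasks' data (not shown)."""
    names = list(task_lora_weights)
    X = np.stack([task_lora_weights[n] for n in names])
    X = normalize(X)  # L2-normalise so k-means approximates cosine-similarity clustering
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return {c: [n for n, l in zip(names, labels) if l == c] for c in range(n_clusters)}


def arrow_prototypes(lora_As, lora_Bs):
    """One unit-norm 'arrow' per expert: the top right singular vector of
    Delta_W = B @ A (A: r x d_in, B: d_out x r), i.e. the input direction
    the adapter amplifies most."""
    protos = []
    for A, B in zip(lora_As, lora_Bs):
        delta_w = B @ A                                    # (d_out, d_in)
        _, _, Vh = torch.linalg.svd(delta_w, full_matrices=False)
        protos.append(Vh[0])                               # (d_in,)
    return torch.stack(protos)                             # (n_experts, d_in)


def arrow_route(h, protos, top_k=4):
    """Score hidden states h (batch, d_in) against every expert's arrow and
    keep the top-k; the absolute value is used because the sign of a singular
    vector is arbitrary."""
    scores = (h @ protos.T).abs()                          # (batch, n_experts)
    top_vals, top_idx = scores.topk(top_k, dim=-1)
    weights = torch.softmax(top_vals, dim=-1)              # mixing weights over selected experts
    return top_idx, weights
```

In the paper, routing is applied per layer and per token; the sketch above keeps only the clustering and scoring logic.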
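
The Experiment Setup row reports a LoRA rank of 4, dropout of 0.05, and a learning rate of 1e-4. Below is a minimal sketch of how such a configuration could be expressed with the Hugging Face PEFT library; `lora_alpha` and `target_modules` are assumptions not stated in this section, and the snippet is not the authors' training code.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_cfg = LoraConfig(
    r=4,                                            # rank reported in the paper
    lora_dropout=0.05,                              # dropout reported in the paper
    lora_alpha=16,                                  # assumption: not given in this section
    target_modules=["q_proj", "k_proj", "v_proj"],  # assumption: projection names vary by model
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# Training would then use a learning rate of 1e-4, with up to 10,000 examples
# per task and 1,000 reserved for validation, as reported above.
```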