Mixture of LoRA Experts
Authors: Xun Wu, Shaohan Huang, Furu Wei
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted in both Natural Language Processing (NLP) and Vision & Language (V&L) domains validate the effectiveness of MOLE. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research Asia ²Tsinghua University |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any structured code-like blocks describing a procedure. |
| Open Source Code | Yes | Our code is available at https://github.com/yushuiwx/MoLE.git. |
| Open Datasets | Yes | We conducted extensive experiments across various tasks, including Translation, Natural Language Inference (NLI), Struct to Text, Closed-Book QA, and multiple subtasks within the Big-Bench Hard (BBH) (Ghazal et al., 2013) dataset. We trained a single LoRA on a combined dataset comprising ANLI-R1 (Nie et al., 2019), ANLI-R2 (Nie et al., 2019), and QNLI (Rajpurkar et al., 2018) datasets, as depicted in Table 5. |
| Dataset Splits | No | The paper describes training parameters such as learning rate, batch size, and iterations, and mentions 'test' sets for evaluation, but it does not explicitly specify a 'validation set' or a dedicated 'validation split' with specific percentages or counts for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions models and frameworks like 'Stable Diffusion V2.1' and 'Flan-T5', but it does not list any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | During training of MOLE, we process images at a resolution of 512 × 512 and set the learning rate to 1e-5. We use the DDPM sampler (Ho et al., 2020) with 50 steps in each case and train 400 iterations for each required composition with a batch size of 2 and α set to 0.5. |
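
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object for reference. The sketch below is illustrative only: the `TrainingConfig` class and its field names are hypothetical and not taken from the authors' released code; only the values come from the paper's reported setup.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the V&L training settings reported for MOLE.

    Field names are illustrative assumptions; only the values are taken
    from the Experiment Setup row above.
    """
    image_resolution: int = 512    # images processed at 512 x 512
    learning_rate: float = 1e-5    # reported learning rate
    sampler: str = "DDPM"          # DDPM sampler (Ho et al., 2020)
    sampling_steps: int = 50       # 50 sampling steps per case
    train_iterations: int = 400    # 400 iterations per required composition
    batch_size: int = 2            # reported batch size
    alpha: float = 0.5             # the paper's α hyperparameter


if __name__ == "__main__":
    # Print the collected settings for quick inspection.
    print(TrainingConfig())
```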