Mixture of LoRA Experts

Authors: Xun Wu, Shaohan Huang, Furu Wei

ICLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted in both Natural Language Processing (NLP) and Vision & Language (V&L) domains validate the effects of MOLE.
Researcher Affiliation | Collaboration | 1 Microsoft Research Asia, 2 Tsinghua University
Pseudocode | No | The paper does not contain any sections explicitly labeled "Pseudocode" or "Algorithm", nor any structured code-like blocks describing a procedure.
Open Source Code | Yes | Our code is available at https://github.com/yushuiwx/MoLE.git.
Open Datasets | Yes | We conducted extensive experiments across various tasks, including Translation, Natural Language Inference (NLI), Struct-to-Text, Closed-Book QA, and multiple subtasks within the Big-Bench Hard (BBH) (Ghazal et al., 2013) dataset. We trained a single LoRA on a combined dataset comprising the ANLI-R1 (Nie et al., 2019), ANLI-R2 (Nie et al., 2019), and QNLI (Rajpurkar et al., 2018) datasets, as depicted in Table 5.
Dataset Splits | No | The paper describes training parameters such as learning rate, batch size, and iterations, and mentions test sets for evaluation, but it does not explicitly specify a validation set or a dedicated validation split with percentages or counts.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions models and frameworks such as Stable Diffusion V2.1 and Flan-T5, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | During MOLE training, we process images at 512 × 512 resolution and set the learning rate to 1e-5. We use the DDPM sampler (Ho et al., 2020) with 50 steps in each case and train 400 iterations for each required composition, with batch size 2 and α set to 0.5.
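Since the paper provides no pseudocode, the following is a rough illustrative sketch of the general mixture-of-LoRA-experts idea in numpy. Everything here is an assumption for illustration, not the authors' implementation: the variable names, the tiny dimensions, and the softmax gating over per-expert LoRA deltas are all hypothetical; only the scaling value α = 0.5 mirrors the number reported in the experiment setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
d_in, d_out, r, n_experts = 8, 8, 2, 3
alpha = 0.5  # LoRA scaling; matches the alpha = 0.5 reported in the setup

W = rng.normal(size=(d_in, d_out))         # frozen base weight
A = rng.normal(size=(n_experts, d_in, r))  # per-expert LoRA down-projections
B = np.zeros((n_experts, r, d_out))        # per-expert LoRA up-projections (zero-init)
Wg = rng.normal(size=(d_in, n_experts))    # learnable gating weight (assumed form)

def mole_forward(x):
    """Frozen base output plus a gated sum of LoRA expert deltas."""
    gate_logits = x @ Wg
    gates = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)          # softmax over experts
    expert_out = np.einsum('bi,kir,kro->bko', x, A, B)  # each expert's LoRA delta
    delta = np.einsum('bk,bko->bo', gates, expert_out)  # gate-weighted mixture
    return x @ W + alpha * delta

x = rng.normal(size=(4, d_in))
y = mole_forward(x)
print(y.shape)  # (4, 8)
```

With the up-projections `B` initialized to zero, the mixture contributes nothing at first, so the layer initially reproduces the frozen base output, a common LoRA initialization choice.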