Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

Authors: Yuhang Zhou, Zihua Zhao, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments to verify the superiority of MoLA over previous state-of-the-art methods and present in-depth analysis on its working mechanism.
Researcher Affiliation | Academia | 1 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; 2 Shanghai Artificial Intelligence Laboratory; 3 Fudan University.
Pseudocode | No | The paper describes methods and processes in text and equations but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at: https://github.com/MediaBrain-SJTU/MoLA
Open Datasets | Yes | For domain heterogeneity, we use the VLCS (Torralba & Efros, 2011) and OfficeHome (Venkateswara et al., 2017) datasets; for multi-input task heterogeneity, we use the RadImageNet (Mei et al., 2022) and MedMNIST v2 (Yang et al., 2021) datasets; for single-input task heterogeneity, we use NYUv2 (Silberman et al., 2012).
Dataset Splits | No | The paper mentions training and testing but does not explicitly state split proportions or validation-set details for all experiments.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used to run the experiments.
Software Dependencies | No | The paper cites the AdamW, SGD, and Adam optimizers but does not specify software dependencies with version numbers, such as PyTorch or Python versions.
Experiment Setup | Yes | We apply the AdamW optimizer (...) with a learning rate of 0.0001 for experiments on the VLCS and OfficeHome datasets, the SGD optimizer (...) with a learning rate of 0.05 on the RadImageNet and MedMNIST datasets, and the Adam optimizer (...) with a learning rate of 0.0001 on the NYUv2 dataset. For all of the experiments, the training batch size is set to 128.
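For reference, below is a minimal sketch of how the per-dataset optimizer settings quoted in the Experiment Setup row could be instantiated, assuming a PyTorch setup. The helper name build_optimizer and the dataset keys are illustrative assumptions, not code taken from the MoLA repository.

    # Minimal sketch (assumes PyTorch). Only the optimizer settings reported
    # in the paper are reproduced; build_optimizer and the dataset-name keys
    # are hypothetical, not from the MoLA repository.
    import torch

    def build_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.Optimizer:
        params = model.parameters()
        if dataset in {"VLCS", "OfficeHome"}:
            # Domain heterogeneity: AdamW with learning rate 0.0001
            return torch.optim.AdamW(params, lr=1e-4)
        if dataset in {"RadImageNet", "MedMNIST"}:
            # Multi-input task heterogeneity: SGD with learning rate 0.05
            return torch.optim.SGD(params, lr=0.05)
        if dataset == "NYUv2":
            # Single-input task heterogeneity: Adam with learning rate 0.0001
            return torch.optim.Adam(params, lr=1e-4)
        raise ValueError(f"Unknown dataset: {dataset}")

    BATCH_SIZE = 128  # reported training batch size for all experiments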