Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
Authors: Yuhang Zhou, Zihua Zhao, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to verify the superiority of MoLA over previous state-of-the-art methods and present in-depth analysis on its working mechanism. |
| Researcher Affiliation | Academia | ¹Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; ²Shanghai Artificial Intelligence Laboratory; ³Fudan University. |
| Pseudocode | No | The paper describes methods and processes in text and equations but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at: https://github.com/MediaBrain-SJTU/MoLA |
| Open Datasets | Yes | For domain heterogeneity, we use VLCS (Torralba & Efros, 2011) and OfficeHome (Venkateswara et al., 2017) datasets; for multi-input task heterogeneity, we use RadImageNet (Mei et al., 2022) and MedMNIST v2 (Yang et al., 2021) datasets; for single-input task heterogeneity, we use NYUv2 (Silberman et al., 2012). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly specify validation splits or their proportions for all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using AdamW, SGD, and Adam optimizers with citations but does not specify software dependencies with version numbers, such as the PyTorch or Python versions used. |
| Experiment Setup | Yes | We apply the AdamW optimizer (...) with learning rate of 0.0001 for experiments on VLCS and OfficeHome datasets, SGD optimizer (...) with learning rate of 0.05 on RadImageNet and MedMNIST datasets and Adam optimizer (...) with learning rate of 0.0001 on NYUv2 dataset. For all of the experiments, the training batch-size is set to 128. |
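The experiment-setup row above can be summarized as a minimal PyTorch sketch of the per-dataset optimizer settings. This is only an illustration of the quoted hyperparameters, not the authors' released code; the `build_optimizer` helper, `model`, and dataset-name strings are hypothetical, and details the paper does not report here (weight decay, momentum, schedulers) are omitted.

```python
# Minimal sketch of the reported optimizer settings, assuming PyTorch.
# `model` and the dataset-name strings are hypothetical placeholders.
import torch

BATCH_SIZE = 128  # reported training batch size for all experiments

def build_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.Optimizer:
    """Return an optimizer matching the per-dataset settings quoted from the paper."""
    params = model.parameters()
    if dataset in {"VLCS", "OfficeHome"}:
        return torch.optim.AdamW(params, lr=1e-4)
    if dataset in {"RadImageNet", "MedMNIST"}:
        return torch.optim.SGD(params, lr=0.05)
    if dataset == "NYUv2":
        return torch.optim.Adam(params, lr=1e-4)
    raise ValueError(f"Unknown dataset: {dataset}")
```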