Batched Low-Rank Adaptation of Foundation Models

Authors: Yeming Wen, Swarat Chaudhuri

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that FLoRA retains the performance merits of LoRA, showcasing competitive results on the MultiPL-E code generation benchmark spanning over 8 languages and a multilingual speech recognition task across 6 languages."
Researcher Affiliation | Academia | Yeming Wen & Swarat Chaudhuri, Department of Computer Science, The University of Texas at Austin (ywen@utexas.edu, swarat@cs.utexas.edu).
Pseudocode | Yes | "We illustrate the implementation of torch.bmm as follows. Upon computing adapters_out, it can be added to the output of the standard LLM layer. This method facilitates the computation of diverse adapter outputs within a batch."

    inputs = torch.randn(batch, seq_length, d_model)
    adapter_b  # shape of (batch, d_model, rank)
    adapter_a  # shape of (batch, rank, d_model)
    hidden = torch.bmm(inputs, adapter_b)
    adapters_out = torch.bmm(hidden, adapter_a)

(A runnable, self-contained expansion of this snippet is given in the torch.bmm sketch after the table.)
Open Source Code | No | The paper references a third-party library, PEFT (https://github.com/huggingface/peft/tree/main), which was used in the experiments, but it does not provide a direct link to, or an explicit statement about, an open-source release of the authors' own FLoRA implementation.
Open Datasets | Yes | "The dataset facilitating this analysis has been sourced from the vLLM throughput benchmark. Noteworthily, this dataset was previously used to fine-tune the English Vicuna model, a state-of-the-art chat LLM (Chiang et al., 2023)."; "We leveraged the same pre-training data that was used for pre-training StarCoder, specifically, the Stack dataset"; "We use the Common Voice benchmark (Ardila et al., 2020) containing a total of 38 languages and 2,500 hours of collected audio for the fine-tuning process, with a particular focus on low-resource languages. For each low-resource language enumerated in 4.2, we fine-tuned on its training split within the Common Voice dataset... Dataset statistics can be found here: https://commonvoice.mozilla.org/en/datasets."
Dataset Splits | No | The paper describes training on datasets and evaluating on test benchmarks (e.g., the "HumanEval split"), but it does not explicitly provide details about a validation set or its split information (percentages, counts, or specific methodology).
Hardware Specification | Yes | "All experiments were conducted on an NVIDIA H100 GPU with float16 precision."
Software Dependencies | Yes | "The vLLM framework (Kwon et al., 2023), with its implementation of continuous batching, presents an ideal setup for this analysis" (a footnote specifies vLLM version 0.1.3); "The batch matrix multiplication (BMM) can be implemented using the torch.bmm operator in deep learning frameworks such as PyTorch (Paszke et al., 2019)." (A minimal usage example is given in the vLLM sketch after the table.)
Experiment Setup | Yes | "For each low-resource language in our experiment, we fine-tuned on its corresponding split from the Stack dataset for a total of 1,500 steps, along with batch size 8"; "sampling temperature set at 0.1"; "For LoRA, a learning rate of 1e-4 sufficed, whereas IA3 required 5e-3, and FLoRA demanded an even higher rate of 8e-3"; "All models were fine-tuned using 8-bit quantization featuring lower training memory cost." (An illustrative configuration is given in the fine-tuning sketch after the table.)
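
Sketch: batched adapter application with torch.bmm. The following is a minimal, self-contained expansion of the snippet quoted in the Pseudocode row; the tensor dimensions, the random adapter weights, and the final residual addition are illustrative assumptions, not the authors' code.

    import torch

    # Illustrative dimensions (assumptions, not settings from the paper).
    batch, seq_length, d_model, rank = 4, 16, 64, 8

    # Each example in the batch carries its own low-rank adapter pair.
    inputs = torch.randn(batch, seq_length, d_model)
    adapter_b = torch.randn(batch, d_model, rank)   # per-example down-projection
    adapter_a = torch.randn(batch, rank, d_model)   # per-example up-projection

    # Batched matrix multiplication applies each example's adapter to its own input:
    # (batch, seq_length, d_model) x (batch, d_model, rank) -> (batch, seq_length, rank)
    hidden = torch.bmm(inputs, adapter_b)
    # (batch, seq_length, rank) x (batch, rank, d_model) -> (batch, seq_length, d_model)
    adapters_out = torch.bmm(hidden, adapter_a)

    # As the quoted prose notes, adapters_out is then added to the output of the
    # standard (shared) LLM layer, e.g. output = base_layer(inputs) + adapters_out.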
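
Sketch: vLLM offline generation. Since the Software Dependencies row names vLLM 0.1.3 as the framework for the throughput analysis, this sketch shows the basic vLLM offline generation API; the model name, prompts, and max_tokens value are illustrative placeholders, not the paper's benchmark configuration (only the 0.1 temperature is taken from the Experiment Setup row).

    from vllm import LLM, SamplingParams

    # Illustrative base model and prompts; not the paper's benchmark workload.
    llm = LLM(model="facebook/opt-125m")
    prompts = ["def fibonacci(n):", "Translate to French: good morning"]
    sampling_params = SamplingParams(temperature=0.1, max_tokens=128)

    # vLLM schedules the requests with continuous batching under the hood.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, output.outputs[0].text)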
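
Sketch: LoRA fine-tuning with 8-bit quantization. The Experiment Setup row reports LoRA fine-tuning at a learning rate of 1e-4 with 8-bit quantization; below is a minimal sketch of such a setup using the Hugging Face transformers and peft libraries (the paper references PEFT) with bitsandbytes-style 8-bit loading. The base model, adapter rank, and target modules are illustrative assumptions, not the authors' configuration.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Load the base model with 8-bit quantization to lower training memory cost.
    model = AutoModelForCausalLM.from_pretrained(
        "bigcode/starcoderbase-1b",                        # illustrative base model
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Attach a LoRA adapter; rank and target modules are illustrative.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Learning rate as reported for LoRA (IA3 and FLoRA used 5e-3 and 8e-3).
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Training loop: 1,500 steps with batch size 8, as reported above (omitted here).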