3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability
Authors: Baohao Liao, Christof Monz
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the efficacy of RoAd, we perform comprehensive evaluations on the GLUE benchmark [56], eight commonsense reasoning tasks and four arithmetic reasoning tasks, utilizing RoBERTa [31] and LLaMA [52, 53] (Section 4.1). The results consistently show that RoAd surpasses other PEFT methods while maintaining a significantly reduced scale of trainable parameters (< 0.1%), as depicted in Figure 1. |
| Researcher Affiliation | Collaboration | Baohao Liao¹,² Christof Monz¹ — ¹Language Technology Lab, University of Amsterdam; ²eBay Inc., Aachen, Germany |
| Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, including an overview diagram (Figure 3), but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code: https://github.com/BaohaoLiao/road |
| Open Datasets | Yes | To assess the efficacy of RoAd, we perform comprehensive evaluations on the GLUE benchmark [56], eight commonsense reasoning tasks and four arithmetic reasoning tasks, utilizing RoBERTa [31] and LLaMA [52, 53] (Section 4.1). |
| Dataset Splits | Yes | Unlike many previous works [14, 22, 23, 31, 65] that employ the GLUE development sets for both validation and testing, here we partition the development set into distinct validation and test subsets to mitigate the risk of overfitting. For comprehensive information regarding the split of the development set, the search space of hyperparameters, the optimal hyperparameter configurations, and other details crucial for reproducibility, please see Section C.1. |
| Hardware Specification | Yes | All of our experiments are conducted on A100 80GB GPU with the frameworks, Transformers [59] and PEFT [34]. |
| Software Dependencies | No | The paper mentions using 'Transformers [59] and PEFT [34]' frameworks, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Hyperparameter tuning. We mainly follow the hyperparameter search space of Liao et al. [22] and list them in Table C.2. Notably, we almost upscale the learning rate by 10 for RoAd, because RoAd prefers a larger learning rate than other PEFT methods, which is also observed from Liu et al. [25] and Wen and Chaudhuri [57] where their adapters also apply multiplication instead of addition. The best hyperparameter settings for each task are listed in Table C.3. |
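
As noted in the Pseudocode row, the paper describes RoAd only through equations and a diagram. For orientation, here is a minimal sketch of a 2D rotary-style *multiplicative* adapter in PyTorch, assuming a block-diagonal form where each adjacent pair of hidden dimensions is transformed by a learned 2×2 rotation-like matrix initialized to the identity. The class name, shapes, and parameterization are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class Rotary2DAdapter(nn.Module):
    """Illustrative 2D rotary-style multiplicative adapter (not the authors' code).

    Each adjacent pair (h[2i], h[2i+1]) of the hidden state is multiplied by a
    learned 2x2 matrix [[a_i, -b_i], [b_i, a_i]]. With a=1 and b=0 at
    initialization, the adapter starts as the identity, so the adapted model
    initially matches the frozen base model.
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        assert hidden_dim % 2 == 0, "hidden_dim must be even to form 2D pairs"
        half = hidden_dim // 2
        self.a = nn.Parameter(torch.ones(half))   # cosine-like component
        self.b = nn.Parameter(torch.zeros(half))  # sine-like component

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (..., hidden_dim); split into interleaved even/odd components
        h1, h2 = h[..., 0::2], h[..., 1::2]
        out1 = self.a * h1 - self.b * h2
        out2 = self.b * h1 + self.a * h2
        # Re-interleave the transformed pairs back to (..., hidden_dim)
        return torch.stack((out1, out2), dim=-1).flatten(-2)


# Usage sketch: shape-preserving, identity at initialization
adapter = Rotary2DAdapter(hidden_dim=4096)
x = torch.randn(2, 16, 4096)   # (batch, sequence, hidden)
y = adapter(x)                 # same shape as x
```

Under these assumptions, each adapted projection adds only `hidden_dim` trainable scalars, in line with the < 0.1% trainable-parameter figure quoted above, and because the transform is element-wise over dimension pairs it is compatible with the multiplicative-adapter behavior (larger preferred learning rates) noted in the Experiment Setup row.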