Composing Parameter-Efficient Modules with Arithmetic Operation
Authors: Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings. |
| Researcher Affiliation | Academia | Jinghan Zhang (The Hong Kong University of Science and Technology), Shiqi Chen (City University of Hong Kong), Junteng Liu (Shanghai Jiao Tong University), Junxian He (The Hong Kong University of Science and Technology) |
| Pseudocode | No | The paper describes its methods with mathematical formulas and textual descriptions; no explicit pseudocode or algorithm blocks are provided (a hedged sketch of the composition arithmetic is given after this table). |
| Open Source Code | Yes | Code is available at https://github.com/hkust-nlp/PEM_composition. |
| Open Datasets | Yes | We work on MNLI (Williams et al., 2018), RTE (Giampiccolo et al., 2007), CoLA (Warstadt et al., 2019), SST2 (Socher et al., 2013), MRPC (Dolan & Brockett, 2005), QNLI (Rajpurkar et al., 2016), QQP (Iyer et al., 2017), and STS-B (Cer et al., 2017) datasets from the GLUE (Wang et al., 2018) task collection (a minimal loading example follows the table). |
| Dataset Splits | Yes | We then assess the individual and combined PEMs on the original validation data, which reflects performance on the union of the subset distributions, to determine whether the merged PEM demonstrates improved generalization. |
| Hardware Specification | Yes | We conducted all the experiments on four 3090 GPUs, except for the negation experiment, which was carried out on four A100 GPUs. |
| Software Dependencies | No | The paper mentions various models and libraries (e.g., HuggingFace transformers library, GPT-2, RoBERTa, T5, Alpaca-LoRA) but does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.x, TensorFlow 2.x, Transformers 4.x). |
| Experiment Setup | Yes | In this section, we provide additional experimental setups to supplement the main experiments. We conducted all experiments on four 3090 GPUs, except for the negation experiment, which was carried out on four A100 GPUs. We tuned hyperparameters for each experiment individually over the values specified in the corresponding row of Table 8. |
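Since the paper provides no pseudocode, here is a minimal sketch of its core idea: composing parameter-efficient modules (PEMs) by linear arithmetic directly in parameter space, i.e. θ_merged = λ·θ₁ + (1−λ)·θ₂, plus a negation operator for "forgetting". The helper names (`merge_pem_state_dicts`, `negate_lora_state_dict`) and the `lora_B` key convention are illustrative assumptions, not the released PEM_composition repository's actual API.

```python
# Hedged sketch: linear arithmetic composition of two parameter-efficient
# modules. Assumes both state dicts come from identically shaped modules;
# the "lora_B" key substring follows the common PEFT naming convention,
# which is an assumption, not the released repo's layout.
import torch

def merge_pem_state_dicts(sd1: dict, sd2: dict, lam: float = 0.5) -> dict:
    """Element-wise linear combination: theta = lam * theta1 + (1 - lam) * theta2."""
    assert sd1.keys() == sd2.keys(), "PEMs must share identical parameter keys"
    return {k: lam * sd1[k] + (1.0 - lam) * sd2[k] for k in sd1}

def negate_lora_state_dict(sd: dict) -> dict:
    """Negation for LoRA: since the update is delta_W = B @ A, flipping the
    sign of the B factors alone flips the sign of delta_W."""
    return {k: -v if "lora_B" in k else v for k, v in sd.items()}

# Illustrative usage (file names hypothetical):
# merged = merge_pem_state_dicts(torch.load("pem_a.bin"),
#                                torch.load("pem_b.bin"), lam=0.5)
```

Note that the combination acts on the low-rank factors themselves rather than on the products B·A, in keeping with the paper's direct parameter-space arithmetic; negating only one low-rank factor is one way to realize the negation operator.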
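And since all evaluation data come from GLUE, the tasks can be fetched with the HuggingFace `datasets` library. A minimal sketch, using the library's standard GLUE config names:

```python
from datasets import load_dataset

# Standard GLUE config names in the HuggingFace datasets library.
GLUE_TASKS = ["mnli", "rte", "cola", "sst2", "mrpc", "qnli", "qqp", "stsb"]

# Each task provides train/validation(/test) splits; the review above notes
# that merged PEMs are evaluated on the original validation splits.
rte = load_dataset("glue", "rte")
print(rte["validation"][0])
```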