Composing Parameter-Efficient Modules with Arithmetic Operation

Authors: Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings."
Researcher Affiliation | Academia | Jinghan Zhang (The Hong Kong University of Science and Technology), Shiqi Chen (City University of Hong Kong), Junteng Liu (Shanghai Jiao Tong University), Junxian He (The Hong Kong University of Science and Technology)
Pseudocode | No | The paper describes its methods with mathematical formulas and textual descriptions; no explicit pseudocode or algorithm blocks are provided. (A hedged sketch of the composition arithmetic follows the table.)
Open Source Code | Yes | Code is available at https://github.com/hkust-nlp/PEM_composition.
Open Datasets | Yes | "We work on MNLI (Williams et al., 2018), RTE (Giampiccolo et al., 2007), CoLA (Warstadt et al., 2019), SST2 (Socher et al., 2013), MRPC (Dolan & Brockett, 2005), QNLI (Rajpurkar et al., 2016), QQP (Iyer et al., 2017), and STS-B (Cer et al., 2017) datasets from the GLUE (Wang et al., 2018) task collections." (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | "We then assess the individual and combined PEMs using the original validation data designed to reflect the performance on the union of the subset distributions in order to determine whether the merged PEM demonstrates improved generalization capabilities."
Hardware Specification | Yes | "We conducted all the experiments on four 3090 GPUs, except for the negation experiment, which was carried out on four A100 GPUs."
Software Dependencies | No | The paper mentions various models and libraries (e.g., the HuggingFace transformers library, GPT-2, RoBERTa, T5, Alpaca-LoRA) but does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.x, TensorFlow 2.x, Transformers 4.x).
Experiment Setup | Yes | "In this section, we provide additional experimental setups to supplement the main experimental section. We conducted all the experiments on four 3090 GPUs, except for the negation experiment, which was carried out on four A100 GPUs. We have optimized our hyperparameters for all the values specified on the corresponding row in Table 8 for each experiment individually."
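
As the Pseudocode row notes, the paper specifies its composition operators only as formulas. As a reading aid, here is a minimal PyTorch sketch of linear arithmetic over LoRA-style module parameters; the function names, the `lora_B` key convention, and the fixed combination weight are illustrative assumptions, not the authors' implementation.

```python
import torch

def merge_pems(theta_1: dict, theta_2: dict, lam: float = 0.5) -> dict:
    """Addition operator sketch: element-wise weighted combination
    theta_merged = lam * theta_1 + (1 - lam) * theta_2.
    Assumes both PEM state dicts share identical keys and shapes."""
    return {k: lam * theta_1[k] + (1.0 - lam) * theta_2[k] for k in theta_1}

def negate_pem(theta: dict) -> dict:
    """Negation operator sketch for LoRA, where delta_W = B @ A.
    Negating only the B factor flips the sign of delta_W; negating
    both A and B would cancel and leave delta_W unchanged."""
    return {k: (-v if "lora_B" in k else v.clone()) for k, v in theta.items()}
```

With lam = 0.5 the merged module is a plain average of the two tuned parameter sets; in practice the combination weight would be selected on validation data, consistent with the per-experiment hyperparameter tuning quoted in the Experiment Setup row.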
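The datasets quoted in the Open Datasets row are all GLUE tasks available on the HuggingFace hub. The paper does not say which loader it uses; the snippet below is a minimal sketch assuming the HuggingFace `datasets` library, and it also shows the original validation split referenced in the Dataset Splits row.

```python
from datasets import load_dataset

# Load one GLUE task (RTE shown); the other tasks follow the same
# pattern with config names "mnli", "cola", "sst2", "mrpc", "qnli",
# "qqp", and "stsb".
rte = load_dataset("glue", "rte")

train = rte["train"]            # data for fine-tuning the individual PEMs
validation = rte["validation"]  # original validation split used for evaluation
print(train[0])  # e.g. {'sentence1': ..., 'sentence2': ..., 'label': ..., 'idx': ...}

# Note: MNLI exposes "validation_matched" and "validation_mismatched"
# rather than a single "validation" split.
```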