Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Authors: Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments |
| Researcher Affiliation | Collaboration | Rabeeh Karimi Mahabadi (EPFL, Idiap Research Institute, rabeeh.karimi@idiap.ch); James Henderson (Idiap Research Institute, james.henderson@idiap.ch); Sebastian Ruder (DeepMind, ruder@google.com) |
| Pseudocode | No | The paper describes the computational process and formulations using mathematical equations and descriptive text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/rabeehk/compacter. |
| Open Datasets | Yes | Following Raffel et al. [3], we evaluate the performance of the methods on the GLUE [18] and SuperGLUE [19] benchmarks. |
| Dataset Splits | Yes | As the original test sets are not publicly available, we follow Zhang et al. [27] and split off 1k samples from the training set that we use for validation, while we use the original validation data as the test set. For datasets with fewer than 10k samples (RTE, MRPC, STS-B, CoLA, COPA, WiC, CB, BoolQ, MultiRC), we divide the original validation set in half, using one half for validation and the other for testing. (A minimal sketch of this split protocol appears after the table.) |
| Hardware Specification | No | For each method, we select the largest batch size that fits a fixed budget of the GPU memory (24 GB). The paper mentions the GPU memory budget but does not specify the GPU model, CPU, or other hardware components used for the experiments. |
| Software Dependencies | No | "We use its Hugging Face PyTorch implementation [28]." and "For the PHM layers, we use the PyTorch implementation of Le et al. [29]." The paper names the software it uses (Hugging Face PyTorch implementation, PyTorch) but does not provide specific version numbers for these dependencies to ensure reproducibility. |
| Experiment Setup | Yes | We fine-tune all methods for 3 epochs on large datasets and 20 epochs for low-resource datasets of GLUE and SuperGLUE (MRPC, CoLA, STS-B, RTE, BoolQ, CB, COPA, WiC) to allow the models to converge [27]. For all adapter-based methods, we experiment with adapters of bottleneck size of {96, 48, 24}. We save a checkpoint every epoch for all models and report the results for the hyper-parameters performing the best on the validation set for each task. For our methods, we experiment with n = {4, 8, 12} and report the model performing the best. (A sketch of this selection sweep appears after the table.) |
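
As a concrete reading of the dataset-split protocol quoted above, the sketch below reconstructs it with the Hugging Face `datasets` library. The 1k held-out validation samples and the 10k threshold come from the paper; the GLUE config names, the random seed, and the use of `train_test_split` are assumptions for illustration, not the authors' released script.

```python
from datasets import load_dataset

def make_splits(task: str, seed: int = 42):
    """Reconstruct the split protocol: if the training set has >= 10k examples,
    hold out 1k of it for validation and reuse the original validation set as
    the test set; otherwise split the original validation set in half."""
    raw = load_dataset("glue", task)  # assumed GLUE config name, e.g. "rte"
    if len(raw["train"]) >= 10_000:
        held_out = raw["train"].train_test_split(test_size=1_000, seed=seed)
        return {
            "train": held_out["train"],
            "validation": held_out["test"],   # 1k samples split off from train
            "test": raw["validation"],        # original validation reused as test
        }
    halves = raw["validation"].train_test_split(test_size=0.5, seed=seed)
    return {
        "train": raw["train"],
        "validation": halves["train"],        # one half for validation
        "test": halves["test"],               # other half for testing
    }

splits = make_splits("rte")
print({name: len(ds) for name, ds in splits.items()})
```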
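
The experiment-setup row describes a small hyper-parameter sweep with per-epoch checkpointing and model selection on the validation set. The following is a minimal sketch of one way to organize that sweep; `train_fn` and `eval_fn` are hypothetical placeholders, and only the grids (bottleneck sizes {96, 48, 24}, n in {4, 8, 12}), the 3/20 epoch budgets, and per-epoch checkpointing are taken from the paper.

```python
import itertools

# Low-resource tasks fine-tuned for 20 epochs in the paper; the rest get 3 epochs.
LOW_RESOURCE_TASKS = {"mrpc", "cola", "stsb", "rte", "boolq", "cb", "copa", "wic"}

def select_best(task, train_fn, eval_fn):
    """Hypothetical selection loop: train_fn is assumed to return one checkpoint
    per epoch for a given configuration; eval_fn scores a checkpoint on the
    validation split."""
    num_epochs = 20 if task in LOW_RESOURCE_TASKS else 3
    best_score, best_cfg = float("-inf"), None
    # Grids reported in the paper: adapter bottleneck sizes and hypercomplex n.
    for bottleneck, n in itertools.product((96, 48, 24), (4, 8, 12)):
        checkpoints = train_fn(task, bottleneck=bottleneck, n=n, num_epochs=num_epochs)
        for epoch, ckpt in enumerate(checkpoints, start=1):
            score = eval_fn(ckpt, split="validation")
            if score > best_score:
                best_score = score
                best_cfg = {"bottleneck": bottleneck, "n": n, "epoch": epoch}
    return best_cfg, best_score
```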