Sparse Structure Search for Delta Tuning

Authors: Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that S3Delta surpasses manual and random structures with fewer trainable parameters.
Researcher Affiliation | Collaboration | 1Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China [...] 3Noah's Ark Lab, Huawei
Pseudocode | Yes | Algorithm 1: Algorithm of S3Delta
Open Source Code | Yes | Our codes are publicly available at https://github.com/thunlp/S3Delta.
Open Datasets | Yes | We apply S3Delta to the multitask benchmarks GLUE [38] and SuperGLUE [37] following previous works. All datasets are downloaded from the Hugging Face Datasets [19].
Dataset Splits | Yes | Since the official test splits of these datasets are held out and invisible to researchers, we make new train, validation, and test splits by randomly splitting either the train set or the validation set, which is critical for fair evaluation according to Chen et al. [4].
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using Hugging Face Datasets but does not provide version numbers for any software dependencies, such as PyTorch or Python.
Experiment Setup | Yes | We fix the random seed to 42 for all experiments unless explicitly specified. We train the model for 200 epochs for structure search and 100 epochs for evaluation. The learning rate is 1e-4 for fine-tuning and 1e-3 for all delta tuning methods. We use AdamW as our optimizer. The batch size is 32.
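The dataset-splits row describes re-splitting the visible data because the official test labels are hidden. A minimal sketch of such a re-splitting step is below; the function name, split proportions, and return layout are assumptions for illustration, not taken from the paper.

```python
import random


def resplit(examples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle the visible examples and cut them into three disjoint splits.

    The fractions and seed are hypothetical defaults; the paper only states
    that new train/validation/test splits are drawn from the visible data.
    """
    rng = random.Random(seed)          # fixed seed for a reproducible split
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return {
        "test": examples[:n_test],
        "validation": examples[n_test:n_test + n_val],
        "train": examples[n_test + n_val:],
    }
```

With a fixed seed, every run reproduces the same partition, which is the property the review highlights as critical for fair evaluation.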
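The experiment-setup row lists concrete hyperparameters; a sketch of how they might be collected and how the seed could be fixed is shown below. The config structure and helper function are illustrative assumptions, not the authors' code, and a real run would also seed NumPy, PyTorch, and CUDA.

```python
import random

# Hyperparameters as reported in the review (values from the paper's setup).
CONFIG = {
    "seed": 42,             # fixed for all experiments unless specified
    "optimizer": "AdamW",
    "batch_size": 32,
    "lr_finetune": 1e-4,    # full fine-tuning learning rate
    "lr_delta": 1e-3,       # learning rate for all delta tuning methods
    "epochs_search": 200,   # structure search phase
    "epochs_eval": 100,     # evaluation phase
}


def set_seed(seed):
    """Fix Python's RNG; real experiments would also seed numpy/torch/cuda."""
    random.seed(seed)
```

Seeding once at startup makes sampling-dependent steps (shuffling, initialization) repeatable across runs with the same config.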