Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language. |
| Researcher Affiliation | Collaboration | 1Max Planck Institute for Intelligent Systems, Tübingen; 2University of Cambridge; 3ETH Zürich; 4University of Tübingen; 5Mila, Université de Montréal; 6The Alan Turing Institute |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | boft.wyliu.com |
| Open Datasets | Yes | To evaluate the performance of BOFT on LLM adaptation, we first finetune a pretrained DeBERTaV3-base model [25] on the GLUE benchmark [87]...We use Alpaca [80] as our finetuning dataset and evaluate both zero-shot and few-shot performance on the MMLU dataset [27]...using two challenging benchmarks: GSM8K [11] and MATH [27]...We evaluate the finetuning performance of BOFT on the VTAB-1K benchmark [94]...on a high-quality segmentation dataset, HQSeg-44K [34]...We finetune the pretrained Stable Diffusion [73] |
| Dataset Splits | Yes | Results are presented in Table 1. # Param in the table denotes the total number of effective trainable parameters for each method. We note that OFT [67] with the block size 16 is BOFT(1,16). |
| Hardware Specification | Yes | All runs can be trained on a single NVIDIA A100-SXM4-80GB GPU. |
| Software Dependencies | No | The paper mentions software like "Hugging Face's Diffusers [85]" and "Parameter-Efficient Fine-Tuning (PEFT) [55]" but does not specify their version numbers. |
| Experiment Setup | Yes | For our experiments on the GLUE benchmark [87], we follow the setting of [97] and only tune the learning rate, the multiplicative dropout rate, and the number of training epochs. ... a total number of 30 training epochs, a fixed training batch size of 64, an AdamW optimizer, and a cosine learning rate scheduler with a warmup ratio of 0.1. |
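The Dataset Splits row above notes that OFT [67] with block size 16 corresponds to BOFT(1,16), i.e., a single block-diagonal orthogonal factor. The sketch below illustrates the general idea behind this notation: an orthogonal rotation built as a product of m sparse factors, each assembled from Cayley-parameterized b×b orthogonal blocks. The stride permutations and the function name `boft_like_rotation` are our own simplified stand-ins, not the paper's exact butterfly pattern.

```python
import torch

def cayley(params: torch.Tensor) -> torch.Tensor:
    """Map unconstrained (..., b, b) parameters to orthogonal blocks via the Cayley transform."""
    skew = params - params.transpose(-1, -2)               # skew-symmetric S
    eye = torch.eye(params.shape[-1]).expand_as(skew)
    return torch.linalg.solve(eye + skew, eye - skew)      # (I + S)^{-1} (I - S) is orthogonal

def boft_like_rotation(params: torch.Tensor, d: int, b: int) -> torch.Tensor:
    """Compose m sparse orthogonal factors into a d x d orthogonal matrix.

    params: (m, d // b, b, b) unconstrained block parameters.
    Each factor is a permuted block-diagonal of Cayley-orthogonal b x b blocks;
    with m = 1 this reduces to the plain block-diagonal structure of OFT.
    The stride permutation below is a simplified stand-in for BOFT's butterfly pattern.
    """
    m = params.shape[0]
    R = torch.eye(d)
    for i in range(m):
        blocks = cayley(params[i])                         # (d // b, b, b) orthogonal blocks
        factor = torch.block_diag(*blocks)                 # sparse block-diagonal factor
        stride = 2 ** i                                    # requires d % stride == 0
        perm = torch.arange(d).reshape(-1, stride).t().reshape(-1)
        P = torch.eye(d)[perm]                             # permutation matrix
        R = (P.t() @ factor @ P) @ R                       # accumulate the orthogonal product
    return R

# Orthogonality check: R @ R.T should be close to the identity.
R = boft_like_rotation(torch.randn(2, 4, 16, 16) * 0.01, d=64, b=16)
assert torch.allclose(R @ R.t(), torch.eye(64), atol=1e-4)
```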
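For the GLUE setup quoted in the Experiment Setup row (30 epochs, batch size 64, AdamW, cosine schedule with warmup ratio 0.1), a minimal configuration sketch is shown below, assuming the Hugging Face `transformers` Trainer API (the paper's toolchain mentions PEFT [55], which commonly pairs with it). The learning rate and output path are placeholders, since the paper tunes the learning rate, dropout, and epochs per task.

```python
from transformers import TrainingArguments

# Hyperparameters quoted from the paper's GLUE setup; the Trainer's default
# optimizer is AdamW, matching the quoted configuration.
training_args = TrainingArguments(
    output_dir="./boft-glue",        # placeholder output path
    num_train_epochs=30,             # "a total number of 30 training epochs"
    per_device_train_batch_size=64,  # "a fixed training batch size of 64"
    learning_rate=5e-4,              # placeholder; tuned per GLUE task in the paper
    lr_scheduler_type="cosine",      # cosine learning-rate scheduler
    warmup_ratio=0.1,                # warmup ratio of 0.1
)
```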