Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M. Our code is released at https://github.com/Chaos96/fourierft. |
| Researcher Affiliation | Collaboration | 1. Hong Kong University of Science and Technology (Guangzhou); 2. Hong Kong University of Science and Technology; 3. Sun Yat-sen University; 4. International Digital Economy Academy; 5. AI Lab, Tencent. |
| Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode for FourierFT (a hedged reimplementation sketch follows this table). |
| Open Source Code | Yes | Our code is released at https://github.com/Chaos96/fourierft. |
| Open Datasets | Yes | GLUE benchmark (General Language Understanding Evaluation; Wang et al., 2018), E2E natural language generation (NLG) task (Novikova et al., 2017), Alpaca dataset (Taori et al., 2023), ImageNet-21K (Ridnik et al., 2021), Oxford Pets (Parkhi et al., 2012), CIFAR10 (Krizhevsky, 2009), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), RESISC45 (Cheng et al., 2017), Stanford Cars (Krause et al., 2013), FGVC (Maji et al., 2013), CIFAR100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | Table 7. Task descriptions and dataset statistics of the GLUE benchmark. ... # Train / # Val / # Test ... |
| Hardware Specification | No | The paper mentions 'training on a single GPU' in Section 4.3 but does not specify any particular GPU model (e.g., NVIDIA A100, RTX 2080 Ti), CPU model, or other specific hardware details used for the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-style pseudocode' in Algorithm 1, but it does not list any specific software dependencies with their version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | Table 9. Hyperparameter setup of FourierFT for the GLUE benchmark: Optimizer (AdamW), LR Schedule (Linear), Learning Rate (FourierFT), Learning Rate (Head), Max Seq. Len, Scaling Value, Batch Size. Table 10. Hyperparameter setup of FourierFT on the E2E benchmark: Optimizer (AdamW), Learning Rate (FourierFT), Learning Rate (Head), Batch Size, Weight Decay, n, Scaling Value α, Epochs, Label Smooth, LR Schedule (Linear). Table 11. Hyperparameter setup for instruction tuning of LoRA and FourierFT: Optimizer (AdamW), Warmup Ratio, Batch Size, Accumulation Steps, Epochs, n, Scaling Value α, LR Schedule (Linear), Learning Rate. Table 12. Hyperparameter setup for image classification of FourierFT: Epochs, Optimizer (AdamW), LR Schedule (Linear), n, α, Learning Rate (FourierFT), Learning Rate (Head), Weight Decay. A hedged optimizer-configuration sketch follows this table. |
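
For context on the Pseudocode row above: the following is a minimal, hedged PyTorch sketch of the FourierFT idea as the paper describes it (train only n spectral coefficients at fixed, randomly chosen frequency locations, recover the dense weight update via a 2D inverse DFT, and scale it by α). The class name, argument names, defaults, and initialization choices are our assumptions, not the authors' code; the authoritative implementation is at https://github.com/Chaos96/fourierft.

```python
import torch
import torch.nn as nn


class FourierFTLinear(nn.Module):
    """Hedged sketch of a FourierFT adapter around a frozen linear layer.

    Assumed details (not taken from the released code): only n spectral
    coefficients are trained; their locations in the (d_out, d_in) spectral
    matrix are fixed at init; the weight update is the real part of a 2D
    inverse DFT, scaled by alpha.
    """

    def __init__(self, base: nn.Linear, n: int = 1000, alpha: float = 300.0, seed: int = 0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight stays frozen
        d_out, d_in = base.weight.shape
        g = torch.Generator().manual_seed(seed)
        idx = torch.randperm(d_out * d_in, generator=g)[:n]  # fixed random spectral entries
        self.register_buffer("rows", idx // d_in)
        self.register_buffer("cols", idx % d_in)
        self.coeffs = nn.Parameter(torch.zeros(n))  # the only trainable parameters
        self.alpha = alpha

    def delta_weight(self) -> torch.Tensor:
        d_out, d_in = self.base.weight.shape
        spectrum = torch.zeros(d_out, d_in, dtype=torch.cfloat, device=self.coeffs.device)
        spectrum[self.rows, self.cols] = self.coeffs.to(torch.cfloat)
        # 2D inverse DFT turns the sparse spectrum into a dense weight update
        return torch.fft.ifft2(spectrum).real * self.alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_weight().T
```

Under these assumptions, the trainable-parameter budget depends on n rather than on the layer dimensions; for instance, 1,000 coefficients on two projection matrices in each of LLaMA2-7B's 32 layers would total 64 × 1,000 = 0.064M parameters, consistent with the count quoted in the Research Type row.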
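
The hyperparameter tables consistently pair AdamW with a linear LR schedule and separate learning rates for the spectral coefficients and the task head. Below is a hedged wiring sketch assuming the hypothetical `FourierFTLinear` class above, a hypothetical `classifier` attribute for the head, and placeholder values throughout; the actual per-task settings are those elided in Tables 9-12.

```python
import torch.nn as nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Toy stand-in model: one adapted layer plus a task head. Both the
# FourierFTLinear class and the "classifier" name are assumptions.
model = nn.ModuleDict({
    "adapted": FourierFTLinear(nn.Linear(768, 768), n=1000, alpha=300.0),
    "classifier": nn.Linear(768, 2),
})

# Two parameter groups, mirroring the tables' separate
# Learning Rate (FourierFT) and Learning Rate (Head) entries.
adapter_params = [p for name, p in model.named_parameters() if name.endswith("coeffs")]
head_params = [p for name, p in model.named_parameters() if name.startswith("classifier")]

optimizer = AdamW(
    [
        {"params": adapter_params, "lr": 5e-2},  # Learning Rate (FourierFT), placeholder
        {"params": head_params, "lr": 5e-3},     # Learning Rate (Head), placeholder
    ],
    weight_decay=0.0,  # Weight Decay, placeholder
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,         # Warmup Ratio appears only in Table 11; placeholder
    num_training_steps=10_000,  # placeholder
)
```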