Parameter-Efficient Fine-Tuning with Controls
Authors: Chi Zhang, Cheng Jingpu, Yanyu Xu, Qianxiao Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical findings substantiate that, without introducing any additional parameters, this approach surpasses the LoRA algorithms across all assessed datasets and rank configurations. (Section 6, Experiment:) In this part, we evaluate the effectiveness of the nonlinear controllers by conducting a series of experiments on vision datasets. Table 1 reports the performance of all algorithms, with the same pre-trained ViT backbone. |
| Researcher Affiliation | Academia | *Equal contribution. 1 Department of Maths, National University of Singapore, Singapore; 2 The Joint SDU-NTU Research Center of Artificial Intelligence, Shandong University, China. Correspondence to: Qianxiao Li <Qianxiao@nus.edu.sg>. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement about making its source code available, nor does it include a link to a code repository. |
| Open Datasets | Yes | We commence with a numerical verification of the condition outlined in Theorem 4.2 through a small-size example. In particular, we consider a scenario wherein the original model is a randomly initialized 10-layer ViT model. We now proceed to validate our approach on various vision benchmarks, including CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and Food-101 (Bossard et al., 2014). |
| Dataset Splits | No | The paper mentions using standard datasets like CIFAR100, SVHN, and Food-101 and mirroring experimental settings from AdaptFormer (Chen et al., 2022). However, it does not explicitly state the train/validation/test dataset splits (e.g., percentages, sample counts, or specific split methodology) within the paper itself that would be needed for reproduction. |
| Hardware Specification | Yes | All experiments are conducted on the Nvidia-3090. |
| Software Dependencies | No | The paper mentions using 'Stochastic Gradient Descent (SGD) algorithm with a momentum of 0.9' but does not specify version numbers for any software components, libraries, or programming languages used (e.g., Python, PyTorch, etc.). |
| Experiment Setup | Yes | The Stochastic Gradient Descent (SGD) algorithm with a momentum of 0.9 is employed for optimizing the controls during the training process. The batch size is set to 128 and the learning rate to 0.05. The down-projection layer weights in the controls are initialized using Kaiming normal initialization (He et al., 2015), while the up-projection layer weights are set to 0. Analogously, all biases in the controls are initialized to 0. |
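The reported setup (SGD with momentum 0.9, learning rate 0.05, Kaiming-normal down-projection, zero-initialized up-projection and biases) can be sketched as follows. This is a minimal numpy illustration, not the authors' code: the function names `init_control` and `sgd_momentum_step`, and the assumption that each control is a rank-r down/up projection pair, are ours; the paper does not specify the module layout beyond the initialization scheme quoted above.

```python
import numpy as np

def init_control(d_model, rank, seed=0):
    """Initialize one low-rank control module per the reported scheme:
    down-projection via Kaiming normal, up-projection and bias at zero."""
    rng = np.random.default_rng(seed)
    # Kaiming normal (He et al., 2015): std = sqrt(2 / fan_in)
    down = rng.normal(0.0, np.sqrt(2.0 / d_model), size=(rank, d_model))
    # Zero up-projection => the control's output is zero at initialization,
    # so training starts exactly from the pre-trained backbone.
    up = np.zeros((d_model, rank))
    bias = np.zeros(d_model)
    return down, up, bias

def sgd_momentum_step(param, grad, velocity, lr=0.05, momentum=0.9):
    """One SGD-with-momentum update using the reported lr and momentum."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity
```

Because the up-projection starts at zero, the fine-tuned model is initially identical to the pre-trained one, a common choice for adapter-style methods that keeps early training stable.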