Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Optimization Inspired Few-Shot Adaptation for Large Language Models
Authors: Boyan Gao, Xin Wang, Yibo Yang, David Clifton
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments In this section, we demonstrate the generalization ability of the calibrated Large Language Models on various settings. We begin by briefing the configuration of the experiments, including the architecture, datasets, and baseline models. We then dive into the efficiency of the contribution of the improvement of each proposed learning objective component. ... Table 1: Comparison between OFA and other baseline algorithms on Llama2-7B and Llama3-8B-Instruct. Mean accuracy and standard deviation across five random seeds are reported. Best results are highlighted in bold. |
| Researcher Affiliation | Academia | Boyan Gao (1, corresponding author), Xin Wang (1), Yibo Yang (1, 2, corresponding author), David A. Clifton (1, 3). Affiliations: 1 University of Oxford; 2 King Abdullah University of Science and Technology; 3 Oxford-Suzhou Institute of Advanced Research. Corresponding authors: EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Sharpness estimation in Optimization Inspired Few-Shot Adaptation. 1: Input: input prompt $Z_0$, learnable preconditioners $\{P_t\}_t$, noise scale $\epsilon$, and Transformer $\{f_t\}_t$ 2: Output: $\{\operatorname{tr}(P_t \nabla^2 L(Z_t) P_t^\top)\}_t$ 3: The first forward pass: set $t = 0$ 4: while $t < T - 1$ do 5: $Z_{t+1} = f_t(Z_t)$ 6: $P_t \nabla L(Z_t) = Z_{t+1} - Z_t$ 7: for $i$ in range($N$) do 8: $\nu_i \sim \mathcal{N}(0, I)$ 9: $\hat{Z}^i_{t+1} = f_t(Z_t + \epsilon P_t \nu_i)$ 10: $P_t \nabla L(Z_t + \epsilon P_t \nu_i) = \hat{Z}^i_{t+1} - (Z_t + \epsilon P_t \nu_i)$ 11: end for 12: $\operatorname{tr}(P_t \nabla^2 L(Z_t) P_t^\top) =$ Eq. 4 13: $t \mathrel{+}= 1$ 14: end while |
| Open Source Code | No | We will release our code for all the implementations in the paper. |
| Open Datasets | Yes | Tasks. We follow the evaluation protocol utilised in [39], and apply the same tasks to evaluate Optimization Inspired Few-Shot Adaptation, which includes sentiment analysis: SST-2 [59], emotion classification: Emoc [12], question classification: TREC [36], topic classification: AGNews [80], encompassing 5-way sentiment analysis: SST-5 [59], movie review classification: MR [50], 14-way topic classification: DBPedia [33], subjectivity status categorization: Subj [49], and the hate speech detection: HateSp18 [16]. All the datasets are downloaded from Hugging Face without further modification. |
| Dataset Splits | Yes | Tasks. We follow the evaluation protocol utilised in [39], and apply the same tasks to evaluate Optimization Inspired Few-Shot Adaptation, which includes sentiment analysis: SST-2 [59], emotion classification: Emoc [12], question classification: TREC [36], topic classification: AGNews [80], encompassing 5-way sentiment analysis: SST-5 [59], movie review classification: MR [50], 14-way topic classification: DBPedia [33], subjectivity status categorization: Subj [49], and the hate speech detection: HateSp18 [16]. All the datasets are downloaded from Hugging Face without further modification. ... Extra Details: We follow the dataset preprocessing protocol from [39] for our experiments setting. |
| Hardware Specification | Yes | In addition, we record the practical training and inference cost of Llama3-8B-Instruct on an NVIDIA RTX A6000 for further illustration. |
| Software Dependencies | No | The paper does not provide specific software versions for its dependencies. It mentions 'Llama2-7B' and 'Llama3-8B-Instruct' as base models, but no version for frameworks like PyTorch or TensorFlow, or Python itself. |
| Experiment Setup | Yes | Appendix F, Hyperparameter Pool: We conduct the grid search for fair comparison over all the models, including all the baseline models and ours. The hyperparameter pool for the model tuning is given in Table 8. Table 8: Hyperparameter Pool for the LoRA model tuning. λ1: 0.1, 0.001, 0.0001, 0.00001, 0.000001; λ2: 0.1, 0.001, 0.0001, 0.00001, 0.000001; Optimizer: AdamW; Learning rate: 0.001, 0.0001, 0.00001; Weight decay: 0.001, 0.0001, 0.00001, 0.000001; Training epoch: 20, 50, 60, 80, 100 |
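
The hyperparameter pool quoted in the Experiment Setup row can be enumerated as a grid. The following is a minimal sketch, assuming the grid search is a plain Cartesian product over the listed values; the excerpt does not say how the grid was traversed or pruned. The names `HYPERPARAMETER_POOL` and `iter_configs` and the dictionary keys are hypothetical and do not come from the paper.

```python
from itertools import product

# Values copied from Table 8 as quoted in the Experiment Setup row.
# Key names are illustrative assumptions, not identifiers from the paper.
HYPERPARAMETER_POOL = {
    "lambda1": [0.1, 0.001, 0.0001, 0.00001, 0.000001],
    "lambda2": [0.1, 0.001, 0.0001, 0.00001, 0.000001],
    "optimizer": ["AdamW"],
    "learning_rate": [0.001, 0.0001, 0.00001],
    "weight_decay": [0.001, 0.0001, 0.00001, 0.000001],
    "epochs": [20, 50, 60, 80, 100],
}


def iter_configs(pool):
    """Yield one dict per point in the Cartesian product of the pool."""
    keys = list(pool)
    for values in product(*(pool[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(HYPERPARAMETER_POOL))
print(len(configs))  # 5 * 5 * 1 * 3 * 4 * 5 = 1500
```

Under this assumption the pool yields 1500 candidate configurations per model, which gives a rough sense of the tuning budget implied by an exhaustive search.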