Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Optimization Inspired Few-Shot Adaptation for Large Language Models

Authors: Boyan Gao, Xin Wang, Yibo Yang, David Clifton

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments In this section, we demonstrate the generalization ability of the calibrated Large Language Models on various settings. We begin by briefing the configuration of the experiments, including the architecture, datasets, and baseline models. We then dive into the efficiency of the contribution of the improvement of each proposed learning objective component. ... Table 1: Comparison between OFA and other baseline algorithms on Llama2-7B and Llama3-8B-Instruct. Mean accuracy and standard deviation across five random seeds are reported. Best results are highlighted in bold.
Researcher Affiliation	Academia	Boyan Gao (1, corresponding author), Xin Wang (1), Yibo Yang (1, 2, corresponding author), David A. Clifton (1, 3). 1: University of Oxford; 2: King Abdullah University of Science and Technology; 3: Oxford-Suzhou Institute of Advanced Research. Corresponding authors: EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Sharpness estimation in Optimization Inspired Few-Shot Adaptation. 1: Input: input prompt Z_0, learnable preconditioners {P_t}_t, noise scale ϵ, and Transformer {f_t}_t. 2: Output: {tr(P_t ∇²L(Z_t) P_t^T)}_t. 3: The first forward pass: set t = 0. 4: while t < T − 1 do 5: Z_{t+1} = f_t(Z_t) 6: P_t ∇L(Z_t) = Z_{t+1} − Z_t 7: for i in range(N) do 8: ν_i ∼ N(0, I) 9: Ẑ^i_{t+1} = f_t(Z_t + ϵ P_t ν_i) 10: P_t ∇L(Z_t + ϵ P_t ν_i) = Ẑ^i_{t+1} − (Z_t + ϵ P_t ν_i) 11: end for 12: tr(P_t ∇²L(Z_t) P_t^T) = Eq. 4 13: t += 1 14: end while
Open Source Code	No	We will release our code for all the implementations in the paper.
Open Datasets	Yes	Tasks. We follow the evaluation protocol utilised in [39], and apply the same tasks to evaluate Optimization Inspired Few-Shot Adaptation, which includes sentiment analysis: SST-2 [59], emotion classification: Emoc [12], question classification: TREC [36], topic classification: AGNews [80], encompassing 5-way sentiment analysis: SST-5 [59], movie review classification: MR [50], 14-way topic classification: DBPedia [33], subjectivity status categorization: Subj [49], and the hate speech detection: HateSp18 [16]. All the datasets are downloaded from Hugging Face without further modification.
Dataset Splits	Yes	Tasks. We follow the evaluation protocol utilised in [39], and apply the same tasks to evaluate Optimization Inspired Few-Shot Adaptation, which includes sentiment analysis: SST-2 [59], emotion classification: Emoc [12], question classification: TREC [36], topic classification: AGNews [80], encompassing 5-way sentiment analysis: SST-5 [59], movie review classification: MR [50], 14-way topic classification: DBPedia [33], subjectivity status categorization: Subj [49], and the hate speech detection: HateSp18 [16]. All the datasets are downloaded from Hugging Face without further modification. ... Extra Details We follow the dataset preprocessing protocol from [39] for our experiments setting.
Hardware Specification	Yes	In addition, we record the practical training and inference cost of Llama3-8B-Instruct on an NVIDIA RTX A6000 for further illustration.
Software Dependencies	No	The paper does not provide specific software versions for its dependencies. It mentions 'Llama2-7B' and 'Llama3-8B-Instruct' as base models, but no version for frameworks like PyTorch or TensorFlow, or Python itself.
Experiment Setup	Yes	F Hyperparameter Pool. We conduct the grid search for fair comparison over all the models, including all the baseline models and ours. The hyperparameter pool for the model tuning is given in Table 8. Table 8: Hyperparameter pool for the LoRA model tuning. λ1: 0.1, 0.001, 0.0001, 0.00001, 0.000001; λ2: 0.1, 0.001, 0.0001, 0.00001, 0.000001; Optimizer: AdamW; Learning rate: 0.001, 0.0001, 0.00001; Weight decay: 0.001, 0.0001, 0.00001, 0.000001; Training epochs: 20, 50, 60, 80, 100.
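
The Pseudocode row quotes Algorithm 1, which estimates a per-layer trace from differences between perturbed and unperturbed forward passes. The sketch below illustrates that loop for a single layer in plain Python. It is not the authors' implementation: Eq. 4 is not reproduced on this page, so the averaging step uses a standard Hutchinson-style finite-difference estimator as a stand-in, and every name (`estimate_sharpness`, `toy_layer`, `P_TOY`, `H_DIAG`) is hypothetical. The toy layer is a linear map chosen so the exact answer is known; for a symmetric preconditioner P the quantity estimated equals tr(P ∇²L P^T).

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def estimate_sharpness(layer, P, z, eps=1e-3, n_samples=1000, seed=0):
    """Hutchinson-style trace estimate from layer residuals (assumed form).

    The residual layer(z) - z plays the role of P * grad L(z), as in
    step 6 of the quoted algorithm. The final averaging is an assumption,
    since the paper's Eq. 4 is not shown in the source.
    """
    rng = random.Random(seed)
    base = [out - inp for out, inp in zip(layer(z), z)]  # step 6
    total = 0.0
    for _ in range(n_samples):
        nu = [rng.gauss(0.0, 1.0) for _ in z]  # step 8: nu ~ N(0, I)
        step = matvec(P, nu)
        z_pert = [zi + eps * si for zi, si in zip(z, step)]
        # steps 9-10: residual at the perturbed input
        pert = [out - inp for out, inp in zip(layer(z_pert), z_pert)]
        # finite-difference directional derivative, projected onto nu
        total += sum(n * (p - b) / eps for n, p, b in zip(nu, pert, base))
    return total / n_samples


# Toy setup (hypothetical): quadratic loss with Hessian diag(H_DIAG)
# and preconditioner P = 0.5 * I, so tr(P H P^T) = 0.25 * 10 = 2.5.
H_DIAG = [1.0, 2.0, 3.0, 4.0]
P_TOY = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]


def toy_layer(z):
    """Linear layer whose residual equals P_TOY @ diag(H_DIAG) @ z."""
    return [zi + 0.5 * h * zi for zi, h in zip(z, H_DIAG)]


if __name__ == "__main__":
    estimate = estimate_sharpness(toy_layer, P_TOY, [1.0, -2.0, 0.5, 3.0],
                                  n_samples=2000)
    print(f"estimated trace: {estimate:.3f} (exact value is 2.5)")
```

Because the estimate is a Monte Carlo average, it fluctuates around the exact value; increasing `n_samples` narrows the spread at the cost of more forward passes.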
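
The Experiment Setup row describes a grid search over the Table 8 hyperparameter pool. As a rough illustration of the size of that search space, the sketch below enumerates every combination of the quoted values. The dictionary keys and the `iter_grid` helper are hypothetical names, and the page does not say how each configuration was trained or scored, so no evaluation step is shown. The optimizer is fixed to AdamW in the source and is therefore left out of the grid.

```python
from itertools import product

# Values copied from the quoted Table 8; key names are illustrative only.
HYPERPARAMETER_POOL = {
    "lambda1": [0.1, 0.001, 0.0001, 0.00001, 0.000001],
    "lambda2": [0.1, 0.001, 0.0001, 0.00001, 0.000001],
    "learning_rate": [0.001, 0.0001, 0.00001],
    "weight_decay": [0.001, 0.0001, 0.00001, 0.000001],
    "epochs": [20, 50, 60, 80, 100],
}


def iter_grid(pool):
    """Yield one dict per combination of the hyperparameter values."""
    names = list(pool)
    for values in product(*(pool[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_grid(HYPERPARAMETER_POOL))

if __name__ == "__main__":
    print(f"{len(configs)} configurations in the full grid")
    print("first configuration:", configs[0])
```

A full grid over these values contains 5 × 5 × 3 × 4 × 5 = 1,500 configurations per model and task.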