Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Authors: Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we empirically validate the effectiveness of FIT across a range of popular LLMs of varying sizes and on different NLP datasets, including classification and multi-choice question-answering (MCQA) tasks. |
| Researcher Affiliation | Academia | 1University of Oxford, Oxford, UK 2University of Illinois Urbana-Champaign, Urbana, IL, USA 3University of Chicago, Chicago, IL, USA. Correspondence to: Tom A. Lamb <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Algorithm for Focus Instruction Tuning (FIT) Training Procedure to Optimise Equation (3). |
| Open Source Code | Yes | 1Our project page, including links to codebase and datasets, is available at: https://tomalamb.github.io/focus-instruction-tuning/. |
| Open Datasets | Yes | SMNLI dataset, a sub-sampled version of the MNLI dataset (Williams et al., 2018) ... SS dataset, a synthetic sentiment analysis dataset derived from SST-5 (Socher et al., 2013b) ... BBQ dataset (Parrish et al., 2022) |
| Dataset Splits | Yes | We obtain this set by splitting our training sets in a 90/10% ratio for training and validation splits respectively. ... Table 3. Dataset Sizes. Dataset split sizes (SS / SMNLI / BBQ): Training 5296 / 7200 / 16700; Validation 1324 / 1800 / 1590; Test 1818 / 900 / 2352 |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU models, or cloud instance types) are provided in the paper for running experiments. |
| Software Dependencies | No | The models are fine-tuned using parameter-efficient SFT with LoRA (Hu et al., 2021), leveraging Hugging Face's SFTTrainer (Wolf et al., 2020). The objective in Equation (3) can be optimised through sampling using stochastic gradient descent (SGD) with popular optimisers such as AdamW (Loshchilov & Hutter, 2019). |
| Experiment Setup | Yes | We use LoRA (Hu et al., 2021) for parameter-efficient fine-tuning. We target the query and value projection matrices within each LLM and use LoRA r = 16 and α = 32 across models. We implement early stopping on a held-out validation set based on the cross-entropy loss over focus labels yfocus corresponding to randomly sampled focus instructions... We use a patience of 4 validation evaluation steps, which occur after a fixed number of steps. During training, we define p(Ifocus) by placing a small probability (in our experiments, 0.05) on the empty focus instruction. We then uniformly distribute the remaining probability mass over the non-empty focus instructions. We generate responses from models using constrained beam-decoding (Anderson et al., 2017) with 4 beams. We limit the maximum number of newly generated tokens to be 5... |
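The excerpted setup defines p(Ifocus) by placing a small probability (0.05) on the empty focus instruction and distributing the remaining mass uniformly over the non-empty instructions. A minimal sketch of that sampling distribution, assuming this reading of the excerpt (function names and the example instructions are illustrative, not from the paper):

```python
import random


def make_focus_distribution(focus_instructions, p_empty=0.05):
    """Build p(I_focus): p_empty on the empty instruction ("") and the
    remaining mass spread uniformly over the non-empty instructions.
    (0.05 is the value reported in the paper's setup.)"""
    p_rest = (1.0 - p_empty) / len(focus_instructions)
    dist = {"": p_empty}
    dist.update({instr: p_rest for instr in focus_instructions})
    return dist


def sample_focus_instruction(dist, rng=random):
    """Draw one focus instruction according to the distribution."""
    instructions, weights = zip(*dist.items())
    return rng.choices(instructions, weights=weights, k=1)[0]


# Illustrative instructions only; the paper's actual focus instructions differ.
focus = ["Focus on the premise.", "Ignore the hypothesis length."]
dist = make_focus_distribution(focus)
instruction = sample_focus_instruction(dist)
```

During training, each example would be paired with an instruction drawn this way, so the model occasionally sees no focus instruction at all while most updates use a non-empty one.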