Controlled Text Generation via Language Model Arithmetic

Authors: Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation demonstrates that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction.
Researcher Affiliation | Academia | Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev, Department of Computer Science, ETH Zurich, Switzerland; {jasper.dekoninck,marc.fischer,luca.beurer-kellner,martin.vechev}@inf.ethz.ch
Pseudocode | Yes | For reference, we include the full speculative sampling procedure in Algorithm 1 of App. E.1.
Open Source Code | Yes | We release an open source easy-to-use implementation of our framework at https://github.com/eth-sri/language-model-arithmetic. (A conceptual sketch of the model arithmetic idea follows the table.)
Open Datasets | Yes | We use a subset of the /pol/ dataset (Papasavva et al., 2020)... We evaluate model arithmetic on the task of sentiment control and closely follow the setup described in Pei et al. (2023). For this purpose, we select 1000 positive and 1000 negative reviews from the IMDB movie review dataset (Maas et al., 2011).
Dataset Splits | No | The paper mentions using subsets of datasets and selecting a fixed number of messages/reviews, but does not specify explicit train/validation/test splits (e.g., percentages or counts) for its main experiments.
Hardware Specification | Yes | All our experiments were run on a single H100 Nvidia GPU with 80GB of VRAM.
Software Dependencies | No | The paper names specific models (Llama-2-13b, Pythia-12b, MPT-7b), a RoBERTa-based classifier, and the Hugging Face library, but does not provide version numbers for any software libraries or dependencies used in the implementation.
Experiment Setup | Yes | We finetune a classifier for FUDGE by starting from a RoBERTa-based (Liu et al., 2019) toxicity classifier and finetuning it for 5 epochs with a learning rate of 1e-5... completions are stopped when they reach more than 32 tokens, contain the newline token, the end of sequence token, or the sequence Person 1:. (Sketches of this finetuning configuration and of the stopping rule follow the table.)
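The Open Source Code row points to the authors' released library. The snippet below is only a minimal conceptual sketch of the underlying idea, combining the next-token log-probabilities of two prompted versions of one model with a linear formula, and is not the API of the eth-sri/language-model-arithmetic repository; the model name, prompts, and weight are illustrative assumptions.

```python
# Conceptual sketch of model arithmetic: a weighted linear combination of
# next-token log-probabilities from two prompted personas of the same model.
# NOT the released library's API; model, prompts, and weight are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

neutral_prompt = "The following is a conversation.\n"        # hypothetical prompt
toxic_prompt = "The following is a toxic conversation.\n"    # hypothetical prompt
context = "Person 1: hello\nPerson 2:"
lam = 0.5  # strength of the anti-toxic term; illustrative value

def next_token_logprobs(prefix: str) -> torch.Tensor:
    """Next-token log-probabilities of the model given prefix + context."""
    ids = tokenizer(prefix + context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)

# Keep the neutral persona, subtract a weighted toxic persona,
# renormalize, and sample the next token.
combined = (1 + lam) * next_token_logprobs(neutral_prompt) \
           - lam * next_token_logprobs(toxic_prompt)
probs = torch.softmax(combined, dim=-1)
next_id = torch.multinomial(probs, num_samples=1)
print(tokenizer.decode(next_id))
```

The paper's formulas operate on such linear combinations of (prompted) model distributions; this sketch shows only a single sampling step rather than the full generation loop or the speculative sampling procedure of Algorithm 1.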
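For the Experiment Setup row, the sketch below reproduces the reported FUDGE classifier finetuning configuration (RoBERTa-based starting point, 5 epochs, learning rate 1e-5) with the Hugging Face Trainer. The starting checkpoint, dataset file, column names, and batch size are not given in the quoted passage and are assumptions here.

```python
# Sketch of the FUDGE toxicity-classifier finetuning described in the paper:
# 5 epochs at learning rate 1e-5, starting from a RoBERTa-based classifier.
# Checkpoint, data file, column names, and batch size are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "roberta-base"  # assumption: the paper starts from an existing toxicity classifier
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Hypothetical file with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "toxicity_train.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fudge-toxicity-classifier",
    num_train_epochs=5,              # as reported in the paper
    learning_rate=1e-5,              # as reported in the paper
    per_device_train_batch_size=16,  # assumption: batch size is not reported
)

Trainer(model=model, args=args, train_dataset=tokenized["train"]).train()
```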
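The stopping rule quoted in the same row is mechanical enough to transcribe directly; the helper name and the way the end-of-sequence token is checked below are hypothetical.

```python
def should_stop(generated_ids, generated_text, eos_token_id, max_new_tokens=32):
    """Stop once the completion exceeds 32 tokens, or contains a newline,
    the end-of-sequence token, or the sequence 'Person 1:'."""
    return (len(generated_ids) > max_new_tokens
            or "\n" in generated_text
            or eos_token_id in generated_ids
            or "Person 1:" in generated_text)
```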