Controlled Text Generation via Language Model Arithmetic
Authors: Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation demonstrates that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction. |
| Researcher Affiliation | Academia | Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev Department of Computer Science ETH Zurich, Switzerland {jasper.dekoninck,marc.fischer,luca.beurer-kellner,martin.vechev}@inf.ethz.ch |
| Pseudocode | Yes | For reference, we include the full speculative sampling procedure in Algorithm 1 of App. E.1. |
| Open Source Code | Yes | We release an open source easy-to-use implementation of our framework at https://github.com/eth-sri/language-model-arithmetic. |
| Open Datasets | Yes | We use a subset of the /pol/ dataset (Papasavva et al., 2020)... We evaluate model arithmetic on the task of sentiment control and closely follow the setup described in Pei et al. (2023). For this purpose, we select 1000 positive and 1000 negative reviews from the IMDB movie review dataset (Maas et al., 2011). |
| Dataset Splits | No | The paper mentions using subsets of datasets and selecting a number of messages/reviews, but does not specify explicit train/validation/test splits (e.g., percentages or counts) for their main experiments. |
| Hardware Specification | Yes | All our experiments were run on a single H100 Nvidia GPU with 80GB of VRAM. |
| Software Dependencies | No | The paper mentions using specific models (Llama-2-13b, Pythia-12b, MPT-7b) and classifiers (RoBERTa-based, Hugging Face library) but does not provide specific version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | We finetune a classifier for FUDGE by starting from a RoBERTa-based (Liu et al., 2019) toxicity classifier and finetuning it for 5 epochs with a learning rate of 1e-5... completions are stopped when they reach more than 32 tokens, contain the newline token, the end of sequence token, or the sequence "Person 1:". |
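
The paper's central technique, model arithmetic, combines the next-token distributions of several (differently prompted) language models before sampling. The snippet below is a minimal sketch of that idea using Hugging Face Transformers; it is not the API of the released eth-sri/language-model-arithmetic package, and the model name, prompts, and weights are illustrative placeholders rather than values from the paper.

```python
# Minimal sketch: combine next-token log-probabilities of two prompted
# variants of the same model linearly, then pick the highest-scoring token.
# Model name, prompts, and weights are illustrative, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

neutral_prompt = "The following conversation is polite and friendly.\n"
toxic_prompt = "The following conversation is rude and toxic.\n"
user_input = "Person 1: Nice job on the report.\nPerson 2:"


def next_token_logprobs(prompt: str) -> torch.Tensor:
    """Return log-probabilities over the vocabulary for the next token."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)


# Linear combination in log-space: up-weight the neutrally conditioned
# distribution and subtract the toxically conditioned one. The weights
# 1.5 and -0.5 are illustrative values only.
combined = (
    1.5 * next_token_logprobs(neutral_prompt + user_input)
    - 0.5 * next_token_logprobs(toxic_prompt + user_input)
)
next_id = torch.argmax(combined).item()
print(tokenizer.decode([next_id]))
```

For actual generation, the combined scores would be renormalized and sampled token by token; the released library additionally implements speculative sampling to reduce the cost of evaluating multiple models.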
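The "Experiment Setup" row reports finetuning a RoBERTa-based toxicity classifier for the FUDGE baseline for 5 epochs with a learning rate of 1e-5. The sketch below shows what such a run could look like with the Hugging Face Trainer; the checkpoint name, toy dataset, and batch size are assumptions for illustration, not the exact configuration from the paper.

```python
# Hedged sketch of the FUDGE-baseline classifier finetuning (5 epochs,
# lr 1e-5, as quoted above). Checkpoint name, data, and batch size are
# illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "roberta-base"  # placeholder for the RoBERTa toxicity classifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny illustrative dataset; the paper uses a subset of the /pol/ dataset.
raw = Dataset.from_dict({
    "text": ["You are wonderful.", "You are an idiot."],
    "label": [0, 1],
})
tokenized = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=64)
)

args = TrainingArguments(
    output_dir="fudge-toxicity-classifier",
    num_train_epochs=5,      # 5 epochs, as reported
    learning_rate=1e-5,      # learning rate 1e-5, as reported
    per_device_train_batch_size=2,  # assumed; not stated in the quote
)
Trainer(model=model, args=args, train_dataset=tokenized).train()
```

The quoted stopping criteria for generated completions (more than 32 tokens, a newline token, the end-of-sequence token, or the sequence "Person 1:") would be applied on top of such a setup during decoding.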