Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

Authors: Luca Beurer-Kellner, Marc Fischer, Martin Vechev

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4. Experimental Evaluation: We evaluate DOMINO in terms of downstream task accuracy, compare its performance to multiple baselines, and ablate key parameters such as k." |
| Researcher Affiliation | Academia | "Luca Beurer-Kellner, Marc Fischer, Martin Vechev; Department of Computer Science, ETH Zürich, Switzerland." |
| Pseudocode | Yes | "Algorithm 1 Constrained Decoding. Input: Checker C, LLM f, Tokenized Prompt x. Output: Completion o adhering to C." (A sketch of this loop appears below the table.) |
| Open Source Code | Yes | "We release DOMINO as open source on GitHub." |
| Open Datasets | Yes | "Datasets: We assess downstream accuracy of different constraining methods with the GSM8K (Cobbe et al., 2021) benchmark for math reasoning and CoNLL-2003 (Sang & De Meulder, 2003) for named-entity recognition (subset of 400 test samples)." |
| Dataset Splits | Yes | "For GSM8K and CoNLL-2003, we prompt and constrain the models to generate a response in a given JSON format... Our prompts consist of 5 few-shot demonstrations from the training split, for which we manually construct the corresponding JSON response. ... CoNLL-2003 for named-entity recognition (subset of 400 test samples)." (A hypothetical example of such a JSON response appears below the table.) |
| Hardware Specification | Yes | "As inference backends, we rely on both transformers (Wolf et al., 2019) and llama.cpp (Gerganov et al.) on NVIDIA A100 40GB or H100 80GB GPUs." |
| Software Dependencies | No | The paper mentions software such as transformers and llama.cpp but does not specify version numbers. |
| Experiment Setup | Yes | "Setup: We run 100 repetitions per configuration. In each, we sample one of 5 different prompts per workload, and sample output of up to 128 tokens from the model, using a temperature value of 1.0." (See the sampling sketch below the table.) |
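The Pseudocode row quotes only the signature of Algorithm 1 (Checker C, LLM f, tokenized prompt x, completion o adhering to C). As a minimal sketch of what such a constrained-decoding loop looks like, under assumptions not confirmed by the quotes above: the checker is modeled as a boolean predicate over the generated text, decoding is greedy, and the string vocabulary is a toy stand-in for a real tokenizer. This naive version re-checks every candidate token at every step; making that masking fast and non-invasive is what DOMINO is about, but its actual mechanism is not reproduced here.

```python
from typing import Callable, List, Sequence

def constrained_decode(
    f: Callable[[Sequence[str]], Sequence[float]],  # LLM f: tokens -> next-token scores
    is_valid: Callable[[str], bool],                # Checker C as a predicate over text (assumption)
    vocab: List[str],
    prompt: List[str],
    max_tokens: int = 128,
) -> List[str]:
    """Greedy decoding where, at each step, tokens whose continuation
    the checker rejects are masked out before taking the argmax."""
    out: List[str] = []
    for _ in range(max_tokens):
        scores = f(list(prompt) + out)
        # Keep only tokens the checker would still accept.
        allowed = [
            (s, t)
            for s, t in zip(scores, vocab)
            if is_valid("".join(out) + t)
        ]
        if not allowed:
            break  # no valid continuation remains
        _, best = max(allowed)
        out.append(best)
    return out

# Toy usage: constrain the output to digits only.
digits_only = str.isdigit
vocab = ["1", "7", "x"]
scores_fn = lambda toks: [0.1, 0.3, 0.9]  # toy "LLM" that always prefers "x"
print(constrained_decode(scores_fn, digits_only, vocab, prompt=["n", "="], max_tokens=4))
# ['7', '7', '7', '7'] -- "x" is masked at every step despite its higher score
```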
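The Dataset Splits row mentions a task-specific JSON response format but does not reproduce it. Purely as a hypothetical illustration for the CoNLL-2003 NER workload, the field names below are invented, not the paper's schema; the point is only that the constraint guarantees the completion parses into a fixed shape:

```python
import json

# Hypothetical shape of a constrained NER response; entity strings are
# sample CoNLL-2003 entities, the field names are invented for illustration.
response_text = """
{
  "persons": ["Peter Blackburn"],
  "organizations": ["European Commission"],
  "locations": ["Germany"],
  "misc": []
}
"""
response = json.loads(response_text)  # the constraint guarantees this parses
assert set(response) == {"persons", "organizations", "locations", "misc"}
```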
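The Experiment Setup row fully specifies the sampling parameters. A hedged sketch of that harness with the transformers backend named in the Hardware row follows; the checkpoint, the prompt pool contents, and the scoring step are placeholders, and the DOMINO constraining hook itself is not shown.

```python
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint, not from the paper
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt_pool = [f"<few-shot prompt variant {i}>" for i in range(5)]  # placeholders

for rep in range(100):                    # 100 repetitions per configuration
    prompt = random.choice(prompt_pool)   # sample one of 5 prompts per workload
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        do_sample=True,                   # sampling, not greedy decoding
        temperature=1.0,                  # as stated in the setup
        max_new_tokens=128,               # up to 128 tokens per completion
    )
    completion = tok.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
    # ... score `completion` against the benchmark reference here ...
```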