Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Authors: Luca Beurer-Kellner, Marc Fischer, Martin Vechev
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Evaluation We evaluate DOMINO in terms of downstream task accuracy, compare its performance to multiple baselines and ablate key parameters such as k. |
| Researcher Affiliation | Academia | Luca Beurer-Kellner 1 Marc Fischer 1 Martin Vechev 1 1Department of Computer Science, ETH Zürich, Switzerland. |
| Pseudocode | Yes | Algorithm 1 Constrained Decoding Input: Checker C, LLM f, Tokenized Prompt x Output: Completion o adhering to C |
| Open Source Code | Yes | We release DOMINO as open source on GitHub. |
| Open Datasets | Yes | Datasets We assess downstream accuracy of different constraining methods with the GSM8K (Cobbe et al., 2021) benchmark for math reasoning and CoNLL-2003 (Sang & De Meulder, 2003) for named-entity recognition (subset of 400 test samples). |
| Dataset Splits | Yes | For GSM8K and CoNLL-2003, we prompt and constrain the models to generate a response in a given JSON format... Our prompts consist of 5 few-shot demonstrations from the training split, for which we manually construct the corresponding JSON response. ... CoNLL-2003 for named-entity recognition (subset of 400 test samples). |
| Hardware Specification | Yes | As inference backends, we rely on both transformers (Wolf et al., 2019) and llama.cpp (Gerganov et al.) on NVIDIA A100 40GB or H100 80GB GPUs. |
| Software Dependencies | No | The paper mentions software like 'transformers' and 'llama.cpp' but does not specify their version numbers. |
| Experiment Setup | Yes | Setup We run 100 repetitions per configuration. In each, we sample one of 5 different prompts per workload, and sample output of up to 128 tokens from the model, using a temperature value of 1.0. |