
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

Authors: Luca Beurer-Kellner, Marc Fischer, Martin Vechev

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4. Experimental Evaluation: We evaluate DOMINO in terms of downstream task accuracy, compare its performance to multiple baselines, and ablate key parameters such as k.
Researcher Affiliation	Academia	Luca Beurer-Kellner¹, Marc Fischer¹, Martin Vechev¹. ¹Department of Computer Science, ETH Zürich, Switzerland.
Pseudocode	Yes	Algorithm 1: Constrained Decoding. Input: Checker C, LLM f, Tokenized Prompt x. Output: Completion o adhering to C.
Open Source Code	Yes	We release DOMINO as open source on GitHub.
Open Datasets	Yes	Datasets: We assess downstream accuracy of different constraining methods with the GSM8K (Cobbe et al., 2021) benchmark for math reasoning and CoNLL-2003 (Sang & De Meulder, 2003) for named-entity recognition (subset of 400 test samples).
Dataset Splits	Yes	For GSM8K and CoNLL-2003, we prompt and constrain the models to generate a response in a given JSON format... Our prompts consist of 5 few-shot demonstrations from the training split, for which we manually construct the corresponding JSON response. ... CoNLL-2003 for named-entity recognition (subset of 400 test samples).
Hardware Specification	Yes	As inference backends, we rely on both transformers (Wolf et al., 2019) and llama.cpp (Gerganov et al.) on NVIDIA A100 40GB or H100 80GB GPUs.
Software Dependencies	No	The paper mentions software like 'transformers' and 'llama.cpp' but does not specify their version numbers.
Experiment Setup	Yes	Setup: We run 100 repetitions per configuration. In each, we sample one of 5 different prompts per workload, and sample output of up to 128 tokens from the model, using a temperature value of 1.0.
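
The Pseudocode and Experiment Setup rows describe a constrained decoding loop (Checker C, LLM f, tokenized prompt x) sampled at temperature 1.0 for up to 128 tokens. The following is a minimal sketch of a generic masked-sampling loop of that shape, not the paper's Algorithm 1 and not DOMINO's optimized method. The vocabulary, `toy_logits`, `toy_checker`, and `constrained_decode` are hypothetical names introduced only for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical toy vocabulary; a real tokenizer has many thousands of entries.
VOCAB = ["{", "}", '"a"', ":", "1", "<eos>"]
EOS = VOCAB.index("<eos>")


def toy_logits(tokens: Sequence[int]) -> List[float]:
    """Stand-in for the LLM f (assumption): uniform logits over VOCAB."""
    return [0.0] * len(VOCAB)


def toy_checker(output: Sequence[int]) -> List[bool]:
    """Stand-in for the checker C (assumption): only the next token of
    the fixed string {"a":1} followed by <eos> is allowed."""
    target = [0, 2, 3, 4, 1, EOS]
    mask = [False] * len(VOCAB)
    if len(output) < len(target):
        mask[target[len(output)]] = True
    return mask


def constrained_decode(
    checker: Callable[[Sequence[int]], List[bool]],
    logits_fn: Callable[[Sequence[int]], List[float]],
    prompt: Sequence[int],
    max_tokens: int = 128,      # token budget from the Experiment Setup row
    temperature: float = 1.0,   # temperature from the Experiment Setup row
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample a completion, masking tokens the checker rejects at each step."""
    rng = rng or random.Random(0)
    output: List[int] = []
    for _ in range(max_tokens):
        mask = checker(output)
        allowed = [i for i, ok in enumerate(mask) if ok]
        if not allowed:
            break  # the checker admits no continuation
        logits = logits_fn(list(prompt) + output)
        # Softmax restricted to allowed tokens, shifted by the max for stability.
        top = max(logits[i] for i in allowed)
        weights = [math.exp((logits[i] - top) / temperature) for i in allowed]
        token = rng.choices(allowed, weights=weights, k=1)[0]
        if token == EOS:
            break
        output.append(token)
    return output


decoded = constrained_decode(toy_checker, toy_logits, prompt=[])
decoded_text = "".join(VOCAB[t] for t in decoded)
print(decoded_text)
```

Because the toy checker permits exactly one token per step, the sampled output is fully determined by the constraint. With a looser checker, the model's logits and the temperature would decide among the allowed tokens.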