Prompt Sketching for Large Language Models

Authors: Luca Beurer-Kellner, Mark Niklas Mueller, Marc Fischer, Martin Vechev

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that in a zero-shot setting, prompt sketching outperforms existing, sequential prompting schemes such as direct asking or chain-of-thought on 7 out of 8 LLM benchmarking tasks, including state tracking, arithmetic reasoning, and general question answering.
Researcher Affiliation | Academia | Luca Beurer-Kellner, Mark Niklas Mueller, Marc Fischer, Martin Vechev; Department of Computer Science, ETH Zürich, Switzerland. Correspondence to: Luca Beurer-Kellner <luca.beurerkellner@inf.ethz.ch>.
Pseudocode | Yes | See App. A for a pseudo-code implementation of VAR. A pseudo-code implementation of BEAMVAR can also be found in App. A. (Illustrative sketches of both decoders follow below the table.)
Open Source Code | Yes | To facilitate future use, we release a number of generic, yet effective sketches applicable to many tasks, and an open source library called dclib, powering our sketch-aware decoders as part of https://github.com/eth-sri/lmql.
Open Datasets | Yes | AQuA (Ling et al., 2017), StrategyQA (Geva et al., 2021), and GSM8K (Cobbe et al., 2021); all are standard datasets.
Dataset Splits | No | The paper evaluates on samples from existing benchmarks but does not specify training/validation/test splits or describe how the benchmark datasets are partitioned for model development.
Hardware Specification | Yes | For Llama-2, on the other hand, we run all of our experiments on 1000 samples per task (or the full datasets), using a single NVIDIA H100 GPU with 80GB memory.
Software Dependencies | No | The paper mentions the LMQL and dclib libraries but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | For BEAM, VAR, and BEAMVAR we use a beam width of n = 2 and rely on length normalized scoring in line with previous work (Wu et al., 2016), using β = 0 and α = 0.7. (The normalized score is spelled out below the table.)
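
To make the decoder rows above concrete, the following minimal Python sketch illustrates the idea behind prompt sketching and the VAR decoder. It is an illustration under simplifying assumptions, not the paper's implementation: complete is a hypothetical stand-in for a greedy LLM completion call, and the [VARIABLE] placeholder syntax is a simplification of LMQL's templates. The key point is that a single template with several placeholders is decoded left to right, each variable conditioned on everything decoded so far, instead of issuing one free-form completion as in direct asking.

    import re

    def complete(prompt: str, stop: str | None) -> str:
        """Hypothetical stand-in for a greedy LLM completion call that
        decodes until `stop` (or end of sequence) is generated."""
        raise NotImplementedError("plug in a real model API here")

    def var_decode(template: str, **values) -> str:
        """VAR-style sketch decoding: fill each [VARIABLE] placeholder
        left to right, conditioning on everything decoded so far."""
        chunks = re.split(r"\[[A-Z_]+\]", template.format(**values))
        out = chunks[0]
        for nxt in chunks[1:]:
            # the next fixed chunk doubles as the stop sequence, keeping
            # the model's output inside the sketch structure
            out += complete(out, stop=nxt or None) + nxt
        return out

    # A sketch interleaves deterministic text and model-filled variables:
    sketch = ("Q: {question}\n"
              "A: Let's think step by step.\n[REASONING]\n"
              "Therefore, the answer is [ANSWER]")
    # result = var_decode(sketch, question="...")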
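
The length-normalized scoring in the experiment setup row follows Wu et al. (2016): with the coverage penalty weight β = 0, a hypothesis Y with total log-probability log P(Y) is ranked by s(Y) = log P(Y) / lp(Y), where lp(Y) = (5 + |Y|)^α / (5 + 1)^α and the paper sets α = 0.7. A direct transcription:

    def length_penalty(length: int, alpha: float = 0.7) -> float:
        """Length penalty lp(Y) of Wu et al. (2016); alpha = 0.7 matches
        the paper's setup, and beta (coverage) is 0, so it is omitted."""
        return ((5 + length) ** alpha) / ((5 + 1) ** alpha)

    def normalized_score(logprob: float, length: int, alpha: float = 0.7) -> float:
        """Length-normalized hypothesis score s(Y) = log P(Y) / lp(Y)."""
        return logprob / length_penalty(length, alpha)

Without this normalization, beam search systematically favors shorter hypotheses, since every added token makes the total log-probability more negative.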
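
BEAMVAR can then be pictured as beam search whose hypotheses are expanded and pruned at variable boundaries. The loose sketch below reuses re and normalized_score from the snippets above; score_and_complete is again a hypothetical model call, and the paper's actual algorithm (App. A) handles details this illustration omits, such as token-level decoding within a variable.

    def score_and_complete(prompt: str, stop: str | None, n: int):
        """Hypothetical model call returning the n most likely completions
        for the next variable as (logprob, token_count, text) triples."""
        raise NotImplementedError("plug in a real model API here")

    def beamvar_decode(template: str, n: int = 2, alpha: float = 0.7, **values) -> str:
        """Loose sketch of BEAMVAR: per-variable beam search that keeps
        the n best length-normalized hypotheses at every boundary."""
        chunks = re.split(r"\[[A-Z_]+\]", template.format(**values))
        beams = [(0.0, 0, chunks[0])]  # (total logprob, length, text)
        for nxt in chunks[1:]:
            candidates = [
                (lp + c_lp, ln + c_ln, text + c_text + nxt)
                for lp, ln, text in beams
                for c_lp, c_ln, c_text in score_and_complete(text, nxt or None, n)
            ]
            # variable-aligned pruning, as opposed to token-level beam search
            candidates.sort(key=lambda b: normalized_score(b[0], b[1], alpha), reverse=True)
            beams = candidates[:n]
        return beams[0][2]

With n = 2 and α = 0.7, this corresponds to the configuration quoted in the experiment setup row.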