Grammar Prompting for Domain-Specific Language Generation with Large Language Models
Authors: Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that grammar prompting can enable LLMs to perform competitively on a diverse set of DSL generation tasks, including semantic parsing (SMCalFlow, Overnight, GeoQuery), PDDL planning, and SMILES-based molecule generation. (A minimal prompt-construction sketch follows the table.) |
| Researcher Affiliation | Collaboration | Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim; Massachusetts Institute of Technology; Google DeepMind; Google Research. {bailinw, yoonkim}@mit.edu, {wangzi, xuezhiw, yuancao, rif}@google.com |
| Pseudocode | Yes | Algorithm 1: Earley-based Constrained Generation. (A simplified decoding sketch follows the table.) |
| Open Source Code | Yes | Code and data available at: https://github.com/berlino/grammar-prompting. |
| Open Datasets | Yes | We test our approach on standard semantic parsing benchmarks involving complex DSLs: SMCalFlow [6], which features human-generated utterances about calendar management (see Figure 2); GeoQuery [99], which features queries against a US geography database; and Overnight-Blocks [81]... The data contains 32 Acrylates, 11 Chain Extenders, and 11 Isocyanates (see appendix G of Guo et al. [29]). |
| Dataset Splits | No | The paper specifies training and test examples (e.g., "GeoQuery and Overnight-Blk use 32 in-context examples, and SMCalFlow uses 16 examples." and Table 6 with "Train" and "Test" counts), but it does not explicitly describe validation sets or how the splits were constructed, which limits reproducibility. |
| Hardware Specification | No | The paper mentions using specific LLM APIs such as "Codex-davinci-002 [13]", "GPT-3.5", "GPT-4", and "PaLM 2-L [7]". While these models are hosted on specific hardware, the paper does not provide details about the local hardware (e.g., GPU or CPU models, memory) used by the authors to interact with these APIs or run any local computations. |
| Software Dependencies | No | The paper mentions several software tools and models, such as "Earley parser [18]", "Sentence-BERT [59]", "Retro* model [12]", and "Pyperplan [5]", but it does not specify version numbers for any of these components. It also mentions "Python" in a figure, but without a version number. |
| Experiment Setup | Yes | Table 8: Hyperparameters for sampling specialized grammars Ĝ (top) and molecules ŷ in grammar prompting for molecule generation; standard prompting uses the same hyperparameters for y. The table specifies Temperature, Presence Penalty, and Frequency Penalty. (An illustrative API call using these parameters follows the table.) |
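
For concreteness, the sketch below shows how a grammar-prompting few-shot prompt might be assembled: each exemplar pairs an utterance with a specialized BNF grammar and the program derived under it, and at test time the LLM first predicts a grammar, then a program. The prompt format, the helper names (`EXEMPLAR_TEMPLATE`, `build_prompt`), and the GeoQuery-style exemplar are illustrative assumptions, not the authors' exact templates (those are in the released code).

```python
# Illustrative grammar-prompting prompt construction (assumed format).

EXEMPLAR_TEMPLATE = """\
utterance: {utterance}
grammar:
{grammar}
program: {program}
"""

def build_prompt(exemplars, test_utterance):
    """Assemble a grammar-prompting prompt from few-shot exemplars."""
    parts = [EXEMPLAR_TEMPLATE.format(**ex) for ex in exemplars]
    # The test utterance is left open so the LLM first predicts a grammar.
    parts.append(f"utterance: {test_utterance}\ngrammar:\n")
    return "\n".join(parts)

exemplars = [{
    "utterance": "what states border texas ?",
    "grammar": ('query ::= answer ( state )\n'
                'state ::= next_to_1 ( state ) | stateid ( "texas" )'),
    "program": 'answer(state(next_to_1(stateid("texas"))))',
}]
print(build_prompt(exemplars, "what rivers run through colorado ?"))
```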
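The pseudocode row refers to the paper's Algorithm 1 (Earley-based constrained generation). Below is a heavily simplified, token-level sketch of the idea: at each step an Earley parser supplies the terminals that keep the output derivable under the predicted grammar, and the model chooses among them. `allowed_terminals` and `next_token_logprobs` are hypothetical callables standing in for the parser and the LLM; the paper's actual algorithm operates over API-level LLM calls rather than per-token logits.

```python
def constrained_generate(next_token_logprobs, allowed_terminals,
                         prompt, max_steps=128):
    """Greedy grammar-constrained decoding (simplified sketch).

    allowed_terminals(output): terminals an Earley parser permits next.
    next_token_logprobs(prompt, output): dict mapping terminals to scores.
    """
    output = []
    for _ in range(max_steps):
        allowed = allowed_terminals(output)
        if not allowed:  # no valid continuation: the parse is complete
            break
        scores = next_token_logprobs(prompt, output)
        # Keep only continuations the grammar licenses, then pick the best.
        output.append(max(allowed, key=lambda t: scores.get(t, float("-inf"))))
    return "".join(output)
```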
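Finally, the decoding hyperparameters named in Table 8 map directly onto standard completion-API arguments. The sketch below shows that correspondence only; the model name, prompt, and all numeric values are placeholders, not the paper's reported settings.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # placeholder; the paper used Codex/GPT-3.5/GPT-4/PaLM 2-L
    prompt="utterance: what states border texas ?\ngrammar:\n",  # placeholder prompt
    temperature=0.8,        # Table 8: Temperature (value here is a placeholder)
    presence_penalty=0.0,   # Table 8: Presence Penalty (placeholder)
    frequency_penalty=0.0,  # Table 8: Frequency Penalty (placeholder)
    max_tokens=256,
)
print(response.choices[0].text)
```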