Binding Language Models in Symbolic Languages
Authors: Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | BINDER achieves state-of-the-art results on WIKITABLEQUESTIONS and TABFACT datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while BINDER only uses dozens of annotations as in-context exemplars without any training. |
| Researcher Affiliation | Collaboration | The University of Hong Kong, Shanghai Jiao Tong University, University of Washington, Allen Institute for AI, University of Waterloo, Salesforce Research, Yale University, Meta AI |
| Pseudocode | No | The paper describes the BINDER pipeline and process in textual form and diagrams (e.g., Figure 1), but does not provide structured pseudocode or algorithm blocks (a hypothetical sketch of the pipeline appears after this table). |
| Open Source Code | Yes | Our code is available at https://github.com/hkunlp/binder. |
| Open Datasets | Yes | We demonstrate the effectiveness of the BINDER framework on WIKITABLEQUESTIONS (WIKITQ; Pasupat & Liang, 2015) and TABFACT (Chen et al., 2019a), two structured knowledge grounding datasets that require complex reasoning on the tables. |
| Dataset Splits | No | The paper evaluates on WIKITQ and TABFACT, referring to a 'development set' and 'test set' (e.g., 'WIKITQ execution accuracy on development and test sets'). These imply the standard splits of the established benchmarks, but explicit percentages or sample counts for training, validation, and testing are not given in the paper. It instead describes using '14 in-context exemplars' drawn from 'a pool of 200 examples from the training set' for few-shot learning, which is not the same as defining full dataset splits (see the loading-and-sampling sketch after this table). |
| Hardware Specification | No | The paper states 'We use the OpenAI Codex (code-davinci-002) API model in our experiments' and mentions 'randomness inherent to GPU computations', but no specific hardware components such as GPU models (e.g., NVIDIA A100), CPU models, or memory specifications are detailed. |
| Software Dependencies | No | The paper mentions 'OpenAI Codex (code-davinci-002)', 'Python (with the Pandas package)', 'Sentence-BERT', and 'OFA', but does not pin version numbers for these software dependencies (e.g., a specific Pandas or Sentence-BERT release). |
| Experiment Setup | Yes | We set the Codex in-context learning hyper-parameters as shown in Table 8. (Table 8 gives temperature, top_p, max_output_tokens, sampling_n, stop_tokens, and num_shots for both the parsing and execution phases across datasets; a hedged API-call sketch follows this table.) |
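Since the paper describes its pipeline only in prose and diagrams, here is a minimal, hypothetical Python sketch of the parse-then-execute loop of the kind it describes. The function `call_llm`, the prompt text, and the table name `t` are placeholders of ours, not the authors' code; full BINDER additionally resolves LLM API calls embedded inside the generated program before execution.

```python
import sqlite3
import pandas as pd

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the Codex (code-davinci-002) completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def binder_answer(question: str, table: pd.DataFrame) -> list:
    # Phase 1 (parsing): few-shot prompt the LLM to emit a Binder program,
    # i.e., SQL optionally extended with LLM API calls over table columns.
    program = call_llm(f"-- Answer by writing SQL over table t.\n-- Q: {question}\nSQL:")

    # Phase 2 (execution): in full BINDER, the embedded LLM calls are answered
    # first and spliced back in; this sketch simply runs the program as SQL.
    conn = sqlite3.connect(":memory:")
    table.to_sql("t", conn, index=False)
    return conn.execute(program).fetchall()
```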
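Both benchmarks are public. One way to fetch them (an assumption on our part; the paper does not name a loader) is via the HuggingFace Hub IDs `wikitablequestions` and `tab_fact`. The sketch below also mirrors the quoted few-shot setup: a 200-example pool drawn from the training split, from which 14 in-context exemplars are sampled.

```python
import random
from datasets import load_dataset

# Hub IDs are assumptions; both datasets ship standard train/validation/test splits.
wikitq = load_dataset("wikitablequestions")
tabfact = load_dataset("tab_fact", "tab_fact")  # config name "tab_fact" is an assumption

# Few-shot setup quoted in the review: a pool of 200 training examples,
# from which 14 in-context exemplars are drawn per test question.
rng = random.Random(42)  # seed chosen only for reproducibility of this sketch
pool = rng.sample(range(len(wikitq["train"])), k=200)
exemplars = [wikitq["train"][i] for i in rng.sample(pool, k=14)]
```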
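Finally, a hedged sketch of how Table 8's in-context learning hyper-parameters map onto a legacy OpenAI completion call. The numeric values and `few_shot_prompt` below are illustrative placeholders rather than the paper's actual Table 8 entries, and `code-davinci-002` has since been retired by OpenAI.

```python
import openai  # legacy (pre-1.0) client, matching the Codex-era API

few_shot_prompt = "..."  # num_shots exemplars followed by the test question

response = openai.Completion.create(
    engine="code-davinci-002",  # the model named in the paper
    prompt=few_shot_prompt,
    temperature=0.4,   # Table 8: temperature (placeholder value)
    top_p=1.0,         # Table 8: top_p (placeholder value)
    max_tokens=256,    # Table 8: max_output_tokens (placeholder value)
    n=20,              # Table 8: sampling_n; candidate programs are voted over
    stop=["\n\n"],     # Table 8: stop_tokens (placeholder value)
)
candidate_programs = [choice["text"] for choice in response["choices"]]
```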