Binding Language Models in Symbolic Languages
Authors: Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | BINDER achieves state-of-the-art results on WIKITABLEQUESTIONS and TABFACT datasets, with explicit output programs that benefit human debugging. Note that previous best systems are all finetuned on tens of thousands of task-specific samples, while BINDER only uses dozens of annotations as in-context exemplars without any training. |
| Researcher Affiliation | Collaboration | The University of Hong Kong, Shanghai Jiao Tong University, University of Washington, Allen Institute for AI, University of Waterloo, Salesforce Research, Yale University, Meta AI |
| Pseudocode | No | The paper describes the BINDER pipeline and process in textual form and diagrams (e.g., Figure 1), but does not provide structured pseudocode or algorithm blocks (a hypothetical sketch of the pipeline appears after this table). |
| Open Source Code | Yes | Our code is available at https://github.com/hkunlp/binder. |
| Open Datasets | Yes | We demonstrate the effectiveness of the BINDER framework on WIKITABLEQUESTIONS (WIKITQ; Pasupat & Liang, 2015) and TABFACT (Chen et al., 2019a), two structured knowledge grounding datasets that require complex reasoning on the tables. |
| Dataset Splits | No | The paper evaluates on WIKITQ and TABFACT, referring to a 'development set' and 'test set' (e.g., 'WIKITQ execution accuracy on development and test sets'). These imply the standard splits of the established benchmarks, but explicit percentages or sample counts for training, validation, and testing are not given in the paper. It instead describes using '14 in-context exemplars' drawn from 'a pool of 200 examples from the training set' for few-shot learning, which is not the same as defining full dataset splits (see the loading-and-sampling sketch after this table). |
| Hardware Specification | No | The paper states 'We use the OpenAI Codex (code-davinci-002) API model in our experiments' and mentions 'randomness inherent to GPU computations', but no specific hardware components such as GPU models (e.g., NVIDIA A100), CPU models, or memory specifications are detailed. |
| Software Dependencies | No | The paper mentions 'OpenAI Codex (code-davinci-002)', 'Python (with the Pandas package)', 'Sentence-BERT', and 'OFA', but does not pin version numbers for these software dependencies (e.g., a specific Pandas or Sentence-BERT release). |
| Experiment Setup | Yes | We set the Codex in-context learning hyper-parameters as shown in Table 8. (Table 8 gives temperature, top_p, max_output_tokens, sampling_n, stop_tokens, and num_shots for both the parsing and execution phases across datasets; a hedged API-call sketch follows this table.) |
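Since the paper describes its pipeline only in prose and diagrams, here is a minimal, hypothetical Python sketch of the parse-then-execute loop of the kind it describes. The function `call_llm`, the prompt text, and the table name `t` are placeholders of ours, not the authors' code; full BINDER additionally resolves LLM API calls embedded inside the generated program before execution.

```python
import sqlite3
import pandas as pd

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the Codex (code-davinci-002) completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def binder_answer(question: str, table: pd.DataFrame) -> list:
    # Phase 1 (parsing): few-shot prompt the LLM to emit a Binder program,
    # i.e., SQL optionally extended with LLM API calls over table columns.
    program = call_llm(f"-- Answer by writing SQL over table t.\n-- Q: {question}\nSQL:")

    # Phase 2 (execution): in full BINDER, the embedded LLM calls are answered
    # first and spliced back in; this sketch simply runs the program as SQL.
    conn = sqlite3.connect(":memory:")
    table.to_sql("t", conn, index=False)
    return conn.execute(program).fetchall()
```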
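Both benchmarks are public. One way to fetch them (an assumption on our part; the paper does not name a loader) is via the HuggingFace Hub IDs `wikitablequestions` and `tab_fact`. The sketch below also mirrors the quoted few-shot setup: a 200-example pool drawn from the training split, from which 14 in-context exemplars are sampled.

```python
import random
from datasets import load_dataset

# Hub IDs are assumptions; both datasets ship standard train/validation/test splits.
wikitq = load_dataset("wikitablequestions")
tabfact = load_dataset("tab_fact", "tab_fact")  # config name "tab_fact" is an assumption

# Few-shot setup quoted in the review: a pool of 200 training examples,
# from which 14 in-context exemplars are drawn per test question.
rng = random.Random(42)  # seed chosen only for reproducibility of this sketch
pool = rng.sample(range(len(wikitq["train"])), k=200)
exemplars = [wikitq["train"][i] for i in rng.sample(pool, k=14)]
```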
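Finally, a hedged sketch of how Table 8's in-context learning hyper-parameters map onto a legacy OpenAI completion call. The numeric values and `few_shot_prompt` below are illustrative placeholders rather than the paper's actual Table 8 entries, and `code-davinci-002` has since been retired by OpenAI.

```python
import openai  # legacy (pre-1.0) client, matching the Codex-era API

few_shot_prompt = "..."  # num_shots exemplars followed by the test question

response = openai.Completion.create(
    engine="code-davinci-002",  # the model named in the paper
    prompt=few_shot_prompt,
    temperature=0.4,   # Table 8: temperature (placeholder value)
    top_p=1.0,         # Table 8: top_p (placeholder value)
    max_tokens=256,    # Table 8: max_output_tokens (placeholder value)
    n=20,              # Table 8: sampling_n; candidate programs are voted over
    stop=["\n\n"],     # Table 8: stop_tokens (placeholder value)
)
candidate_programs = [choice["text"] for choice in response["choices"]]
```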