BatchPrompt: Accomplish more with less

Authors: Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experimental evaluation demonstrates that BPE + SEAS can boost the performance of BatchPrompt by a striking margin on a range of popular NLP tasks, including question answering (BoolQ), textual entailment (RTE), and duplicate question identification (QQP).
Researcher Affiliation | Industry | Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham (Microsoft)
Pseudocode | Yes | The method is described using the pseudo-code in Alg. 1.
Open Source Code | Yes | Code: github.com/microsoft/BatchPrompt
Open Datasets | Yes | Boolean Questions (BoolQ) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing).
Dataset Splits | Yes | Boolean Questions (BoolQ) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing).
Hardware Specification | No | The paper mentions using "gpt-3.5-turbo and GPT-4" for evaluation but does not provide specific hardware details such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using "gpt-3.5-turbo and GPT-4" but does not specify model version numbers or any other software dependencies (e.g., Python, libraries) with version numbers.
Experiment Setup | Yes | We use 2, 4, and 4 few-shot examples for RTE, QQP, and BoolQ respectively... Temperature is always set to 0 for consistent results... The batch sizes we use for RTE, QQP, BoolQ are 16/32/64/160... for GPT-4... and 16/32 for gpt-3.5-turbo... The number of voting rounds we choose is 1, 3, 5, 7, and 9. (See the sketches after this table.)
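
To make the setup above concrete, here is a minimal sketch of the Batch Permutation and Ensembling (BPE) loop the paper describes: several questions are packed into a single prompt, the batch order is re-permuted on each voting round, and the per-question answers are combined by majority vote. The `call_llm` callable and the `<index>: <answer>` response format are our assumptions, standing in for a gpt-3.5-turbo/GPT-4 call with temperature 0; the paper's exact prompt templates may differ.

```python
import random
from collections import Counter
from typing import Callable, List

def build_batch_prompt(questions: List[str]) -> str:
    """Pack several questions into one prompt, asking for indexed answers."""
    lines = ["Answer each question with 'yes' or 'no'.",
             "Reply with one line per question, formatted as '<index>: <answer>'."]
    for i, q in enumerate(questions, start=1):
        lines.append(f"{i}: {q}")
    return "\n".join(lines)

def parse_batch_response(response: str, batch_size: int) -> List[str]:
    """Parse '<index>: <answer>' lines back into a position-indexed list."""
    answers = [""] * batch_size
    for line in response.splitlines():
        idx, sep, ans = line.partition(":")
        if sep and idx.strip().isdigit():
            pos = int(idx.strip()) - 1
            if 0 <= pos < batch_size:
                answers[pos] = ans.strip().lower()
    return answers

def batch_prompt_bpe(questions: List[str],
                     call_llm: Callable[[str], str],
                     voting_rounds: int = 5,
                     seed: int = 0) -> List[str]:
    """Batch Permutation and Ensembling: shuffle question order each round,
    query the LLM once per round, and majority-vote per-question answers."""
    rng = random.Random(seed)
    votes = [Counter() for _ in questions]
    order = list(range(len(questions)))
    for _ in range(voting_rounds):
        rng.shuffle(order)                      # new permutation each round
        prompt = build_batch_prompt([questions[i] for i in order])
        answers = parse_batch_response(call_llm(prompt), len(order))
        for slot, ans in zip(order, answers):   # map back to original order
            if ans:
                votes[slot][ans] += 1
    return [v.most_common(1)[0][0] if v else "" for v in votes]
```

In the paper's configuration, `voting_rounds` would be drawn from {1, 3, 5, 7, 9} and the batch size from 16 up to 160, depending on the task and model.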
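
The paper pairs BPE with Self-reflection-guided EArly Stopping (SEAS). As a loose, hypothetical illustration only: after each voting round the LLM is asked to verify its own answers, and questions it confirms confidently are dropped from later rounds, cutting token cost. The `answer_round` and `self_reflect` callables below are invented placeholders, not the paper's API; Alg. 1 in the paper is the authoritative procedure.

```python
from typing import Callable, List

def batch_prompt_seas(questions: List[str],
                      answer_round: Callable[[List[str]], List[str]],
                      self_reflect: Callable[[str, str], bool],
                      max_rounds: int = 9) -> List[str]:
    """Loose sketch of early stopping: stop re-asking a question once the
    model confirms its current answer confidently (assumed mechanism)."""
    last = [""] * len(questions)            # most recent answer per question
    active = list(range(len(questions)))    # questions still being voted on
    for _ in range(max_rounds):
        if not active:                      # every question stopped early
            break
        answers = answer_round([questions[i] for i in active])
        remaining = []
        for idx, ans in zip(active, answers):
            last[idx] = ans
            # Hypothetical self-reflection call: "are you confident that
            # <ans> answers <question>?" True freezes the answer.
            if not self_reflect(questions[idx], ans):
                remaining.append(idx)       # not confident: keep voting
        active = remaining
    return last
```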