reproducibilityindex.ai

BatchPrompt: Accomplish more with less

Authors: Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our comprehensive experimental evaluation demonstrates that BPE + SEAS can boost the performance of BatchPrompt by a striking margin on a range of popular NLP tasks, including question answering (Boolq), textual entailment (RTE), and duplicate questions identification (QQP).
Researcher Affiliation	Industry	Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham (Microsoft)
Pseudocode	Yes	The method is described using the pseudo-code in Alg. 1.
Open Source Code	Yes	Code: github.com/microsoft/BatchPrompt
Open Datasets	Yes	Boolq: Boolean Questions (Boolq) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing).
Dataset Splits	Yes	Boolq: Boolean Questions (Boolq) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing).
Hardware Specification	No	The paper mentions using "gpt-3.5-turbo and GPT-4" for evaluation but does not provide specific hardware details such as GPU models, CPU specifications, or memory.
Software Dependencies	No	The paper mentions using "gpt-3.5-turbo and GPT-4" but does not specify model snapshot versions or any other software dependencies (e.g., Python or library versions).
Experiment Setup	Yes	We use 2, 4, and 4 few-shot examples for RTE, QQP, BoolQ respectively... Temperature is always set to 0 for consistent results... The batch sizes we use for RTE, QQP, BoolQ are 16/32/64/160... for GPT-4... and 16/32 for gpt-3.5-turbo... The number of voting rounds we choose is 1, 3, 5, 7, and 9.
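The quoted evidence names BPE + SEAS and lists voting rounds of 1, 3, 5, 7, and 9, but the table does not show how the voting works. Below is a minimal Python sketch of the general idea, under stated assumptions. Each round reorders the items in a batch, queries the model once (temperature 0 in the reported setup), and majority-votes each item's answers across rounds. Everything named here is hypothetical and is not taken from the authors' code in the BatchPrompt repository: the `answer_batch` wrapper, its `(label, is_confident)` return shape, and the "same confident label in two consecutive rounds" stopping rule used as a stand-in for SEAS.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model wrapper: receives one batch of inputs in prompt order and
# returns one (label, is_confident) pair per input, in the same order. In a real
# run this would be a single LLM call (temperature 0) with the batch in one prompt.
BatchAnswerFn = Callable[[Sequence[str]], List[Tuple[str, bool]]]


def batch_permutation_ensemble(
    items: Sequence[str],
    answer_batch: BatchAnswerFn,
    voting_rounds: int = 5,
    early_stop: bool = True,
    seed: int = 0,
) -> Dict[int, str]:
    """Vote each item's label over several randomly reordered batches.

    Returns a dict mapping each item's original index to its final label.
    """
    rng = random.Random(seed)
    votes: Dict[int, Counter] = {i: Counter() for i in range(len(items))}
    last_confident: Dict[int, str] = {}  # confident label from the previous round
    stopped: Dict[int, str] = {}         # items removed early, with their label
    active = list(range(len(items)))     # items still included in each batch

    for _ in range(voting_rounds):
        if not active:
            break
        order = active[:]
        rng.shuffle(order)  # each round places the items at new positions
        answers = answer_batch([items[i] for i in order])
        still_active = []
        for idx, (label, confident) in zip(order, answers):
            votes[idx][label] += 1
            # Assumed SEAS-like rule (hypothetical): stop once the same label
            # is reported as confident in two consecutive rounds.
            if early_stop and confident and last_confident.get(idx) == label:
                stopped[idx] = label
                continue
            if confident:
                last_confident[idx] = label
            else:
                last_confident.pop(idx, None)
            still_active.append(idx)
        active = still_active

    result: Dict[int, str] = {}
    for idx, counter in votes.items():
        if idx in stopped:
            result[idx] = stopped[idx]
        elif counter:
            result[idx] = counter.most_common(1)[0][0]  # majority vote
    return result


if __name__ == "__main__":
    # Toy stand-in for the model: answers "yes" iff the input mentions "yes".
    def toy_model(batch: Sequence[str]) -> List[Tuple[str, bool]]:
        return [("yes" if "yes" in x else "no", True) for x in batch]

    print(batch_permutation_ensemble(["q1 yes", "q2 no", "q3 yes"], toy_model))
    # prints {0: 'yes', 1: 'no', 2: 'yes'}
```

With binary labels such as yes/no and an odd number of rounds, every item that stays active until the end collects an odd number of votes, so its majority cannot be tied. Items stopped early keep their repeated confident answer and skip the remaining calls.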