BatchPrompt: Accomplish more with less
Authors: Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experimental evaluation demonstrates that BPE + SEAS can boost the performance of Batch Prompt by a striking margin on a range of popular NLP tasks, including question answering (Boolq), textual entailment (RTE), and duplicate questions identification (QQP). |
| Researcher Affiliation | Industry | Jianzhe Lin, Maurice Diesendruck, Liang Du, Robin Abraham Microsoft |
| Pseudocode | Yes | The method is described using the pseudo-code in Alg. 1. |
| Open Source Code | Yes | Code: github.com/microsoft/BatchPrompt |
| Open Datasets | Yes | Boolq: Boolean Questions (Boolq) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing). |
| Dataset Splits | Yes | Boolq: Boolean Questions (Boolq) is a question-answering dataset for yes/no questions containing 15942 examples (9427 for training, 3270 for validation, 3245 for testing). |
| Hardware Specification | No | The paper mentions using "gpt-3.5-turbo and GPT-4" for evaluation but does not provide specific hardware details such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using "gpt-3.5-turbo and GPT-4" but does not specify their version numbers or any other software dependencies (e.g., Python, libraries) with version numbers. |
| Experiment Setup | Yes | We use 2, 4, and 4 few shot examples for RTE, QQP, BoolQ respectively... Temperature is always set to 0 for consistent results... The batch sizes we use for RTE, QQP, BoolQ are 16/32/64/160... for GPT-4... and 16/32 for gpt-3.5-turbo... The number of voting rounds we choose is 1, 3, 5, 7, and 9. |
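
The setup quoted above (batched questions, temperature 0, and 1–9 voting rounds) can be illustrated with a minimal sketch of batch prompting with permutation-based voting in the spirit of BPE. Everything here is an assumption for illustration: the prompt template, the `call_llm` stub, and all function and parameter names are not taken from the authors' released code.

```python
import random
from collections import Counter

def build_batch_prompt(questions):
    # Pack several questions into one prompt, indexed so answers can be
    # matched back to their questions (illustrative template, not the paper's).
    lines = ["Answer each question with yes or no."]
    for i, q in enumerate(questions):
        lines.append(f"Q{i}: {q}")
    lines.append("Respond with one answer per line, in order.")
    return "\n".join(lines)

def batch_prompt_with_voting(questions, call_llm, voting_rounds=5, seed=0):
    # BPE-style ensembling sketch: permute the in-batch order each round
    # and majority-vote the per-question answers across rounds.
    rng = random.Random(seed)
    votes = {i: Counter() for i in range(len(questions))}
    for _ in range(voting_rounds):
        order = list(range(len(questions)))
        rng.shuffle(order)
        prompt = build_batch_prompt([questions[i] for i in order])
        answers = call_llm(prompt)  # expected: answers in the prompt's order
        for pos, ans in enumerate(answers):
            votes[order[pos]][ans] += 1  # map back to original question index
    return [votes[i].most_common(1)[0][0] for i in range(len(questions))]

# Toy stand-in for a deterministic (temperature-0) model call; it simply
# answers "yes" to questions mentioning "sun" so the flow can be run locally.
def fake_llm(prompt):
    qs = [line for line in prompt.splitlines() if line.startswith("Q")]
    return ["yes" if "sun" in q else "no" for q in qs]

print(batch_prompt_with_voting(["Is the sun hot?", "Is ice hot?"], fake_llm))
# → ['yes', 'no']
```

In a real run, `call_llm` would send the batched prompt to gpt-3.5-turbo or GPT-4 with temperature 0 and parse the indexed answers out of the response; the batch sizes and voting-round counts from the table slot directly into the `questions` list length and the `voting_rounds` argument.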