Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate these advantages of DECOMP, we empirically evaluate it against prior work on eight challenging datasets using GPT3 models |
| Researcher Affiliation | Collaboration | Allen Institute for AI; Stony Brook University; University of Edinburgh. tushark@allenai.org, hjtrivedi@cs.stonybrook.edu, matthewf@allenai.org, yao.fu@ed.ac.uk, kyler@allenai.org, peterc@allenai.org, ashishs@allenai.org |
| Pseudocode | Yes | Algorithm 1: A recursive reversal strategy that splits the sequence in half, reverses each half, and concatenates them. Runs in O(log n) calls to the LM where n is the number of items in the sequence. procedure SplitReverse(x) (a runnable sketch follows the table) |
| Open Source Code | Yes | Datasets, Code and Prompts available at https://github.com/allenai/DecomP. |
| Open Datasets | Yes | We use HotpotQA in the fullwiki setting where it comes with the associated Wikipedia corpus for open-domain QA. 2WikiMultihopQA and MuSiQue, however, are originally reading comprehension datasets. ... To turn these datasets into open-domain QA datasets, we create a corpus for each dataset by combining all the paragraphs in the train, dev and test questions. (a corpus-construction sketch follows the table) |
| Dataset Splits | Yes | We manually annotate CoTs and decompositions for 20 training set questions, and sample 3 prompts of 15 questions each for all approaches. The detailed prompts are given in Appendix G. We evaluate on 300 held-out dev questions in each dataset. |
| Hardware Specification | No | The paper specifies the LLM models used (e.g., 'text-davinci-002 InstructGPT3 model', 'Codex (code-davinci-002) model', 'Flan-T5-Large', 'Flan-T5-XL', 'Flan-T5-XXL') but does not provide specific hardware details (like GPU models, CPU types, or memory) on which these models or the experiments were run. |
| Software Dependencies | No | The paper refers to specific LLM models (e.g., GPT3 text-davinci-002, Codex code-davinci-002, Flan-T5 family) but does not provide details on specific software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow versions, or other dependencies) required for replication. |
| Experiment Setup | Yes | For NoDecomp-Ctxt, we search K ∈ {6, 8, 10} for GPT3 models and K ∈ {2, 4, 6, 8} for Flan-T5-* models. For Decomp-Ctxt, we search K ∈ {2, 4, 6} for GPT3 and Flan-T5-* models. (a grid-search sketch follows the table) |
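
The Algorithm 1 (SplitReverse) excerpt quoted in the Pseudocode row can be made concrete with a short sketch. This is a minimal illustration under assumptions, not the paper's implementation: `reverse_with_lm` is a hypothetical stand-in for the prompted LM sub-task that reverses a short sequence, simulated here in plain Python.

```python
def reverse_with_lm(items):
    # Hypothetical stand-in for the LM sub-task: in DECOMP this would be a
    # few-shot prompted call that reverses a short sequence; simulated here.
    return list(reversed(items))

def split_reverse(items, base_size=4):
    # Recursive reversal strategy: split the sequence in half, reverse each half,
    # and concatenate the reversed second half before the reversed first half.
    # The recursion depth is O(log n) in the sequence length.
    if len(items) <= base_size:
        return reverse_with_lm(items)
    mid = len(items) // 2
    first_half, second_half = items[:mid], items[mid:]
    return split_reverse(second_half, base_size) + split_reverse(first_half, base_size)

print("".join(split_reverse(list("decomposed"))))  # -> "desopmoced"
```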
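The open-domain corpus construction quoted in the Open Datasets row can likewise be sketched. This assumes a hypothetical record format where each question carries a `paragraphs` list of (title, text) pairs; the actual 2WikiMultihopQA and MuSiQue schemas may differ.

```python
import json

def build_corpus(split_files):
    # Combine every paragraph from the train, dev and test question files into a
    # single deduplicated corpus that a retriever can index for open-domain QA.
    corpus = {}
    for path in split_files:
        with open(path) as f:
            for line in f:
                record = json.loads(line)
                for title, text in record["paragraphs"]:  # hypothetical field name
                    corpus[(title, text)] = {"title": title, "text": text}
    return list(corpus.values())

# Example usage (file names are placeholders):
# corpus = build_corpus(["train.jsonl", "dev.jsonl", "test.jsonl"])
```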
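The K search in the Experiment Setup row amounts to a small grid search evaluated on the held-out dev questions. The sketch below assumes a hypothetical `evaluate_dev(model_name, k)` function that runs one configuration and returns a dev-set score (e.g., answer F1).

```python
def select_best_k(model_name, candidate_ks, evaluate_dev):
    # Evaluate each candidate K on the held-out dev questions and keep the best one.
    scores = {k: evaluate_dev(model_name, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Example usage with the GPT3 NoDecomp-Ctxt grid reported above
# (evaluate_dev is a hypothetical scoring function):
# best_k, scores = select_best_k("text-davinci-002", [6, 8, 10], evaluate_dev)
```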