Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To illustrate these advantages of DECOMP, we empirically evaluate it against prior work on eight challenging datasets using GPT3 models."
Researcher Affiliation | Collaboration | Allen Institute for AI; Stony Brook University; University of Edinburgh (tushark@allenai.org, hjtrivedi@cs.stonybrook.edu, matthewf@allenai.org, yao.fu@ed.ac.uk, kyler@allenai.org, peterc@allenai.org, ashishs@allenai.org)
Pseudocode | Yes | "Algorithm 1: A recursive reversal strategy that splits the sequence in half, reverses each half, and concatenates them. Runs in O(log n) calls to the LM where n is the number of items in the sequence. procedure SPLITREVERSE(x)" (a sketch of this procedure appears after the table)
Open Source Code | Yes | "Datasets, Code and Prompts available at https://github.com/allenai/DecomP."
Open Datasets | Yes | "We use HotpotQA in the fullwiki setting where it comes with the associated Wikipedia corpus for open-domain QA. 2WikiMultihopQA and MuSiQue, however, are originally reading comprehension datasets. ... To turn these datasets into open-domain QA datasets, we create a corpus for each dataset by combining all the paragraphs in the train, dev and test questions." (sketched after the table)
Dataset Splits | Yes | "We manually annotate CoTs and decompositions for 20 training set questions, and sample 3 prompts of 15 questions each for all approaches. The detailed prompts are given in Appendix G. We evaluate on 300 held-out dev questions in each dataset."
Hardware Specification | No | The paper specifies the LLMs used (the text-davinci-002 InstructGPT3 model, the Codex code-davinci-002 model, and Flan-T5-Large, Flan-T5-XL, and Flan-T5-XXL) but does not report the hardware (GPU models, CPU types, or memory) on which these models or the experiments were run.
Software Dependencies | No | The paper names the specific LLMs used (GPT3 text-davinci-002, Codex code-davinci-002, the Flan-T5 family) but does not list software libraries or version numbers (e.g., Python, PyTorch, or other dependencies) needed for replication.
Experiment Setup | Yes | "For NoDecomp-Ctxt, we search K ∈ {6, 8, 10} for GPT3 models and K ∈ {2, 4, 6, 8} for Flan-T5-* models. For Decomp-Ctxt, we search K ∈ {2, 4, 6} for GPT3 and Flan-T5-* models." (sketched after the table)
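
The Pseudocode row above quotes Algorithm 1, a recursive reversal strategy. Below is a minimal Python sketch of that idea under stated assumptions: reverse_with_lm is a hypothetical stand-in for prompting the LLM to reverse a short sub-sequence (simulated locally so the sketch runs), and base_size is an illustrative threshold, not a value from the paper.

```python
def reverse_with_lm(items):
    """Hypothetical sub-task handler: in DecomP this would prompt the LLM to
    reverse a short sequence; simulated locally so the sketch is runnable."""
    return list(reversed(items))


def split_reverse(items, base_size=3):
    """Recursive reversal: split the sequence in half, reverse each half,
    and concatenate the reversed halves in swapped order."""
    if len(items) <= base_size:
        return reverse_with_lm(items)            # short enough for one LM call
    mid = len(items) // 2
    left, right = items[:mid], items[mid:]       # split sub-task (symbolic here)
    # reverse(x) = reverse(second half) + reverse(first half)
    return split_reverse(right, base_size) + split_reverse(left, base_size)


if __name__ == "__main__":
    print("".join(split_reverse(list("decomposition"))))  # -> noitisopmoced
```

The recursion depth grows logarithmically with the sequence length, so each sub-call only ever has to reverse a sequence short enough for the model to handle reliably.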
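The Open Datasets row notes that 2WikiMultihopQA and MuSiQue are converted to open-domain QA by pooling the paragraphs attached to the train, dev, and test questions into one corpus. A sketch of that pooling step follows; the JSON layout (a per-question "context" list of (title, text) pairs) is an assumption made for illustration and may not match the released file formats.

```python
import json


def build_corpus(split_files):
    """Pool the unique paragraphs attached to every question across all splits
    into a single retrieval corpus. The per-question "context" schema is assumed."""
    corpus, seen = [], set()
    for path in split_files:
        with open(path) as f:
            questions = json.load(f)
        for q in questions:
            for title, text in q["context"]:       # assumed (title, paragraph) pairs
                paragraph = " ".join(text) if isinstance(text, list) else text
                if (title, paragraph) not in seen:
                    seen.add((title, paragraph))
                    corpus.append({"title": title, "text": paragraph})
    return corpus


# e.g. build_corpus(["train.json", "dev.json", "test.json"])
```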
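The Experiment Setup row describes small per-model grids over the context size K that are searched using held-out dev questions. A hedged sketch of that selection loop follows; select_best_k and the dev_score callable are illustrative stand-ins, not part of the released code.

```python
from typing import Callable

# Candidate K values per (method, model family), as reported in the paper.
K_GRID = {
    ("NoDecomp-Ctxt", "gpt3"):    [6, 8, 10],
    ("NoDecomp-Ctxt", "flan-t5"): [2, 4, 6, 8],
    ("Decomp-Ctxt",   "gpt3"):    [2, 4, 6],
    ("Decomp-Ctxt",   "flan-t5"): [2, 4, 6],
}


def select_best_k(method: str, model: str,
                  dev_score: Callable[[int], float]) -> int:
    """Return the K from the grid that maximizes a caller-supplied dev-set
    metric, e.g. answer F1 over the held-out dev questions."""
    return max(K_GRID[(method, model)], key=dev_score)


if __name__ == "__main__":
    # Toy metric so the sketch runs end to end; replace with a real evaluation.
    toy_score = lambda k: 1.0 / (1 + abs(k - 6))
    print(select_best_k("Decomp-Ctxt", "gpt3", toy_score))  # -> 6
```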