reproducibilityindex.ai

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To illustrate these advantages of DECOMP, we empirically evaluate it against prior work on eight challenging datasets using GPT3 models
Researcher Affiliation	Collaboration	Allen Institute for AI; Stony Brook University; University of Edinburgh. tushark@allenai.org, hjtrivedi@cs.stonybrook.edu, matthewf@allenai.org, yao.fu@ed.ac.uk, kyler@allenai.org, peterc@allenai.org, ashishs@allenai.org
Pseudocode	Yes	Algorithm 1 A recursive reversal strategy that splits the sequence in half, reverses each half, and concatenates them. Runs in O(log n) calls to the LM where n is the number of items in the sequence. procedure SPLITREVERSE(x)
Open Source Code	Yes	Datasets, Code and Prompts available at https://github.com/allenai/DecomP.
Open Datasets	Yes	We use HotpotQA in the fullwiki setting where it comes with the associated Wikipedia corpus for open-domain QA. 2WikiMultihopQA and MuSiQue, however, are originally reading comprehension datasets. ... To turn these datasets into open-domain QA datasets, we create a corpora for each dataset by combining all the paragraphs in the train, dev and test questions.
Dataset Splits	Yes	We manually annotate CoTs and decompositions for 20 training set questions, and sample 3 prompts of 15 questions each for all approaches. The detailed prompts are given in the Appendix G. We evaluate on 300 held-out dev questions in each dataset.
Hardware Specification	No	The paper specifies the LLM models used (e.g., 'text-davinci-002 InstructGPT3 model', 'Codex (code-davinci-002) model', 'Flan-T5-Large', 'Flan-T5-XL', 'Flan-T5-XXL') but does not provide specific hardware details (like GPU models, CPU types, or memory) on which these models or the experiments were run.
Software Dependencies	No	The paper refers to specific LLM models (e.g., GPT3 text-davinci-002, Codex code-davinci-002, Flan-T5 family) but does not provide details on specific software libraries or their version numbers (e.g., Python, PyTorch, TensorFlow versions, or other dependencies) required for replication.
Experiment Setup	Yes	For NoDecomp-Ctxt, we search K ∈ {6, 8, 10} for GPT3 models and K ∈ {2, 4, 6, 8} for Flan-T5-* models. For Decomp-Ctxt, we search K ∈ {2, 4, 6} for GPT3 and Flan-T5-* models.
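
The Pseudocode row quotes only the caption and first line of Algorithm 1. As a rough illustration of the strategy that caption describes (split the sequence in half, reverse each half, concatenate), here is a minimal Python sketch. It is not the authors' implementation: the names `split_reverse`, `reverse_short`, and `stub_reverse`, and the `threshold` base-case size, are hypothetical, and the language-model call is replaced by a plain Python stub.

```python
from typing import Callable, List


def split_reverse(
    items: List[str],
    reverse_short: Callable[[List[str]], List[str]],
    threshold: int = 4,
) -> List[str]:
    """Reverse `items` by recursive halving.

    `reverse_short` stands in for the sub-task handler (an LM call in the
    paper's setting) that reverses a short sequence directly. `threshold`
    is an assumed base-case size, not a value taken from the paper.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # Base case: the sequence is short enough to hand to the handler.
    if len(items) <= threshold:
        return reverse_short(items)
    mid = len(items) // 2
    left, right = items[:mid], items[mid:]
    # The reversed sequence is the reversed right half followed by
    # the reversed left half.
    return (
        split_reverse(right, reverse_short, threshold)
        + split_reverse(left, reverse_short, threshold)
    )


def stub_reverse(items: List[str]) -> List[str]:
    """Hypothetical stand-in for an LM that reverses a short list."""
    return items[::-1]


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    print(split_reverse(words, stub_reverse))
```

In this sketch the recursion depth grows logarithmically with the sequence length, because each level halves the input before handing short pieces to the handler.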