Ask Me Anything: A simple strategy for prompting language models

Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate AMA across open-source model families (EleutherAI, BLOOM, OPT, and T0) and sizes (125M–175B parameters), demonstrating an average performance lift of 10.2% over the few-shot baseline. |
| Researcher Affiliation | Collaboration | Simran Arora*, Avanika Narayan*, Mayee Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré; {simarora,avanika,mfchen,lorr1,nguha,kushb,chrismre}@cs.stanford.edu, {ines.chami}@numbersstation.ai |
| Pseudocode | Yes | Algorithms summarizing the end-to-end AMA procedure are in Appendices D and E, respectively. |
| Open Source Code | Yes | We release our code here: https://github.com/HazyResearch/ama_prompting. |
| Open Datasets | Yes | We evaluate using the same tasks on which GPT-3 was originally evaluated: SuperGLUE (Wang et al., 2019), NLI (Mostafazadeh et al., 2017; Nie et al., 2020), classification (Zhang et al., 2015; Socher et al., 2013; He & McAuley, 2016), and QA tasks (Kasai et al., 2022; Kwiatkowski et al., 2019; Berant et al., 2013; Dua et al., 2019). |
| Dataset Splits | Yes | For each task, we use an unlabeled dataset constructed from the test set as well as 1000 additional unlabeled samples from the training set. |
| Hardware Specification | Yes | We use NVIDIA A100 GPUs to run all experiments. |
| Software Dependencies | No | The paper mentions downloading models from the Hugging Face Model Hub (Hugging Face, 2021) and using the OpenAI API davinci endpoint (OpenAI, 2021), but it does not specify software versions for these tools or other key components. |
| Experiment Setup | No | The paper states "For AMA we use 3-6 prompt()-chains to generate predictions per input" and mentions "the few-shot (k = 3) baseline", but it does not provide specific hyperparameters such as learning rate, batch size, or optimization settings typically found in an experimental-setup section. |
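The "3-6 prompt()-chains per input" setup above can be sketched in a few lines. This is an illustrative simplification, not the paper's released implementation: AMA aggregates chain outputs with a learned weak-supervision model, whereas the sketch below uses a plain majority vote, and the function names (`prompt_chain`, `ama_predict`) and canned responses are hypothetical stand-ins for real LM calls.

```python
from collections import Counter

def prompt_chain(question: str, template: int) -> str:
    """Stand-in for one prompt()-chain (reformat input -> open-ended
    question -> answer). A real chain would make two LM calls; here we
    return a canned prediction so the sketch runs offline."""
    canned = {0: "yes", 1: "yes", 2: "no"}  # hypothetical chain outputs
    return canned[template % 3]

def ama_predict(question: str, n_chains: int = 3) -> str:
    """Collect one prediction per prompt()-chain, then majority-vote.
    (AMA proper replaces this vote with a weak-supervision aggregator.)"""
    votes = [prompt_chain(question, t) for t in range(n_chains)]
    return Counter(votes).most_common(1)[0][0]

print(ama_predict("Is the sky blue?"))  # -> "yes" (2 of 3 canned chains agree)
```

The point of running several differently-phrased chains and aggregating is that no single prompt format is reliable across inputs; combining the votes recovers much of the lost accuracy.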