Ask Me Anything: A simple strategy for prompting language models

Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate AMA across open-source model families (EleutherAI, BLOOM, OPT, and T0) and sizes (125M–175B parameters), demonstrating an average performance lift of 10.2% over the few-shot baseline. |
| Researcher Affiliation | Collaboration | Simran Arora*, Avanika Narayan*, Mayee Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré; {simarora,avanika,mfchen,lorr1,nguha,kushb,chrismre}@cs.stanford.edu, {ines.chami}@numbersstation.ai |
| Pseudocode | Yes | Algorithms summarizing the end-to-end AMA procedure are in Appendices D and E, respectively. |
| Open Source Code | Yes | We release our code here: https://github.com/HazyResearch/ama_prompting. |
| Open Datasets | Yes | We evaluate using the same tasks on which GPT-3 was originally evaluated: SuperGLUE (Wang et al., 2019), NLI (Mostafazadeh et al., 2017; Nie et al., 2020), classification (Zhang et al., 2015; Socher et al., 2013; He & McAuley, 2016), and QA tasks (Kasai et al., 2022; Kwiatkowski et al., 2019; Berant et al., 2013; Dua et al., 2019). |
| Dataset Splits | Yes | For each task, we use an unlabeled dataset constructed from the test set as well as 1000 additional unlabeled samples from the training set. |
| Hardware Specification | Yes | We use NVIDIA A100 GPUs to run all experiments. |
| Software Dependencies | No | The paper mentions downloading models from the Hugging Face Model Hub (Hugging Face, 2021) and using the OpenAI API davinci endpoint (OpenAI, 2021), but it does not specify software versions for these tools or other key components. |
| Experiment Setup | No | The paper states "For AMA we use 3-6 prompt()-chains to generate predictions per input" and mentions "the few-shot (k = 3) baseline", but it does not provide specific hyperparameters such as learning rate, batch size, or optimization settings typically found in an experimental-setup section. |
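The "3-6 prompt()-chains per input" setup above can be sketched in a few lines. This is an illustrative simplification, not the paper's released implementation: AMA aggregates chain outputs with a learned weak-supervision model, whereas the sketch below uses a plain majority vote, and the function names (`prompt_chain`, `ama_predict`) and canned responses are hypothetical stand-ins for real LM calls.

```python
from collections import Counter

def prompt_chain(question: str, template: int) -> str:
    """Stand-in for one prompt()-chain (reformat input -> open-ended
    question -> answer). A real chain would make two LM calls; here we
    return a canned prediction so the sketch runs offline."""
    canned = {0: "yes", 1: "yes", 2: "no"}  # hypothetical chain outputs
    return canned[template % 3]

def ama_predict(question: str, n_chains: int = 3) -> str:
    """Collect one prediction per prompt()-chain, then majority-vote.
    (AMA proper replaces this vote with a weak-supervision aggregator.)"""
    votes = [prompt_chain(question, t) for t in range(n_chains)]
    return Counter(votes).most_common(1)[0][0]

print(ama_predict("Is the sky blue?"))  # -> "yes" (2 of 3 canned chains agree)
```

The point of running several differently-phrased chains and aggregating is that no single prompt format is reliable across inputs; combining the votes recovers much of the lost accuracy.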