Ask Me Anything: A simple strategy for prompting language models
Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AMA across open-source model families (EleutherAI, BLOOM, OPT, and T0) and sizes (125M-175B parameters), demonstrating an average performance lift of 10.2% over the few-shot baseline. |
| Researcher Affiliation | Collaboration | Simran Arora*, Avanika Narayan*, Mayee Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Christopher Ré {simarora,avanika,mfchen,lorr1,nguha,kushb,chrismre}@cs.stanford.edu {ines.chami}@numbersstation.ai |
| Pseudocode | Yes | Algorithms summarizing the end-to-end AMA procedure are provided in Appendices D and E. |
| Open Source Code | Yes | We release our code here: https://github.com/HazyResearch/ama_prompting. |
| Open Datasets | Yes | We evaluate using the same tasks on which GPT-3 was originally evaluated: SuperGLUE (Wang et al., 2019), NLI (Mostafazadeh et al., 2017; Nie et al., 2020), classification (Zhang et al., 2015; Socher et al., 2013; He & McAuley, 2016), and QA tasks (Kasai et al., 2022; Kwiatkowski et al., 2019; Berant et al., 2013; Dua et al., 2019). |
| Dataset Splits | Yes | For each task, we use an unlabeled dataset constructed from the test set as well as 1000 additional unlabeled samples from the training set. |
| Hardware Specification | Yes | We use NVIDIA A100 GPUs to run all experiments. |
| Software Dependencies | No | The paper mentions downloading models from the Hugging Face Model Hub (Hugging Face, 2021) and using the OpenAI API davinci endpoint (OpenAI, 2021), but it does not specify software versions for these tools or other key components. A hedged model-loading sketch appears after the table. |
| Experiment Setup | No | The paper states 'For AMA we use 3-6 prompt()-chains to generate predictions per input' and mentions 'the few-shot (k = 3) baseline', but it does not provide the detailed configuration (e.g., decoding parameters or prompt-selection settings) typically found in an 'Experimental Setup' section. A simplified prompt()-chain sketch appears after the table. |
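
The setup row above notes that AMA composes 3-6 prompt()-chains per input and combines their outputs. Below is a minimal sketch of that idea, assuming a hypothetical `call_lm` completion function and illustrative yes/no templates; it aggregates with simple majority voting, whereas the released code (https://github.com/HazyResearch/ama_prompting) combines the noisy chain predictions with weak supervision.

```python
# Minimal sketch of an AMA-style prompt()-chain with majority-vote aggregation.
# `call_lm`, the templates, and the yes/no task framing are illustrative
# assumptions, not the authors' exact prompts.
from collections import Counter


def call_lm(prompt: str) -> str:
    """Hypothetical LM call; swap in any text-completion endpoint."""
    raise NotImplementedError


# Illustrative question() templates; the paper's prompts vary per task.
QUESTION_TEMPLATES = [
    "Rewrite the statement as a yes/no question.\n\nStatement: {x}\nQuestion:",
    "Turn the claim into a question.\n\nClaim: {x}\nQuestion:",
    "Write a question that checks whether the claim is true.\n\nClaim: {x}\nQuestion:",
]

# Illustrative answer() template applied after question generation.
ANSWER_TEMPLATE = (
    "Answer the question using the passage. Answer yes or no.\n\n"
    "Passage: {context}\nQuestion: {question}\nAnswer:"
)


def prompt_chain(x: str, context: str, question_template: str) -> str:
    """One question() -> answer() chain: reformat the input, then answer it."""
    question = call_lm(question_template.format(x=x)).strip()
    answer = call_lm(ANSWER_TEMPLATE.format(context=context, question=question))
    return "yes" if answer.strip().lower().startswith("yes") else "no"


def ama_predict(x: str, context: str) -> str:
    """Run several chains over the same input and take a majority vote."""
    votes = [prompt_chain(x, context, t) for t in QUESTION_TEMPLATES]
    return Counter(votes).most_common(1)[0][0]
```

Majority vote is only the simplest aggregator; the paper's contribution includes a weak-supervision aggregator that accounts for the varying accuracies of and dependencies among the prompt()-chains.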
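
The dependencies row notes that model checkpoints are pulled from the Hugging Face Model Hub without pinned library versions. The following is a hedged sketch, assuming recent `transformers` and `torch` releases rather than the authors' exact environment, of loading one of the evaluated open-source models (EleutherAI's GPT-J-6B) on a single A100:

```python
# Hedged sketch: load an evaluated open-source model from the Hugging Face
# Model Hub. Library versions and dtype choices are assumptions; the paper
# does not specify them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision fits comfortably on one A100
    device_map="auto",
)


def complete(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedy completion, enough for the short answers AMA prompts elicit."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```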