Recitation-Augmented Language Models

Authors: Zhiqing Sun, Xuezhi Wang, Yi Tay, Yiming Yang, Denny Zhou

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In experiments, we verify the effectiveness of RECITE on four pre-trained models (PaLM, UL2, OPT, and Codex) and three CBQA tasks (Natural Questions, TriviaQA, and HotpotQA). |
| Researcher Affiliation | Collaboration | Google Research, Brain Team; Language Technologies Institute, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Per-question Error Analysis |
| Open Source Code | Yes | Our code is available at https://github.com/Edward-Sun/RECITE. |
| Open Datasets | Yes | The three evaluation datasets used in our experiments (Natural Questions, TriviaQA, and HotpotQA) are all publicly accessible. |
| Dataset Splits | Yes | We use the test split for all tasks if the test split is available and has labels for evaluation; otherwise we use the dev split. |
| Hardware Specification | Yes | We train PaLM on the constructed corpus for 10,000 steps with a batch size of 64, which takes approximately 1 day on 64 TPUv4 chips. |
| Software Dependencies | No | The paper mentions models like UL2 and OPT and tools like SentencePiece, but does not provide specific version numbers for ancillary software dependencies such as Python, PyTorch, or other libraries used for implementation. |
| Experiment Setup | Yes | We evaluate our methods in 5-shot and 64-shot settings... We train PaLM on the constructed corpus for 10,000 steps with a batch size of 64... We mainly follow Chowdhery et al. (2022) and use two newline symbols \n\n as the separator between different components within exemplars, and use three newline symbols \n\n\n as the separator between different exemplars. |
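The prompt-formatting convention in the Experiment Setup row (two newlines between components within an exemplar, three newlines between exemplars) can be sketched as below. This is a minimal illustration of that separator layout only; the component names (`Question`, `Recitation`, `Answer`) and function names are assumptions for the example, not verbatim from the paper.

```python
# Illustrative sketch of the few-shot prompt layout described above.
# Component and field names are hypothetical; only the separator
# convention (\n\n within an exemplar, \n\n\n between exemplars)
# comes from the reported setup.

COMPONENT_SEP = "\n\n"   # between components within one exemplar
EXEMPLAR_SEP = "\n\n\n"  # between different exemplars

def format_exemplar(question: str, recitation: str, answer: str) -> str:
    """Join the components of one recite-and-answer exemplar."""
    return COMPONENT_SEP.join([
        f"Question: {question}",
        f"Recitation: {recitation}",
        f"Answer: {answer}",
    ])

def build_prompt(exemplars: list[tuple[str, str, str]], new_question: str) -> str:
    """Concatenate few-shot exemplars, then the new question, leaving the
    model to continue from the 'Recitation:' cue."""
    shots = EXEMPLAR_SEP.join(format_exemplar(*ex) for ex in exemplars)
    return (shots + EXEMPLAR_SEP
            + f"Question: {new_question}" + COMPONENT_SEP + "Recitation:")

prompt = build_prompt(
    [("Who wrote Hamlet?",
      "Hamlet is a tragedy written by William Shakespeare.",
      "William Shakespeare")],
    "In which city is the Eiffel Tower located?",
)
```

With one exemplar, the triple-newline separator appears exactly once, between the exemplar block and the new question.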