Recitation-Augmented Language Models

Authors: Zhiqing Sun, Xuezhi Wang, Yi Tay, Yiming Yang, Denny Zhou

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In experiments, we verify the effectiveness of RECITE on four pre-trained models (PaLM, UL2, OPT, and Codex) and three CBQA tasks (Natural Questions, TriviaQA, and HotpotQA). |
| Researcher Affiliation | Collaboration | Google Research, Brain Team; Language Technologies Institute, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Per-question Error Analysis |
| Open Source Code | Yes | Our code is available at https://github.com/Edward-Sun/RECITE. |
| Open Datasets | Yes | The three evaluation datasets used in our experiments (Natural Questions, TriviaQA, and HotpotQA) are all publicly accessible. |
| Dataset Splits | Yes | We use the test split for all tasks if the test split is available and has labels for evaluation; otherwise we use the dev split. |
| Hardware Specification | Yes | We train PaLM on the constructed corpus for 10,000 steps with a batch size of 64, which takes approximately 1 day on 64 TPUv4 chips. |
| Software Dependencies | No | The paper mentions models like UL2 and OPT and tools like SentencePiece, but does not provide specific version numbers for ancillary software dependencies such as Python, PyTorch, or other libraries used for implementation. |
| Experiment Setup | Yes | We evaluate our methods in 5-shot and 64-shot settings... We train PaLM on the constructed corpus for 10,000 steps with a batch size of 64... We mainly follow Chowdhery et al. (2022) and use two newline symbols \n\n as the separator between different components within exemplars, and use three newline symbols \n\n\n as the separator between different exemplars. |
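The prompt-formatting convention in the Experiment Setup row (two newlines between components within an exemplar, three newlines between exemplars) can be sketched as below. This is a minimal illustration of that separator layout only; the component names (`Question`, `Recitation`, `Answer`) and function names are assumptions for the example, not verbatim from the paper.

```python
# Illustrative sketch of the few-shot prompt layout described above.
# Component and field names are hypothetical; only the separator
# convention (\n\n within an exemplar, \n\n\n between exemplars)
# comes from the reported setup.

COMPONENT_SEP = "\n\n"   # between components within one exemplar
EXEMPLAR_SEP = "\n\n\n"  # between different exemplars

def format_exemplar(question: str, recitation: str, answer: str) -> str:
    """Join the components of one recite-and-answer exemplar."""
    return COMPONENT_SEP.join([
        f"Question: {question}",
        f"Recitation: {recitation}",
        f"Answer: {answer}",
    ])

def build_prompt(exemplars: list[tuple[str, str, str]], new_question: str) -> str:
    """Concatenate few-shot exemplars, then the new question, leaving the
    model to continue from the 'Recitation:' cue."""
    shots = EXEMPLAR_SEP.join(format_exemplar(*ex) for ex in exemplars)
    return (shots + EXEMPLAR_SEP
            + f"Question: {new_question}" + COMPONENT_SEP + "Recitation:")

prompt = build_prompt(
    [("Who wrote Hamlet?",
      "Hamlet is a tragedy written by William Shakespeare.",
      "William Shakespeare")],
    "In which city is the Eiffel Tower located?",
)
```

With one exemplar, the triple-newline separator appears exactly once, between the exemplar block and the new question.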