Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Authors: Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the capability of our method to simultaneously enhance time efficiency across a variety of question-answering benchmarks using multiple pre-trained LLMs. Furthermore, our technique significantly improves accuracy when the retrieved context is large relative to the context the model was trained on. We perform experiments on three families of large language models, namely OpenELM (Mehta et al., 2024), BLOOMZ (Muennighoff et al., 2023), and MPT (MosaicML NLP Team, 2023). We leverage the publicly available NaturalQuestions-Open (Liu et al., 2023a) and MuSiQue (Trivedi et al., 2022) datasets. |
| Researcher Affiliation | Industry | 1Apple, Cupertino, CA, USA 2Meta, Menlo Park, CA, USA (*Work done while at Apple). Correspondence to: T. Merth <tmerth@apple.com>, Q. Fu <qfu22@apple.com>, M. Rastegari <mrastegari@meta.com>, M. Najibi <najibi@apple.com>. |
| Pseudocode | Yes | Please refer to Algorithm 3 for an algorithmic formalization. |
| Open Source Code | Yes | For reproducibility, our implementation can be found at https://github.com/apple/ml-superposition-prompting. |
| Open Datasets | Yes | We leverage the publicly available NaturalQuestions-Open (Liu et al., 2023a) and MuSiQue (Trivedi et al., 2022) datasets. |
| Dataset Splits | Yes | We validate our approach on the dev split of MuSiQue-Ans (reporting Answer EM and F1). We follow the same experimental setup as Liu et al., 2023a, including the same preprocessing and evaluation methodology for the 20-document setting (reporting Best EM Subspan, or Accuracy for short). |
| Hardware Specification | Yes | In Table 5 and Table 7, we present measurements of the compared methods in a realistic server deployment scenario (an NVIDIA A100 80GB). |
| Software Dependencies | No | We use the fvcore (facebookresearch, 2024) package to compute theoretical floating point operation (FLOP) counts for various inference settings. Our CUDA implementation is written in pure PyTorch. While these software components are mentioned, specific version numbers for them (e.g., PyTorch 1.x) are not provided. |
| Experiment Setup | Yes | We use greedy autoregressive decoding in all experiments, and randomize the order of documents to prevent any systematic bias possible due to the location of the gold documents (à la Liu et al., 2023a). We introduce the hyperparameter superposition factor as a parameter to interpolate between a fully superimposed and fully classical prompt. Here, we sweep values for top-k for our method, where k is the number of documents retained for generating the answer (full table results are provided in Table 5). |
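The top-k sweep described in the setup prunes retrieved documents to the k most salient before answer generation. The sketch below is a hypothetical illustration of that selection step, assuming each document has already been assigned a saliency score (the scores and helper `prune_paths` are stand-ins, not the paper's implementation):

```python
# Hypothetical sketch of top-k document pruning: keep only the k
# highest-scoring documents for answer generation. The saliency
# scores here are illustrative placeholders, not model-derived.

def prune_paths(doc_scores: dict, k: int) -> list:
    """Return the ids of the top-k documents by saliency score."""
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    return ranked[:k]

scores = {"doc_a": 0.91, "doc_b": 0.15, "doc_c": 0.62, "doc_d": 0.08}
print(prune_paths(scores, k=2))  # ['doc_a', 'doc_c']
```

Sweeping k then trades off answer accuracy against the amount of context the model must attend to during generation.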