BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

Authors: Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art retrieval-augmented language model inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance.
Researcher Affiliation | Academia | Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi; Paul G. Allen School of Computer Science & Engineering, University of Washington; {qicao,sewon,yizhongw,hannaneh}@cs.washington.edu
Pseudocode | Yes | A.1 TOKEN COMPRESSION ALGORITHM; Algorithm 1: Offline Compression for Binary Token Representations ... Algorithm 2: Runtime Compression
Open Source Code | Yes | Our code is publicly available at https://github.com/csarron/BTR
Open Datasets | Yes | We evaluate BTR and baselines on three open-domain QA tasks: Natural Questions (NQ; Kwiatkowski et al., 2019), TriviaQA (TQA; Joshi et al., 2017), and WebQuestions (WQ; Berant et al., 2013); one fact-checking task: FEVER (Thorne et al., 2018); and one knowledge-intensive reasoning benchmark: the massive multitask language understanding (MMLU) dataset (Hendrycks et al., 2020).
Dataset Splits | Yes | Table 5: Statistics of the number of examples for the evaluation datasets. NQ: Train 79168, Validation 8757, Test 3610
Hardware Specification | Yes | We conducted training using 4 to 8 A40 or A100 GPUs (depending on their availability on our cluster) with BF16 mixed precision.
Software Dependencies | Yes | We develop BTR based on the Atlas codebase using PyTorch 1.13.1 and Hugging Face Transformers v4.18.0 (Wolf et al., 2020).
Experiment Setup | Yes | Table 4: Training hyperparameters for BTR-Atlas.

                       NQ            TQA           WQ            FEVER         MMLU
Hyperparameter         base   large  base   large  base   large  base   large  base   large
batch size             8      4      8      4      8      4      8      4      4      2
learning rate          6e-5   6e-5   4e-5   4e-5   8e-5   8e-5   6e-5   6e-5   5e-5   5e-6
training steps         20000  20000  20000  20000  3000   3000   10000  10000  2000   2000
warmup steps           200    200    200    200    200    200    200    200    50     50
weight decay           0.01   0.01   0.01   0.01   0.01   0.01   0.01   0.01   0.01   0.01
number of passages     40     40     40     40     40     40     40     40     30     30
max query length       40     40     64     64     40     40     40     40     256    256
max passage length     320    320    320    320    320    320    320    320    320    320
max answer length      32     32     32     32     32     32     32     32     32     32
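
For reference, the NQ / base column of Table 4 can be written out as a plain Python config so the reported hyperparameters are easy to copy; the key names below are our own shorthand, not identifiers from the released codebase.

    # BTR-Atlas base fine-tuning hyperparameters for NQ, taken from Table 4.
    # Key names are illustrative only.
    btr_atlas_nq_base = {
        "batch_size": 8,
        "learning_rate": 6e-5,
        "training_steps": 20000,
        "warmup_steps": 200,
        "weight_decay": 0.01,
        "num_passages": 40,
        "max_query_length": 40,
        "max_passage_length": 320,
        "max_answer_length": 32,
    }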
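
The hardware row above mentions BF16 mixed-precision training. A minimal PyTorch sketch of that setup follows; the model and loss here are placeholders, not the BTR-Atlas reader.

    # Sketch of BF16 mixed-precision training (placeholder model and loss).
    import torch

    model = torch.nn.Linear(768, 768).cuda()           # stand-in for the reader model
    optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.01)

    inputs = torch.randn(8, 768, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):  # BF16 autocast region
        loss = model(inputs).pow(2).mean()              # dummy loss
    loss.backward()                                     # BF16 needs no GradScaler
    optimizer.step()
    optimizer.zero_grad()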
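
The pseudocode row refers to Algorithms 1 and 2 (offline and runtime compression of binary token representations), which are not reproduced here. The sketch below only illustrates the generic idea behind binary token representations, which is where the reported storage reduction comes from: binarize token representations with a sign function (straight-through gradients during training) and bit-pack them for offline storage. It is a minimal sketch under those assumptions, not the paper's exact algorithms, and all names are ours.

    # Illustrative sketch: sign binarization with a straight-through estimator,
    # plus bit-packing for offline storage. NOT the paper's exact Algorithm 1/2.
    import numpy as np
    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return torch.sign(x)          # +/-1 codes in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output            # pass gradients straight through

    def compress_offline(token_reprs: torch.Tensor) -> np.ndarray:
        """Pack [num_tokens, hidden_dim] float representations into 1 bit per dimension."""
        bits = (token_reprs > 0).cpu().numpy().astype(np.uint8)
        return np.packbits(bits, axis=-1)  # uint8 array, 32x smaller than fp32

    def decompress_runtime(packed: np.ndarray, hidden_dim: int) -> torch.Tensor:
        """Unpack stored bits back to {-1, +1} representations for the reader."""
        bits = np.unpackbits(packed, axis=-1)[:, :hidden_dim]
        return torch.from_numpy(bits.astype(np.float32) * 2.0 - 1.0)

    # Training-time use (differentiable): codes = BinarizeSTE.apply(token_reprs)
    # Offline storage:                    packed = compress_offline(token_reprs)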