BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Authors: Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art retrieval-augmented language model inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance. |
| Researcher Affiliation | Academia | Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi Paul G. Allen School of Computer Science & Engineering University of Washington {qicao,sewon,yizhongw,hannaneh}@cs.washington.edu |
| Pseudocode | Yes | A.1 TOKEN COMPRESSION ALGORITHM Algorithm 1 Offline Compression for Binary Token Representations ... Algorithm 2 Runtime Compression (a rough binarization sketch follows the table) |
| Open Source Code | Yes | 1Our code is publicly available at https://github.com/csarron/BTR |
| Open Datasets | Yes | We evaluate BTR and baselines on three open-domain QA tasks: Natural Questions (NQ, Kwiatkowski et al. (2019)), TriviaQA (TQA, Joshi et al. (2017)), WebQuestions (WQ, Berant et al. (2013)); one fact-checking task: FEVER (Thorne et al., 2018), and one knowledge-intensive reasoning benchmark: the massive multitask language understanding (MMLU) dataset (Hendrycks et al., 2020). |
| Dataset Splits | Yes | Table 5: Statistics of the number of examples for the evaluation datasets (NQ shown here: Train 79,168; Validation 8,757; Test 3,610). |
| Hardware Specification | Yes | We conducted training using 4 to 8 A40 or A100 GPUs (depending on their availability on our cluster) with BF16 mixed precision. |
| Software Dependencies | Yes | We develop BTR based on the Atlas codebase using PyTorch 1.13.1 and Hugging Face Transformers v4.18.0 (Wolf et al., 2020). (A version-check sketch follows the table.) |
| Experiment Setup | Yes | Table 4: Training hyperparameters for BTR-Atlas (base/large per dataset). Batch size: 8/4 for NQ, TQA, WQ, and FEVER; 4/2 for MMLU. Learning rate: 6e-5 (NQ, FEVER), 4e-5 (TQA), 8e-5 (WQ), 5e-5/5e-6 (MMLU). Training steps: 20,000 (NQ, TQA), 3,000 (WQ), 10,000 (FEVER), 2,000 (MMLU). Warmup steps: 200 (50 for MMLU). Weight decay: 0.01. Number of passages: 40 (30 for MMLU). Max query length: 40 (64 for TQA, 256 for MMLU). Max passage length: 320. Max answer length: 32. (A hypothetical config transcription for the NQ base column follows the table.) |
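
The paper's actual compression procedures are given in Appendix A.1 (Algorithm 1, offline compression; Algorithm 2, runtime compression) and in the released code. As a rough illustration only, the sketch below assumes a simple sign-based binarization of precomputed token representations packed into bytes with PyTorch; the function names (`binarize_tokens`, `pack_bits`, `unpack_bits`) and the tensor shapes are our own and are not taken from the BTR codebase.

```python
import torch


def binarize_tokens(hidden: torch.Tensor) -> torch.Tensor:
    """Map continuous token representations to {0, 1} by the sign of each dimension.

    hidden: (num_tokens, dim) float tensor of precomputed passage token states.
    """
    return (hidden > 0).to(torch.uint8)


def pack_bits(bits: torch.Tensor) -> torch.Tensor:
    """Pack a (num_tokens, dim) {0, 1} tensor into (num_tokens, dim // 8) bytes for storage."""
    num_tokens, dim = bits.shape
    assert dim % 8 == 0, "dim must be divisible by 8 to pack into bytes"
    # LSB-first bit weights: bit i of each byte carries weight 2**i.
    weights = torch.tensor([1, 2, 4, 8, 16, 32, 64, 128], dtype=torch.uint8).view(1, 1, 8)
    return (bits.view(num_tokens, dim // 8, 8) * weights).sum(-1).to(torch.uint8)


def unpack_bits(packed: torch.Tensor, dim: int) -> torch.Tensor:
    """Recover the stored bits and map them to {-1, +1} floats for use at inference time."""
    num_tokens = packed.shape[0]
    shifts = torch.arange(8, dtype=torch.uint8)
    bits = (packed.unsqueeze(-1) >> shifts) & 1  # LSB-first, matching pack_bits
    return bits.view(num_tokens, dim).float() * 2 - 1


# Offline: encode and binarize corpus passages once, store only the packed bytes.
passage_states = torch.randn(320, 768)               # e.g. one passage, 320 tokens, 768-dim states
stored = pack_bits(binarize_tokens(passage_states))  # 320 x 96 bytes instead of 320 x 768 floats

# Runtime: load the compact bytes and decode back to +/-1 vectors for the reader.
decoded = unpack_bits(stored, dim=768)
print(stored.shape, decoded.shape)                   # torch.Size([320, 96]) torch.Size([320, 768])
```

Packing each 768-dimensional float32 vector into 96 bytes shows where the storage savings of binary token representations come from; the exact offline and runtime algorithms and the reported numbers should be taken from Appendix A.1 and the repository.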
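
Because the reported dependency versions are specific (PyTorch 1.13.1, Transformers v4.18.0), a reproduction attempt can start with a small version sanity check such as the one below; this snippet is our own and is not part of the BTR repository.

```python
# Hypothetical environment check for the dependency versions reported above
# (PyTorch 1.13.1, Hugging Face Transformers v4.18.0); not from the BTR repository.
import torch
import transformers

expected = {"torch": "1.13.1", "transformers": "4.18.0"}
found = {"torch": torch.__version__, "transformers": transformers.__version__}

for package, version in expected.items():
    if found[package].startswith(version):
        print(f"{package} {found[package]} matches the reported version")
    else:
        print(f"warning: {package} is {found[package]}, the paper reports {version}")
```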
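
To make the flattened Table 4 easier to scan, the snippet below transcribes one of its columns (BTR-Atlas base on Natural Questions) into a plain Python dictionary. The key names are illustrative and are not guaranteed to match the Atlas/BTR training flags.

```python
# Table 4 values for BTR-Atlas base on Natural Questions, transcribed for readability.
# Key names are illustrative; they do not necessarily match the Atlas/BTR command-line flags.
btr_atlas_base_nq = {
    "batch_size": 8,            # 4 for the large model
    "learning_rate": 6e-5,
    "training_steps": 20_000,
    "warmup_steps": 200,
    "weight_decay": 0.01,
    "num_passages": 40,         # retrieved passages per query
    "max_query_length": 40,     # in tokens (64 for TQA, 256 for MMLU)
    "max_passage_length": 320,  # in tokens
    "max_answer_length": 32,    # in tokens
}

for name, value in btr_atlas_base_nq.items():
    print(f"{name}: {value}")
```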