RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Authors: Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comprehensive experiments on a variety of knowledge-intensive NLP tasks to demonstrate the zero-shot capabilities of RankRAG. Table 2 presents results of RankRAG and baselines. |
| Researcher Affiliation | Collaboration | Yue Yu (Georgia Tech), Wei Ping (NVIDIA), Zihan Liu (NVIDIA), Boxin Wang (NVIDIA), Jiaxuan You (NVIDIA), Chao Zhang (Georgia Tech), Mohammad Shoeybi (NVIDIA), Bryan Catanzaro (NVIDIA) |
| Pseudocode | No | The paper describes the RankRAG method in Section 4, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will open-source model weights and scripts for reproducing our results. |
| Open Datasets | Yes | We follow existing works (Chung et al., 2024; Wang et al., 2024; Liu et al., 2024) to first leverage SFT on a blend of high quality instruction following datasets, including: i) a private crowd-sourced conversational dataset and public conversation datasets: Open Assistant (Köpf et al., 2023), Dolly (Conover et al., 2023), and SODA (Kim et al., 2023), ii) a long-form QA dataset ELI5 that requires elaborate answers (Fan et al., 2019), iii) LLM-generated instructions: Self-Instruct (Wang et al., 2023b) and Unnatural Instructions (Honovich et al., 2023), iv) FLAN and Chain-of-thought datasets (Chung et al., 2024). |
| Dataset Splits | No | The paper states that there is 'no overlap between SFT data and data from evaluation tasks,' implying separate training and evaluation data, but it does not explicitly specify train/validation/test splits with percentages or sample counts. |
| Hardware Specification | Yes | Training RankRAG-8B uses 32 NVIDIA A100 GPUs for 10 hours (4 hours for Stage-I and 6 hours for Stage-II finetuning), while training RankRAG-70B uses 128 NVIDIA A100 GPUs for 16 hours (4 hours for Stage-I and 12 hours for Stage-II finetuning). |
| Software Dependencies | No | The paper mentions using 'Llama3 8B and 70B' as the backbone and the 'Adam optimizer', but it does not list software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | For the two-stage instruction tuning, we set the batch size to 128 and train the model for 1000 steps with learning rate 5e-6 in Stage-I. Then, we reduce the learning rate to 3e-7 for 8B and 2e-7 for 70B model, set the batch size to 64, and train the model for 3300 steps (around 1 epoch). We use the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.98. |
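
The Experiment Setup row above gives enough detail to reconstruct the optimizer and schedule. Below is a minimal, hypothetical sketch of that two-stage configuration in PyTorch: only the batch sizes, step counts, learning rates, and Adam betas are taken from the paper; the dictionary names, the `build_optimizer` helper, and everything else (model, data, distributed launcher) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the two-stage instruction tuning reported above.
# Only the hyperparameters quoted from the paper are authoritative; the
# config names and helper below are illustrative, and model/data loading
# is omitted entirely.
import torch

STAGE_CONFIGS = {
    # Stage-I: supervised fine-tuning on the instruction-following blend.
    "stage1":     {"batch_size": 128, "steps": 1000, "lr": 5e-6},
    # Stage-II: unified ranking + RAG fine-tuning; learning rate depends on model size.
    "stage2_8b":  {"batch_size": 64,  "steps": 3300, "lr": 3e-7},
    "stage2_70b": {"batch_size": 64,  "steps": 3300, "lr": 2e-7},
}


def build_optimizer(model: torch.nn.Module, stage: str) -> torch.optim.Adam:
    """Adam with beta1 = 0.9 and beta2 = 0.98, as stated in the experiment setup."""
    cfg = STAGE_CONFIGS[stage]
    return torch.optim.Adam(model.parameters(), lr=cfg["lr"], betas=(0.9, 0.98))


if __name__ == "__main__":
    # Tiny stand-in model just to show optimizer construction; the paper
    # fine-tunes Llama3 8B/70B, which is not reproduced here.
    model = torch.nn.Linear(16, 16)
    for stage in STAGE_CONFIGS:
        opt = build_optimizer(model, stage)
        print(stage, opt.defaults["lr"], opt.defaults["betas"])
```

Under the hardware budget reported in the table, this schedule corresponds to roughly 32 × 10 = 320 A100 GPU-hours for the 8B model and 128 × 16 = 2048 A100 GPU-hours for the 70B model.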