C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

Authors: Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely-used NLP datasets on four state-of-the-art retrieval models." |
| Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign, USA; Delft University of Technology, Netherlands; Netflix Eyeline Studios, USA; University of California, Berkeley, USA; University of Chicago, USA |
| Pseudocode | Yes | "We refer to Alg. 1 in App. C.1 for the pseudocode of the protocol." |
| Open Source Code | Yes | "The codes are publicly available at https://github.com/kangmintong/C-RAG." |
| Open Datasets | Yes | "We evaluate C-RAG on four widely used NLP datasets, including AESLC (Zhang & Tetreault, 2019), CommonGen (Lin et al., 2019), DART (Nan et al., 2020), and E2E (Novikova et al., 2017)." |
| Dataset Splits | Yes | "We perform conformal calibration on validation sets with uncertainty δ = 0.1." |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or memory) is mentioned for the experiments. |
| Software Dependencies | No | The paper mentions using "Llama-2-7b for inference" but does not provide version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | "We use our generation protocol (Alg. 1 in App. C.1) controlled by the number of retrieved examples N_rag, generation set size λ_g, and diversity threshold λ_s. We use Llama-2-7b for inference and perform conformal calibration on validation sets with uncertainty δ = 0.1. We use 1 − ROUGE-L as the risk function. See App. J.1 for more details of the evaluation setup." (A minimal calibration sketch follows this table.) |
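
The Experiment Setup row summarizes the calibration step: a risk of 1 − ROUGE-L is measured for each example in a held-out validation set and then lifted to a population-level guarantee that holds with probability at least 1 − δ. The Python sketch below illustrates this idea under stated assumptions: the LCS-based ROUGE-L helper and the one-sided Hoeffding bound are illustrative stand-ins chosen for simplicity, not the paper's Alg. 1, whose certified conformal bound may be tighter.

```python
import math

def rouge_l_risk(reference: str, candidate: str) -> float:
    """Risk = 1 - ROUGE-L F1, computed from the longest common
    subsequence (LCS) of whitespace tokens. Hypothetical helper;
    the paper presumably uses a standard ROUGE implementation."""
    ref, cand = reference.split(), candidate.split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref):
        for j, c in enumerate(cand):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if r == c
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[-1][-1]
    if lcs == 0:
        return 1.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return 1.0 - f1

def conformal_risk_bound(risks: list[float], delta: float = 0.1) -> float:
    """One-sided Hoeffding upper bound on the true expected risk.
    For i.i.d. calibration risks in [0, 1], the bound
    mean + sqrt(ln(1/delta) / (2n)) holds with probability >= 1 - delta."""
    n = len(risks)
    empirical = sum(risks) / n
    return min(1.0, empirical + math.sqrt(math.log(1 / delta) / (2 * n)))

# Usage: score generations against references on the validation set,
# then certify an upper bound on the expected generation risk.
refs = ["the cat sat on the mat", "a dog runs in the park"]
gens = ["the cat sat on a mat", "a dog is running in the park"]
risks = [rouge_l_risk(r, g) for r, g in zip(refs, gens)]
print(conformal_risk_bound(risks, delta=0.1))
```

With δ = 0.1 and, say, 500 validation examples, this Hoeffding term adds about sqrt(ln 10 / 1000) ≈ 0.048 to the empirical risk. The paper's conformal analysis additionally conditions the guarantee on the generation protocol's configuration (N_rag, λ_g, λ_s), which this sketch does not model.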