C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
Authors: Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely-used NLP datasets on four state-of-the-art retrieval models. |
| Researcher Affiliation | Collaboration | 1University of Illinois at Urbana-Champaign, USA 2Delft University of Technology, Netherlands 3Netflix Eyeline Studios, USA 4University of California, Berkeley, USA 5University of Chicago, USA. |
| Pseudocode | Yes | We refer to Alg. 1 in App. C.1 for the pseudocode of the protocol. |
| Open Source Code | Yes | The codes are publicly available at https://github.com/kangmintong/C-RAG. |
| Open Datasets | Yes | We evaluate C-RAG on four widely used NLP datasets, including AESLC (Zhang & Tetreault, 2019), CommonGen (Lin et al., 2019), DART (Nan et al., 2020), and E2E (Novikova et al., 2017). |
| Dataset Splits | Yes | We perform conformal calibration on validation sets with uncertainty δ = 0.1. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or memory) used for experiments is mentioned. |
| Software Dependencies | No | The paper mentions using “Llama-2-7b for inference” but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use our generation protocol (Alg. 1 in App. C.1) controlled by the number of retrieved examples Nrag, generation set size λg, and diversity threshold λs. We use Llama-2-7b for inference and perform conformal calibration on validation sets with uncertainty δ = 0.1. We use 1 − ROUGE-L as the risk function. See App. J.1 for more details of evaluation setup. |
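The setup row above mentions conformal calibration on a validation set with uncertainty δ = 0.1 and risk 1 − ROUGE-L. C-RAG's actual certified bounds are more involved than this; the following is only a minimal sketch of a generic Hoeffding-style one-sided risk bound over a calibration set, with function and variable names of our own choosing, to illustrate the kind of calibration the row describes.

```python
import math

def conformal_risk_bound(cal_risks, delta=0.1):
    """Sketch of a one-sided Hoeffding upper bound on expected risk.

    cal_risks: per-example risks in [0, 1] from a held-out calibration
               (validation) set, e.g. 1 - ROUGE-L of each generation.
    delta:     permitted uncertainty; the bound holds with probability
               at least 1 - delta over the draw of the calibration set.
    """
    n = len(cal_risks)
    empirical = sum(cal_risks) / n
    # Hoeffding margin for bounded risks in [0, 1].
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, empirical + margin)

# Toy usage: 500 calibration examples, mean risk 0.35, delta = 0.1.
risks = [0.35] * 500
bound = conformal_risk_bound(risks, delta=0.1)
```

With 500 calibration points the margin is about 0.048, so the certified bound sits slightly above the empirical risk; larger calibration sets tighten it.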