ContextCite: Attributing Model Generation to Context

Authors: Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CONTEXTCITE on up to 1,000 random validation examples from each of three representative benchmarks: (1) TyDiQA [30] is a question-answering dataset in which the context is an entire Wikipedia article; (2) HotpotQA [31] is a multi-hop question-answering dataset where answering the question requires reasoning over information from multiple documents; (3) CNN/DailyMail [28] is a dataset of news articles and headlines.
Researcher Affiliation | Academia | Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry (MIT) {bencw,harshay,krisgrg,madry}@mit.edu
Pseudocode | Yes | Algorithm 1 CONTEXTCITE (a hedged sketch of this algorithm appears after the table)
Open Source Code | Yes | We provide code for CONTEXTCITE at https://github.com/MadryLab/context-cite.
Open Datasets | Yes | TyDiQA [30] is a question-answering dataset... HotpotQA [31] is a multi-hop question-answering dataset... CNN/DailyMail [28] is a dataset of news articles and headlines... MS MARCO [67] is a question-answering dataset... Natural Questions [29] is a question-answering dataset...
Dataset Splits | No | We evaluate CONTEXTCITE on up to 1,000 random validation examples from each of three representative benchmarks... For each of these datasets, we evaluate the F1 score of instruction-tuned Llama-3-8B (Figure 6) on 1,000 randomly sampled examples from the validation set. While the paper mentions sampling from validation sets, it does not report explicit train/validation/test splits, such as percentages or counts, for the datasets used in its experiments.
Hardware Specification | Yes | We run all experiments on a cluster of A100 GPUs.
Software Dependencies | No | We use the scikit-learn [64] implementation of LASSO for CONTEXTCITE... We use the off-the-shelf sentence tokenizer from the nltk library [44]. Our implementation of CONTEXTCITE is available at https://github.com/MadryLab/context-cite. We use the implementations of language models from Hugging Face's transformers library [66]. The paper names these libraries but does not provide version numbers for scikit-learn, nltk, or transformers.
Experiment Setup | Yes | We use the scikit-learn [64] implementation of LASSO for CONTEXTCITE, always with the regularization parameter alpha set to 0.01. When splitting the context into sources or splitting a response into statements, we use the off-the-shelf sentence tokenizer from the nltk library [44]. ... We evaluate CONTEXTCITE with {32, 64, 128, 256} context ablations. (Both pieces of this setup are illustrated in the sketches below.)
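
Since the table only quotes Algorithm 1 by name, here is a minimal Python sketch of how the CONTEXTCITE surrogate fit could look, based on the setup details quoted above (nltk sentence tokenizer, scikit-learn LASSO with alpha = 0.01, a fixed budget of random context ablations). The scorer `logit_prob`, the function name `context_cite_scores`, and the mask-sampling details are illustrative assumptions, not the authors' exact implementation.

    # Minimal sketch of CONTEXTCITE (Algorithm 1), assuming a scorer
    # logit_prob(context) -> float that returns the log-odds the model
    # assigns to the original response (see the next sketch).
    import numpy as np
    from nltk.tokenize import sent_tokenize  # may require nltk.download("punkt")
    from sklearn.linear_model import Lasso

    def context_cite_scores(context, logit_prob, num_ablations=64, alpha=0.01):
        # Split the context into sentence-level sources, as in the paper.
        sources = sent_tokenize(context)
        n = len(sources)
        # Sample random ablation masks: 1 keeps a source, 0 drops it.
        masks = np.random.randint(0, 2, size=(num_ablations, n))
        targets = np.empty(num_ablations)
        for i, mask in enumerate(masks):
            ablated = " ".join(s for s, keep in zip(sources, mask) if keep)
            targets[i] = logit_prob(ablated)
        # Fit the sparse linear surrogate; its weights are the attributions,
        # one score per source sentence.
        surrogate = Lasso(alpha=alpha).fit(masks, targets)
        return surrogate.coef_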
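
The table also notes that the paper uses Hugging Face's transformers library without pinning versions. For completeness, the following sketch shows one way the surrogate's target could be computed with that library: the summed log-probability of the fixed response under an ablated context, mapped through the logit (log-odds) transform. The prompt template, the helper name `make_logit_prob`, and the choice of model identifier are assumptions for illustration.

    # Sketch of a logit_prob scorer using Hugging Face transformers.
    # The prompt template and helper name are assumptions, not the paper's code.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def make_logit_prob(model_name, query, response):
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name).eval()

        def logit_prob(context):
            prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer: "
            prompt_ids = tok(prompt, return_tensors="pt").input_ids
            resp_ids = tok(response, add_special_tokens=False,
                           return_tensors="pt").input_ids
            ids = torch.cat([prompt_ids, resp_ids], dim=1)
            with torch.no_grad():
                logits = model(ids).logits
            # Position t predicts token t+1, so drop the last position.
            log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
            start = prompt_ids.shape[1]
            rows = torch.arange(start - 1, ids.shape[1] - 1)
            # Total log-probability of the fixed response tokens.
            log_p = log_probs[rows, ids[0, start:]].sum()
            # Logit transform: log p - log(1 - p), the surrogate's target.
            return (log_p - torch.log1p(-log_p.exp())).item()

        return logit_prob

With these two pieces, something like `context_cite_scores(context, make_logit_prob("meta-llama/Meta-Llama-3-8B-Instruct", query, response), num_ablations=64)` would yield one attribution score per sentence; the {32, 64, 128, 256} values quoted in the table are the ablation budgets the paper evaluates.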