Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Authors: Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We curate a suite of 10 datasets containing over 40,000 prompts to study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method probing attention patterns, that can predict factual errors and fine-grained constraint satisfaction, and allows early error identification. The approach and findings take another step towards using the mechanistic understanding of LLMs to enhance their reliability. |
| Researcher Affiliation | Collaboration | Mert Yuksekgonul (Stanford University); Varun Chandrasekaran (University of Illinois Urbana-Champaign); Erik Jones (UC Berkeley); Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi (Microsoft Research) |
| Pseudocode | No | The paper presents mathematical definitions of attention (e.g., Equations 1 and 2) but does not include any pseudocode or algorithm blocks (see the attention-definition sketch after this table). |
| Open Source Code | Yes | Our datasets, evaluation protocol, and methods will be released at https://github.com/microsoft/mechanistic-error-probe. |
| Open Datasets | Yes | For single-constraint queries, we curate 4 datasets using WikiData and 3 datasets using the existing CounterFact dataset (Meng et al., 2022). We further designed three 2-constraint datasets, using WikiData (Books), Opendatasoft (2023) (Nobel Winners), or hand-curation (Words). |
| Dataset Splits | Yes | For each dataset, we split the dataset into two sets (train and test) and normalize each feature to zero mean and unit variance using the training split. We train a logistic regressor with L1 regularization (C = 0.05) on one subset and evaluate the performance on the other subset. We repeat this experiment with 10 random seeds and report the mean performance and the standard error next to it in Table 5 for each dataset and method. |
| Hardware Specification | Yes | We perform our experiments on a single NVIDIA A100-PCIE-80GB GPU. |
| Software Dependencies | Yes | We use the Llama-2 family models released in Touvron et al. (2023) through the Hugging Face Transformers library (Wolf et al., 2019). To fit Llama-2 70B in a single A100 GPU with 80GB of memory, we use 8-bit quantization with bitsandbytes (Dettmers et al., 2022b;a). We use Lasso regression from scikit-learn (Pedregosa et al., 2011). We use GPT 3.5 (gpt-3.5-turbo endpoint) in the loop... (a model-loading sketch follows this table). |
| Experiment Setup | Yes | For all of the models, we sample from the model using greedy decoding and temperature 0. For each dataset, we split the dataset into two sets (train and test) and normalize each feature to zero mean and unit variance using the training split. We train a logistic regressor with L1 regularization (C = 0.05) on one subset and evaluate the performance on the other subset. We repeat this experiment with 10 random seeds and report the mean performance and the standard error next to it in Table 5 for each dataset and method (see the probe-training sketch after this table). |
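
On the attention definitions referenced in the Pseudocode row: the paper's Equations 1 and 2 are not reproduced in this report, but since Llama-2 uses standard scaled dot-product attention, they presumably follow the form below. The notation here is ours, not copied from the paper.

```latex
% Scaled dot-product attention weights for head h in layer \ell;
% SAT Probe reads entries A^{\ell,h}_{i,j} at constraint-token positions j.
A^{\ell,h}_{i,j} = \operatorname{softmax}\!\left(
    \frac{Q^{\ell,h}\,(K^{\ell,h})^{\top}}{\sqrt{d_h}}
\right)_{i,j}
```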
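A minimal sketch, assuming the standard Hugging Face Transformers API, of how the reported stack could be assembled: Llama-2 loaded with 8-bit bitsandbytes quantization (to fit 70B on one 80GB A100) and greedy, temperature-0 decoding. This is not the authors' released code; the model ID and prompt are illustrative.

```python
# Sketch: load an 8-bit-quantized Llama-2 and decode greedily,
# with attention weights exposed for probing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"  # gated on the Hub; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes 8-bit
    device_map="auto",
    output_attentions=True,  # expose attention patterns for the probe
)

prompt = "Bad Romance is performed by"  # illustrative single-constraint query
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)  # greedy = temperature 0
print(tokenizer.decode(out[0], skip_special_tokens=True))
```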
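And a minimal sketch of the probing protocol quoted in the Dataset Splits and Experiment Setup rows: normalize features using statistics from the train split only, fit an L1-regularized logistic regressor with C = 0.05, and average over 10 random seeds. The 50/50 split ratio, the AUC metric, and the synthetic placeholder data are our assumptions for illustration; in the paper the features are attention values extracted from the model.

```python
# Sketch: train/evaluate a SAT-Probe-style logistic regression probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for attention-derived features (X) and
# factual-error labels (y); the paper extracts these from Llama-2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
y = rng.integers(0, 2, size=2000)

def probe_score(X, y, seed):
    # Split, then standardize to zero mean / unit variance using the train split only.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=seed)
    scaler = StandardScaler().fit(X_tr)
    # L1-regularized logistic regression with C = 0.05, as reported.
    clf = LogisticRegression(penalty="l1", C=0.05, solver="liblinear")
    clf.fit(scaler.transform(X_tr), y_tr)
    return roc_auc_score(y_te, clf.predict_proba(scaler.transform(X_te))[:, 1])

scores = [probe_score(X, y, seed) for seed in range(10)]  # 10 random seeds
print(f"mean AUC {np.mean(scores):.3f} ± {np.std(scores) / np.sqrt(len(scores)):.3f} (std. error)")
```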