Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Authors: Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We curate a suite of 10 datasets containing over 40,000 prompts to study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method probing attention patterns, that can predict factual errors and fine-grained constraint satisfaction, and allows early error identification. The approach and findings take another step towards using the mechanistic understanding of LLMs to enhance their reliability. |
| Researcher Affiliation | Collaboration | Mert Yuksekgonul (Stanford University); Varun Chandrasekaran (University of Illinois Urbana-Champaign); Erik Jones (UC Berkeley); Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi (Microsoft Research) |
| Pseudocode | No | The paper presents mathematical definitions of attention (e.g., Equations 1 and 2) but does not include any pseudocode or algorithm blocks (see the attention-definition sketch after this table). |
| Open Source Code | Yes | Our datasets, evaluation protocol, and methods will be released at https://github.com/microsoft/mechanistic-error-probe. |
| Open Datasets | Yes | For single-constraint queries, we curate 4 datasets using WikiData and 3 datasets using the existing CounterFact dataset (Meng et al., 2022). We further designed three 2-constraint datasets, using WikiData (Books), Opendatasoft (2023) (Nobel Winners), or hand-curation (Words). |
| Dataset Splits | Yes | For each dataset, we split the dataset into two sets (train and test) and normalize each feature to zero mean and unit variance using the training split. We train a logistic regressor with L1 regularization (C = 0.05) on one subset and evaluate the performance on the other subset. We repeat this experiment with 10 random seeds and report the mean performance and the standard error next to it in Table 5 for each dataset and method. |
| Hardware Specification | Yes | We perform our experiments on a single NVIDIA A100-PCIE-80GB GPU. |
| Software Dependencies | Yes | We use the Llama-2 family models released in Touvron et al. (2023) through the Hugging Face Transformers library (Wolf et al., 2019). To fit Llama-2 70B in a single A100 GPU with 80GB of memory, we use 8-bit quantization with bitsandbytes (Dettmers et al., 2022b;a). We use Lasso regression from scikit-learn (Pedregosa et al., 2011). We use GPT 3.5 (gpt-3.5-turbo endpoint) in the loop... (a model-loading sketch follows this table). |
| Experiment Setup | Yes | For all of the models, we sample from the model using greedy decoding and temperature 0. For each dataset, we split the dataset into two sets (train and test) and normalize each feature to zero mean and unit variance using the training split. We train a logistic regressor with L1 regularization (C = 0.05) on one subset and evaluate the performance on the other subset. We repeat this experiment with 10 random seeds and report the mean performance and the standard error next to it in Table 5 for each dataset and method (see the probe-training sketch after this table). |
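
On the attention definitions referenced in the Pseudocode row: the paper's Equations 1 and 2 are not reproduced in this report, but since Llama-2 uses standard scaled dot-product attention, they presumably follow the form below. The notation here is ours, not copied from the paper.

```latex
% Scaled dot-product attention weights for head h in layer \ell;
% SAT Probe reads entries A^{\ell,h}_{i,j} at constraint-token positions j.
A^{\ell,h}_{i,j} = \operatorname{softmax}\!\left(
    \frac{Q^{\ell,h}\,(K^{\ell,h})^{\top}}{\sqrt{d_h}}
\right)_{i,j}
```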
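A minimal sketch, assuming the standard Hugging Face Transformers API, of how the reported stack could be assembled: Llama-2 loaded with 8-bit bitsandbytes quantization (to fit 70B on one 80GB A100) and greedy, temperature-0 decoding. This is not the authors' released code; the model ID and prompt are illustrative.

```python
# Sketch: load an 8-bit-quantized Llama-2 and decode greedily,
# with attention weights exposed for probing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"  # gated on the Hub; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes 8-bit
    device_map="auto",
    output_attentions=True,  # expose attention patterns for the probe
)

prompt = "Bad Romance is performed by"  # illustrative single-constraint query
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)  # greedy = temperature 0
print(tokenizer.decode(out[0], skip_special_tokens=True))
```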
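And a minimal sketch of the probing protocol quoted in the Dataset Splits and Experiment Setup rows: normalize features using statistics from the train split only, fit an L1-regularized logistic regressor with C = 0.05, and average over 10 random seeds. The 50/50 split ratio, the AUC metric, and the synthetic placeholder data are our assumptions for illustration; in the paper the features are attention values extracted from the model.

```python
# Sketch: train/evaluate a SAT-Probe-style logistic regression probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for attention-derived features (X) and
# factual-error labels (y); the paper extracts these from Llama-2.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
y = rng.integers(0, 2, size=2000)

def probe_score(X, y, seed):
    # Split, then standardize to zero mean / unit variance using the train split only.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=seed)
    scaler = StandardScaler().fit(X_tr)
    # L1-regularized logistic regression with C = 0.05, as reported.
    clf = LogisticRegression(penalty="l1", C=0.05, solver="liblinear")
    clf.fit(scaler.transform(X_tr), y_tr)
    return roc_auc_score(y_te, clf.predict_proba(scaler.transform(X_te))[:, 1])

scores = [probe_score(X, y, seed) for seed in range(10)]  # 10 random seeds
print(f"mean AUC {np.mean(scores):.3f} ± {np.std(scores) / np.sqrt(len(scores)):.3f} (std. error)")
```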