Linearity of Relation Decoding in Transformer Language Models

Authors: Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now empirically evaluate how well LREs, estimated using the approach from Section 3, can approximate relation decoding in LMs for a variety of different relations. In all of our experiments, we study autoregressive language models.
Researcher Affiliation | Academia | ¹Massachusetts Institute of Technology, ²Northeastern University, ³Technion IIT, ⁴Harvard University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and dataset are available at lre.baulab.info.
Open Datasets | Yes | To support our evaluation, we manually curate a dataset of 47 relations spanning four categories: factual associations, commonsense knowledge, implicit biases, and linguistic knowledge. Each relation is associated with a number of example subject-object pairs (s_i, o_i), as well as a prompt template that leads the language model to predict o when s is filled in (e.g., "[s] plays the"). When evaluating each model, we filter the dataset to examples where the language model correctly predicts the object o given the prompt. Table 1 summarizes the dataset and filtering results. Further details on dataset construction are in Appendix A. The code and dataset are available at lre.baulab.info. (A minimal sketch of this filtering step appears after the table.)
Dataset Splits | No | The paper mentions evaluating on "new subjects s" and selecting hyperparameters using "grid-search," which implies an internal data split for validation. However, it does not explicitly state the proportions or counts for train/validation/test splits of the dataset.
Hardware Specification | Yes | We ran all experiments on workstations with 80GB NVIDIA A100 GPUs or 48GB A6000 GPUs using Hugging Face Transformers (Wolf et al., 2019) implemented in PyTorch (Paszke et al., 2019).
Software Dependencies | No | The paper mentions "Hugging Face Transformers (Wolf et al., 2019) implemented in PyTorch (Paszke et al., 2019)". However, it does not specify version numbers for these software components, which is necessary for reproducibility.
Experiment Setup | Yes | We estimate LREs for each relation using the method discussed in Section 3 with n = 8. While calculating W and b for an individual example, we prepend the remaining n - 1 training examples as few-shot examples so that the LM is more likely to generate the answer o (rather than other plausible tokens) given a subject s under the relation r. We fix the scalar term β (from Equation (4)) once per LM. We also have two hyperparameters specific to each relation r: ℓ_r, the layer after which s is to be extracted, and ρ_r, the rank of the inverse of W (used to check causality as in Equation (7)). We select these hyperparameters with grid search; see Appendix E for details. For each relation, we report average results over 24 trials with distinct sets of n examples randomly drawn from the dataset. (A minimal sketch of this estimation procedure appears after the table.)
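
The filtering step quoted under Open Datasets keeps only examples the LM already answers correctly. Below is a minimal sketch of how such filtering might be implemented with Hugging Face Transformers; the model choice, prompt format, greedy decoding, and first-token matching criterion are illustrative assumptions, not details taken from the released code at lre.baulab.info.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice; the paper studies several autoregressive LMs,
# including GPT-J.
MODEL_NAME = "EleutherAI/gpt-j-6b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model = model.cuda().eval()


def filter_relation(prompt_template, pairs):
    """Keep only the (subject, object) pairs whose object the LM predicts
    greedily from the filled-in prompt (matched on the object's first token)."""
    kept = []
    for subject, obj in pairs:
        prompt = prompt_template.format(subject)  # e.g. "{} plays the"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            logits = model(**inputs).logits
        predicted_id = logits[0, -1].argmax().item()
        object_first_id = tokenizer(" " + obj)["input_ids"][0]
        if predicted_id == object_first_id:
            kept.append((subject, obj))
    return kept


# Example usage for an instrument-playing relation:
# filter_relation("{} plays the", [("Miles Davis", "trumpet"), ("Yo-Yo Ma", "cello")])
```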
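
The Experiment Setup row describes estimating W and b from n = 8 examples per relation. Following Section 3 of the paper, W can be read as the mean Jacobian of the LM computation F that maps the subject representation s (taken after layer ℓ_r at the last subject token) to the object representation o (final layer, last position), and b as the mean of F(s) - (∂F/∂s)s, with the LRE applied as o ≈ βWs + b. The sketch below illustrates that computation under stated assumptions; the hook-based patching, argument packaging, and function names are hypothetical rather than the authors' released implementation.

```python
import torch


def make_F(model, layer_module, subject_index, inputs):
    """Build F(s): run the LM on `inputs` with the hidden state at
    (layer_module, subject_index) replaced by the vector s, and return the
    last-layer hidden state at the final position (the object representation)."""
    def F(s):
        def patch(module, args, output):
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden.clone()
            hidden[0, subject_index] = s
            if isinstance(output, tuple):
                return (hidden,) + output[1:]
            return hidden
        handle = layer_module.register_forward_hook(patch)
        try:
            out = model(**inputs, output_hidden_states=True)
        finally:
            handle.remove()
        return out.hidden_states[-1][0, -1]
    return F


def estimate_lre(model, layer_module, train_examples):
    """train_examples: list of (inputs, subject_index, subject_repr) for the
    n = 8 few-shot-prompted training examples of one relation."""
    jacobians, biases = [], []
    for inputs, subject_index, s in train_examples:
        F = make_F(model, layer_module, subject_index, inputs)
        J = torch.autograd.functional.jacobian(F, s)  # dF/ds for this example
        o = F(s)
        jacobians.append(J)
        biases.append(o - J @ s)
    W = torch.stack(jacobians).mean(dim=0)
    b = torch.stack(biases).mean(dim=0)
    # Applied to a new subject as: o_hat = beta * W @ s_new + b,
    # with the scalar beta fixed once per LM, per the quoted setup.
    return W, b
```

The rank-ρ_r inverse of W mentioned in the quote is then used for the causality test (Equation (7)), which is why ρ_r is grid-searched alongside ℓ_r.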