The Linear Representation Hypothesis and the Geometry of Large Language Models
Authors: Kiho Park, Yo Joong Choe, Victor Veitch
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product. |
| Researcher Affiliation | Academia | 1University of Chicago, Illinois, USA. |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any structured, code-like procedures. |
| Open Source Code | Yes | Code is available at github.com/KihoPark/linear_rep_geometry. |
| Open Datasets | Yes | The other concepts are based on The Bigger Analogy Test Set (BATS) (Gladkova et al., 2016), version 3.0, which is used for evaluation of the word analogy task. ... The pairs for the 16th concept are based on the csv file. |
| Dataset Splits | No | The paper describes the data collection for its experiments (e.g., counterfactual pairs, Wikipedia contexts) but does not specify any formal training, validation, or test data splits (e.g., percentages or sample counts) for these collected datasets. |
| Hardware Specification | No | The paper mentions using the LLaMA-2 model but does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud compute instances) used to run the experiments or analyses. |
| Software Dependencies | No | The paper mentions the 'huggingface library' and 'ChatGPT-4' but does not provide specific version numbers for any software libraries, programming languages, or tools used in their implementation or experimentation. |
| Experiment Setup | Yes | We use the LLaMA-2 model with 7 billion parameters... We estimate γW as the (normalized) mean... We take λW := Cov(γ)⁻¹ γW. ... We then intervene on λ(xj) using λC via λC,α(xj) = λ(xj) + α λC, where α > 0 and C can be W, Z, or some other causally separable concept (e.g., French⇒Spanish). ... We discard the contexts such that Y(0, 0) is not the top-1 next word. |
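The experiment-setup row can be sketched numerically. This is a minimal illustration, not the authors' code: the vocabulary size, dimension, random unembedding vectors, and counterfactual pair indices below are all hypothetical stand-ins; only the three operations (normalized mean of counterfactual differences, the causal-inner-product transform λW := Cov(γ)⁻¹ γW, and the additive intervention λ(xj) + α λC) come from the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: d-dimensional unembedding vectors gamma(y) for a
# small vocabulary, and counterfactual word pairs (y0, y1) differing only
# in the target concept W (e.g., singular -> plural).
d, vocab = 64, 1000
gamma = rng.normal(size=(vocab, d))                        # unembedding vectors
pair_idx = rng.choice(vocab, size=(40, 2), replace=False)  # illustrative pairs

# gamma_W: (normalized) mean of unembedding differences over the pairs.
diffs = gamma[pair_idx[:, 1]] - gamma[pair_idx[:, 0]]
gamma_W = diffs.mean(axis=0)
gamma_W /= np.linalg.norm(gamma_W)

# Causal inner product: lambda_W := Cov(gamma)^{-1} gamma_W, with the
# covariance taken over the unembedding vectors of the whole vocabulary.
cov = np.cov(gamma, rowvar=False)
lambda_W = np.linalg.solve(cov, gamma_W)

# Intervention on an embedding lambda(x_j):
# lambda_{C,alpha}(x_j) = lambda(x_j) + alpha * lambda_C, alpha > 0.
def intervene(lam_x, lam_C, alpha):
    return lam_x + alpha * lam_C

lam_x = rng.normal(size=d)       # stand-in for a context embedding
shifted = intervene(lam_x, lambda_W, alpha=0.4)
```

Solving the linear system with `np.linalg.solve` rather than explicitly inverting `Cov(γ)` is the standard numerically stable choice; both yield the same λW.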