The emergence of clusters in self-attention dynamics
Authors: Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Viewing Transformers as interacting particle systems, we describe the geometry of learned representations when the weights are not time-dependent. We show that particles, representing tokens, tend to cluster toward particular limiting objects as time tends to infinity. ... Using techniques from dynamical systems and partial differential equations, we show that the type of limiting object that emerges depends on the spectrum of the value matrix. Additionally, in the one-dimensional case we prove that the self-attention matrix converges to a low-rank Boolean matrix. |
| Researcher Affiliation | Academia | Borjan Geshkovski (MIT) borjan@mit.edu; Cyril Letrouit (MIT) letrouit@mit.edu; Yury Polyanskiy (MIT) yp@mit.edu; Philippe Rigollet (MIT) rigollet@mit.edu |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper focuses on theoretical analysis of Transformer dynamics, supported by numerical illustrations, rather than empirical evaluation on publicly available datasets. No public dataset is referenced with access information for the authors' own work. |
| Dataset Splits | No | The paper's focus is on theoretical analysis and numerical illustrations of dynamics, not on training and evaluating a machine learning model with standard dataset splits. Therefore, it does not specify training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the numerical illustrations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow) used for the numerical illustrations. |
| Experiment Setup | No | While some parameters for the numerical illustrations are mentioned (e.g., "n=40 tokens, with Q=K=1 and V=1"; see the simulation sketch below the table), the paper does not provide comprehensive experimental setup details such as hyperparameters (learning rates, batch sizes, optimizers) or system-level training settings typical of machine learning experiments. |
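The dynamics the paper studies are simple enough that the quoted illustration parameters can be exercised directly. The sketch below is not the authors' code: it is a minimal forward-Euler simulation of the self-attention ODE $\dot{x}_i = \sum_j \mathrm{softmax}_j(\langle Q x_i, K x_j\rangle)\, V x_j$ in the one-dimensional setting reported in the table (n = 40 tokens, Q = K = V = 1). The time horizon, step size, and random seed are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed setup, not the authors' code): Euler integration of
# the 1-D self-attention dynamics  dx_i/dt = sum_j softmax_j(x_i * x_j) * x_j
# with n = 40 tokens and Q = K = V = 1, as quoted in the table above.
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n, T, dt = 40, 5.0, 1e-3                # tokens; horizon and step are assumed
x = rng.standard_normal(n)              # initial 1-D token positions

def attention_matrix(x):
    """Row-stochastic self-attention matrix A_ij = softmax_j(x_i * x_j)."""
    logits = np.outer(x, x)                       # <Q x_i, K x_j> with Q = K = 1
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

for _ in range(int(T / dt)):
    A = attention_matrix(x)
    x = x + dt * (A @ x)                # V = 1, so the drift is simply A x

A = attention_matrix(x)
boolean_frac = np.mean((A < 1e-3) | (A > 1 - 1e-3))
print(f"fraction of attention entries within 1e-3 of 0 or 1: {boolean_frac:.3f}")
print(f"numerical rank of A: {np.linalg.matrix_rank(A, tol=1e-6)}")
```

In a typical run, almost all entries of the final attention matrix sit near 0 or 1 and its numerical rank is small, consistent with the paper's one-dimensional result that the self-attention matrix converges to a low-rank Boolean matrix.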