The emergence of clusters in self-attention dynamics
Authors: Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Viewing Transformers as interacting particle systems, we describe the geometry of learned representations when the weights are not time-dependent. We show that particles, representing tokens, tend to cluster toward particular limiting objects as time tends to infinity. ... Using techniques from dynamical systems and partial differential equations, we show that the type of limiting object that emerges depends on the spectrum of the value matrix. Additionally, in the one-dimensional case we prove that the self-attention matrix converges to a low-rank Boolean matrix. |
| Researcher Affiliation | Academia | Borjan Geshkovski (MIT) borjan@mit.edu; Cyril Letrouit (MIT) letrouit@mit.edu; Yury Polyanskiy (MIT) yp@mit.edu; Philippe Rigollet (MIT) rigollet@mit.edu |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper focuses on theoretical analysis of Transformer dynamics, supported by numerical illustrations, rather than empirical evaluation on publicly available datasets. No public dataset is referenced with access information for the authors' own work. |
| Dataset Splits | No | The paper's focus is on theoretical analysis and numerical illustrations of dynamics, not on training and evaluating a machine learning model with standard dataset splits. Therefore, it does not specify training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the numerical illustrations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow) used for the numerical illustrations. |
| Experiment Setup | No | While some parameters for the numerical illustrations are mentioned (e.g., "n=40 tokens, with Q=K=1 and V=1"; see the simulation sketch below the table), the paper does not provide comprehensive experimental setup details such as hyperparameters (learning rates, batch sizes, optimizers) or system-level training settings typical of machine learning experiments. |
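The dynamics the paper studies are simple enough that the quoted illustration parameters can be exercised directly. The sketch below is not the authors' code: it is a minimal forward-Euler simulation of the self-attention ODE $\dot{x}_i = \sum_j \mathrm{softmax}_j(\langle Q x_i, K x_j\rangle)\, V x_j$ in the one-dimensional setting reported in the table (n = 40 tokens, Q = K = V = 1). The time horizon, step size, and random seed are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed setup, not the authors' code): Euler integration of
# the 1-D self-attention dynamics  dx_i/dt = sum_j softmax_j(x_i * x_j) * x_j
# with n = 40 tokens and Q = K = V = 1, as quoted in the table above.
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n, T, dt = 40, 5.0, 1e-3                # tokens; horizon and step are assumed
x = rng.standard_normal(n)              # initial 1-D token positions

def attention_matrix(x):
    """Row-stochastic self-attention matrix A_ij = softmax_j(x_i * x_j)."""
    logits = np.outer(x, x)                       # <Q x_i, K x_j> with Q = K = 1
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

for _ in range(int(T / dt)):
    A = attention_matrix(x)
    x = x + dt * (A @ x)                # V = 1, so the drift is simply A x

A = attention_matrix(x)
boolean_frac = np.mean((A < 1e-3) | (A > 1 - 1e-3))
print(f"fraction of attention entries within 1e-3 of 0 or 1: {boolean_frac:.3f}")
print(f"numerical rank of A: {np.linalg.matrix_rank(A, tol=1e-6)}")
```

In a typical run, almost all entries of the final attention matrix sit near 0 or 1 and its numerical rank is small, consistent with the paper's one-dimensional result that the self-attention matrix converges to a low-rank Boolean matrix.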