Clustering in Causal Attention Masking

Authors: Nikita Karagodin, Yury Polyanskiy, Philippe Rigollet

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work is a combination of rigorous mathematical results and non-trivial predictions based on analytical insights and numerical simulations.
Researcher Affiliation | Academia | Nikita Karagodin (Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA); Yury Polyanskiy (Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA); Philippe Rigollet (Department of Mathematics, MIT, Cambridge, MA, USA)
Pseudocode | No | The paper presents mathematical equations and derivations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The NeurIPS Paper Checklist states for Question 5: "The answer NA means that paper does not include experiments requiring code." This indicates that the paper does not provide open-source code for its methodology.
Open Datasets | No | The paper describes theoretical models and numerical simulations of particle dynamics (e.g., "n = 32 particles initialized uniformly at random on the sphere") but does not utilize a specific, named dataset or provide access information for any dataset used for training or evaluation.
Dataset Splits | No | The paper describes theoretical models and numerical simulations but does not mention the use of validation splits. There is no indication of empirical evaluation on a dataset with standard splits.
Hardware Specification | No | The paper describes numerical simulations but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run these simulations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or specialized solvers) used for its numerical simulations or derivations.
Experiment Setup | Yes | In all cases we take simple Query and Key matrices K = Q = I_d, temperature β = 9 and final time T = 5000 for n = 32 particles initialized uniformly at random on the sphere. Positions of particles at time T are indicated by a red dot. ... Evolution of the system (CSA) with K = Q = V = I_2 with n = 200, d = 2, β = 64, strong Rényi centers (red) and Rényi centers (black) with δ = 4β^(-1/2).
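
Since the paper provides no code, the quoted setup can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the causal self-attention dynamics take the usual form in which particle k moves along the softmax-weighted average of V x_j over j ≤ k, projected onto the tangent space of the sphere at x_k. The explicit Euler integrator, the step size dt, the dimension d = 3, the choice V = I_d, and the function names are all hypothetical; only n = 32, β = 9, T = 5000 and K = Q = I_d come from the quoted setup.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly at random on the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def csa_velocity(x, beta, Q, K, V):
    """Velocity of each particle under the assumed causal attention dynamics.

    Particle k attends only to particles j <= k (causal mask).
    """
    n = x.shape[0]
    # scores[k, j] = beta * <Q x_k, K x_j>
    scores = beta * (x @ Q.T) @ (x @ K.T).T
    # Causal mask: keep the lower triangle, including the diagonal.
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable row-wise softmax; every row has a finite diagonal entry.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # drift[k] = sum_j weights[k, j] * V x_j
    drift = weights @ (x @ V.T)
    # Project onto the tangent space of the sphere at x_k.
    return drift - np.sum(drift * x, axis=1, keepdims=True) * x


def simulate(n=32, d=3, beta=9.0, T=5000.0, dt=0.1, seed=0):
    """Explicit Euler integration with renormalization (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    x = sample_sphere(n, d, rng)
    eye = np.eye(d)  # K = Q = I_d as quoted; V = I_d is an assumption
    for _ in range(int(round(T / dt))):
        x = x + dt * csa_velocity(x, beta, eye, eye, eye)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x
```

Calling `simulate()` with its defaults uses the quoted n, β and T; a shorter horizon such as `simulate(T=50.0)` is enough to inspect early behavior. Under these assumptions the first particle attends only to itself, so its projected velocity is zero and it stays at its initial position.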