Infinite Limits of Multi-head Transformer Dynamics

Authors: Blake Bordelon, Hamza Chaudhry, Cengiz Pehlevan

NeurIPS 2024

Reproducibility Variable — Result — LLM Response

- Research Type: Experimental — "We provide numerical evidence of convergence to the limits and discuss how the parameterization qualitatively influences learned features."
- Researcher Affiliation: Academia — Blake Bordelon, Hamza Chaudhry, Cengiz Pehlevan; John A. Paulson School of Engineering and Applied Sciences, Center for Brain Science, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 02138.
- Pseudocode: Yes — "We provide an example FLAX implementation of the vision transformer and causal language model." The listing defines class Attention, MLP_Block, Resid_Block, VIT, Causal_Attention, and LM_Transformer.
- Open Source Code: Yes — "We provide code in the uploaded supplementary material. ... We provide an example FLAX implementation of the vision transformer and causal language model."
- Open Datasets: Yes — Vision transformers are "trained on CIFAR-5M over finite N at H = 16"; for language modeling, "a Transformer with causal attention blocks trained on the C4 dataset [43]".
- Dataset Splits: No — The paper mentions "test loss" and "test examples" but does not explicitly give validation-split details, percentages, or sample counts.
- Hardware Specification: Yes — "Each of the experimental runs performed in this paper were all performed on single NVIDIA H100 GPU."
- Software Dependencies: No — The example FLAX implementation includes "import flax.linen as nn" and "import jax.numpy as jnp", but no version numbers are provided for Flax, JAX, or Python.
- Experiment Setup: Yes — "The base model has (N, H, L) = (8, 8, 4) and (αL, β0, γ0) = (1, 4, 0.25) and αA ∈ {1, 1/2}. (a) Train loss dynamics after 10000 steps on C4 using Adam optimizer."