Infinite Limits of Multi-head Transformer Dynamics

Authors: Blake Bordelon, Hamza Chaudhry, Cengiz Pehlevan

NeurIPS 2024

Reproducibility Variable — Result — LLM Response

- Research Type: Experimental — "We provide numerical evidence of convergence to the limits and discuss how the parameterization qualitatively influences learned features."
- Researcher Affiliation: Academia — Blake Bordelon, Hamza Chaudhry, Cengiz Pehlevan; John A. Paulson School of Engineering and Applied Sciences, Center for Brain Science, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA 02138.
- Pseudocode: Yes — "We provide an example FLAX implementation of the vision transformer and causal language model." The listing defines class Attention, MLP_Block, Resid_Block, VIT, Causal_Attention, and LM_Transformer.
- Open Source Code: Yes — "We provide code in the uploaded supplementary material. ... We provide an example FLAX implementation of the vision transformer and causal language model."
- Open Datasets: Yes — Vision transformers are "trained on CIFAR-5M over finite N at H = 16"; for language modeling, "a Transformer with causal attention blocks trained on the C4 dataset [43]".
- Dataset Splits: No — The paper mentions "test loss" and "test examples" but does not explicitly give validation-split details, percentages, or sample counts.
- Hardware Specification: Yes — "Each of the experimental runs performed in this paper were all performed on single NVIDIA H100 GPU."
- Software Dependencies: No — The example FLAX implementation includes "import flax.linen as nn" and "import jax.numpy as jnp", but no version numbers are provided for Flax, JAX, or Python.
- Experiment Setup: Yes — "The base model has (N, H, L) = (8, 8, 4) and (αL, β0, γ0) = (1, 4, 0.25) and αA ∈ {1, 1/2}. (a) Train loss dynamics after 10000 steps on C4 using Adam optimizer."