Tensor Programs IIb: Architectural Universality Of Neural Tangent Kernel Training Dynamics

Authors: Greg Yang, Etai Littwin

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures, including modern staples such as ResNet and Transformers. However, their analysis does not apply to training. Here, we show the same neural networks (in the so-called NTK parametrization) during training follow a kernel gradient descent dynamics in function space, where the kernel is the infinite-width NTK. This completes the proof of the architectural universality of NTK behavior. To achieve this result, we apply the Tensor Programs technique: Write the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem.
Researcher Affiliation | Industry | ¹Microsoft Research, ²Apple Research. Correspondence to: Greg Yang <gregyang@microsoft.com>, Etai Littwin <elittwin@apple.com>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not use datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for validation.
Hardware Specification | No | The paper does not specify the hardware used to run any experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers.
Experiment Setup | No | The paper does not contain experimental setup details such as hyperparameter values or training configurations.
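
The abstract quoted in the Research Type row above states that networks in the NTK parametrization evolve during training by kernel gradient descent in function space, with the infinite-width NTK as the kernel. As a minimal illustration only (the paper itself provides no code), the sketch below computes the empirical finite-width NTK of a toy MLP in JAX and evolves its predictions under kernel gradient descent for squared loss, f_{t+1} = f_t - eta * K (f_t - y). All function names, widths, and toy data here are hypothetical choices, not the authors' setup.

```python
# Minimal sketch (not from the paper): finite-width empirical NTK of a toy MLP
# in NTK parametrization, plus kernel gradient descent on the predictions.
import jax
import jax.numpy as jnp

def init_params(key, widths=(4, 64, 1)):
    """Weights drawn i.i.d. N(0, 1); the 1/sqrt(fan_in) factor is applied in the forward pass."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)))
    return params

def mlp(params, x):
    """Forward pass with 1/sqrt(fan_in) scaling (NTK parametrization), tanh hidden layers."""
    h = x
    for i, W in enumerate(params):
        h = h @ W / jnp.sqrt(W.shape[0])
        if i < len(params) - 1:
            h = jnp.tanh(h)
    return h.squeeze(-1)

def empirical_ntk(params, x):
    """K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>, via per-example parameter Jacobians."""
    jac = jax.jacobian(lambda p: mlp(p, x))(params)
    flat = jnp.concatenate(
        [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1)
    return flat @ flat.T

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 4))    # toy inputs (hypothetical)
y = jnp.sin(x.sum(axis=1))            # toy targets (hypothetical)
params = init_params(key)

K = empirical_ntk(params, x)          # finite-width NTK; the paper's result concerns its infinite-width limit
f = mlp(params, x)
eta = 1e-2
for _ in range(500):                  # kernel gradient descent in function space, squared loss
    f = f - eta * K @ (f - y)
print("final squared error:", float(jnp.mean((f - y) ** 2)))
```

The paper's theorem concerns the infinite-width limit in which K converges to a deterministic kernel and stays fixed during training; the finite-width kernel above only approximates those dynamics.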