Identifying Equivalent Training Dynamics

Authors: William Redman, Juan Bello-Rivas, Maria Fonoberova, Ryan Mohr, Yannis Kevrekidis, Igor Mezic

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To validate our approach, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking." (a minimal Koopman-eigenvalue sketch appears after this table)
Researcher Affiliation | Collaboration | William T. Redman (AIMdyn Inc., UC Santa Barbara), Juan Bello-Rivas (Johns Hopkins University), Maria Fonoberova (AIMdyn Inc.), Ryan Mohr (AIMdyn Inc.), Yannis G. Kevrekidis (Johns Hopkins University), Igor Mezic (AIMdyn Inc., UC Santa Barbara)
Pseudocode | Yes | "Algorithm 1: Online Mirror Descent [26]. Input: x(0) ∈ K, R, η, f. For t = 0, ..., T−1: y(t+1) = (∇R)⁻¹(∇R[x(t)] − η∇f[x(t)]); x(t+1) = Π^R_K[y(t+1)]." (a runnable Python sketch of this update appears after the table)
Open Source Code | Yes | "Code implementing our experiments can be found at https://github.com/william-redman/Identifying_Equivalent_Training_Dynamics."
Open Datasets | Yes | "FCNs with only a single hidden layer, trained on MNIST (Appendix C.1). ... LeNet [60], a simple CNN trained on MNIST, and ResNet-20 [61], trained on CIFAR-10 (see Appendix D.1 for details). ... Transformers trained on algorithmic data (e.g., modular addition)..."
Dataset Splits | No | The paper mentions training and testing but does not explicitly specify the train/validation/test splits (percentages or sample counts).
Hardware Specification | Yes | "All experiments were run on a MacBook Air with an Apple M1 chip, 1 CPU, and no GPUs."
Software Dependencies | No | The paper mentions PyTorch and links to external codebases (the ShrinkBench framework, Omnigrok) but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "Table S1: Hyper-parameters used for FCN training in Sec. 4.2: learning rate (η) 0.1, batch size (b) 60, optimizer SGD, epochs 1, activation function ReLU. ... Table S2: Hyper-parameters used for CNN training in Sec. 4.3: learning rate (η) 0.0012, batch size (b) 60, optimizer Adam, epochs 20, activation function ReLU." (a PyTorch sketch of the Table S1 configuration follows)
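
The Research Type row summarizes the paper's core procedure: approximate the Koopman operator of each optimizer's training dynamics and compare the resulting eigenvalue spectra. Below is a minimal sketch of that idea, assuming weight trajectories are logged during training and using standard dynamic mode decomposition (DMD); the function names, rank truncation, and matching-based spectral distance are illustrative stand-ins, not the paper's exact implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def dmd_eigenvalues(X, rank=10):
    # X: (n_features, n_timesteps) snapshot matrix of flattened weights
    # logged over training steps. The eigenvalues of the best-fit linear
    # map X1 ~= A X0 (exact DMD) serve as Koopman eigenvalue estimates.
    X0, X1 = X[:, :-1], X[:, 1:]
    U, S, Vh = np.linalg.svd(X0, full_matrices=False)
    r = min(rank, int((S > 1e-10).sum()))
    U_r, S_r, V_r = U[:, :r], S[:r], Vh[:r].conj().T
    A_tilde = U_r.conj().T @ X1 @ V_r @ np.diag(1.0 / S_r)
    return np.linalg.eigvals(A_tilde)

def spectral_distance(lam_a, lam_b):
    # Mean distance under a minimal bipartite matching of the two spectra;
    # an assumed stand-in for the paper's eigenvalue-comparison metric.
    cost = np.abs(lam_a[:, None] - lam_b[None, :])
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

# Usage sketch: traj_ogd and traj_omd would be (n_weights, T) arrays of
# parameters saved at each training step for the two optimizers; a small
# spectral_distance suggests (but does not by itself prove) conjugate dynamics.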
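
The Pseudocode row quotes Algorithm 1 (Online Mirror Descent). A minimal runnable version is sketched below, assuming the mirror map ∇R, its inverse, and the Bregman projection onto K are supplied as callables; with R(x) = ½||x||² all three reduce to the identity and the update collapses to online gradient descent, which is the known equivalence the paper uses to validate its approach.

import numpy as np

def online_mirror_descent(x0, grad_f, grad_R, grad_R_inv, project, eta, T):
    # Implements the quoted update:
    #   y(t+1) = (grad R)^{-1}( grad R[x(t)] - eta * grad f[x(t)] )
    #   x(t+1) = Bregman projection of y(t+1) onto K
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for t in range(T):
        y = grad_R_inv(grad_R(x) - eta * grad_f(x))  # step in the dual space
        x = project(y)                               # back onto the feasible set K
        history.append(x.copy())
    return np.array(history)

# With R(x) = 0.5 * ||x||^2 the mirror map is the identity, so OMD becomes
# plain (projected) online gradient descent on f.
identity = lambda v: v
traj = online_mirror_descent(
    x0=np.ones(3), grad_f=lambda x: 2.0 * x,   # f(x) = ||x||^2
    grad_R=identity, grad_R_inv=identity, project=identity,
    eta=0.1, T=50)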
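
The Experiment Setup row quotes the Table S1 hyper-parameters for the FCN experiments (SGD, learning rate 0.1, batch size 60, one epoch, ReLU). The PyTorch sketch below wires those values into a single-hidden-layer MNIST setup; the hidden width of 512 and the data root are assumptions made for illustration (the paper compares several widths), not values taken from the paper.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

hidden = 512  # assumed width; the paper sweeps shallow vs. wide FCNs
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, hidden),
    nn.ReLU(),                      # activation from Table S1
    nn.Linear(hidden, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # Table S1: SGD, eta = 0.1
loss_fn = nn.CrossEntropyLoss()

train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=60, shuffle=True)    # Table S1: b = 60

for epoch in range(1):              # Table S1: a single epoch
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
        # Saving model.parameters() here at each step would yield the weight
        # trajectories needed for the Koopman/DMD comparison sketched above.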