Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Latent State Models of Training Dynamics
Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the L2 norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM; Baum & Petrie, 1966) over the resulting sequences of metrics. ... we train HMMs on training trajectories derived from grokking tasks, language modeling, and image classification across a variety of model architectures and sizes. |
| Researcher Affiliation | Collaboration | Michael Y. Hu (New York University); Angelica Chen (New York University); Naomi Saphra (New York University); Kyunghyun Cho (New York University; Prescient Design, Genentech; CIFAR LMB) |
| Pseudocode | No | The paper describes the methodology and algorithms in prose and mathematical formulas, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | Yes | Our code is available at https://github.com/michahu/modeling-training. |
| Open Datasets | Yes | We collect 40 runs of ResNet18 (He et al., 2016) trained on CIFAR-100 (Krizhevsky, 2009)... The dynamics of MNIST are similar to that of CIFAR-100. We collect 40 training runs of a two-layer MLP learning image classification on MNIST, with hyperparameters based on Simard et al. (2003). |
| Dataset Splits | Yes | We collect trajectories using 40 random seeds and train and validate the HMM on a random 80-20 validation split, a split that we use for all settings. ... Training data size 50000 (splits downloaded from PyTorch) ... Training data size 60000 (splits downloaded from PyTorch) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or cloud instance types used for running the experiments. It only mentions general training processes. |
| Software Dependencies | No | The paper mentions software components like "PyTorch" and optimizers like "AdamW" and "SGD," but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all hyperparameter details, see Appendix D. ... Appendix D Training Hyperparameters: Learning Rate: 1e-1; Batch Size: 32; Training data size (randomly generated): 1000; Architecture: Multilayer perceptron; Number of hidden layers: 1; Model Hidden Size: 128; Weight Decay: 0.01; Seed: 0 through 40; Optimizer: SGD |
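
The quoted passages describe computing weight statistics (L2 norm, mean, variance) at checkpoints across 40 seeds and holding out a random 20% of the runs for HMM validation. The following is a minimal sketch of that data preparation, not the authors' code: the function names, the toy random-walk "weights", and the checkpoint count are all assumptions made for illustration, and the HMM fitting step itself is omitted.

```python
import numpy as np


def weight_metrics(weights):
    """Return [L2 norm, mean, variance] over all entries of a list of weight arrays."""
    flat = np.concatenate([np.asarray(w, dtype=float).ravel() for w in weights])
    return np.array([np.linalg.norm(flat), flat.mean(), flat.var()])


def split_seeds(n_seeds=40, val_fraction=0.2, rng_seed=0):
    """Randomly partition seed indices into train and validation sets (80-20 by default)."""
    rng = np.random.default_rng(rng_seed)
    perm = rng.permutation(n_seeds)
    n_val = int(round(n_seeds * val_fraction))
    return np.sort(perm[n_val:]), np.sort(perm[:n_val])


def toy_trajectory(seed, n_checkpoints=5):
    """Hypothetical stand-in for one training run: weights drift by random noise.

    In a real setting the weights would come from model checkpoints instead.
    """
    rng = np.random.default_rng(int(seed))
    weights = [rng.normal(size=(4, 3)), rng.normal(size=3)]
    rows = []
    for _ in range(n_checkpoints):
        weights = [w + 0.1 * rng.normal(size=w.shape) for w in weights]
        rows.append(weight_metrics(weights))
    # Shape: (n_checkpoints, 3), one row of metrics per checkpoint.
    return np.stack(rows)


train_seeds, val_seeds = split_seeds()
train_sequences = [toy_trajectory(s) for s in train_seeds]
val_sequences = [toy_trajectory(s) for s in val_seeds]
```

Each element of `train_sequences` is one metric sequence of the kind an HMM would be fitted on, and `val_sequences` holds the held-out runs used to compare candidate models.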