LCA: Loss Change Allocation for Neural Network Training

Authors: Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and All-CNN [29] and ResNet-20 [9] on CIFAR-10." "Figure 2: Frames from an animation of the learning process for two training runs."
Researcher Affiliation | Industry | Janice Lan, Uber AI (janlan@uber.com); Rosanne Liu, Uber AI (rosanne@uber.com); Hattie Zhou, Uber (hattie@uber.com); Jason Yosinski, Uber AI (yosinski@uber.com)
Pseudocode | No | The paper describes the LCA computation mathematically (Equations 1–3) and in prose, but no pseudocode or algorithm blocks are provided (a first-order sketch of this computation is given after this table).
Open Source Code | Yes | "We also make our code available at https://github.com/uber-research/loss-change-allocation."
Open Datasets | Yes | "We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and All-CNN [29] and ResNet-20 [9] on CIFAR-10." Both datasets are standard public benchmarks (a loading sketch also follows the table).
Dataset Splits | No | "We analyze the loss landscape of the training set instead of the validation set because we aim to measure training, not training confounded with issues of memorization vs. generalization (though the latter certainly should be the topic of future studies)."
Hardware Specification | No | The paper notes that the method is computationally intensive ("considerably slower than the regular training process") and refers to Section S8 for more details on computation, but the main text does not specify the hardware used for experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, and TensorFlow versions are not mentioned).
Experiment Setup | Yes | "For each dataset–network configuration, we train with both SGD and Adam optimizers, and conduct multiple runs with identical hyperparameter settings. Momentum of 0.9 is used for all SGD runs, except for one set of no-momentum MNIST FC experiments. Learning rates are manually chosen between 0.001 and 0.5. See Section S7 in Supplementary Information for more details on architectures and hyperparameters." (An illustrative optimizer-setup sketch follows below as well.)
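
Since the paper presents the LCA computation only as equations, the following is a minimal sketch of a first-order version of that computation, assuming PyTorch and placeholder names (`model`, `loss_fn`, `batch`, `theta_before`, `theta_after`); the authors' released code may use a higher-order path-integral approximation and different interfaces.

```python
import torch

def lca_first_order(model, loss_fn, batch, theta_before, theta_after):
    """Allocate the loss change between two parameter snapshots to each parameter.

    theta_before / theta_after: lists of tensors aligned with model.parameters().
    The per-parameter terms sum to a first-order approximation of
    L(theta_after) - L(theta_before) on this batch.
    """
    x, y = batch
    # Load the "before" parameters into the model.
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), theta_before):
            p.copy_(p0)
    # Gradient of the training loss at theta_before.
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # LCA_i ≈ dL/dtheta_i(theta_before) * (theta_after_i - theta_before_i);
    # a negative value means parameter i is credited with decreasing the loss.
    return [p.grad * (p1 - p0)
            for p, p0, p1 in zip(model.parameters(), theta_before, theta_after)]
```

Summing these per-parameter terms across all training iterations approximately recovers the total change in training loss, which is the decomposition the paper analyzes.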
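
The two datasets named above can be obtained without restriction; a minimal loading sketch, assuming torchvision (the paper does not state which input pipeline its code uses):

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
# Both datasets are downloaded automatically on first use.
mnist_train = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
```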
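
Finally, a minimal sketch of the stated optimizer settings (SGD with momentum 0.9 or Adam, learning rates between 0.001 and 0.5), again assuming PyTorch; the concrete per-run values are in Section S7 of the supplement, so the defaults below are illustrative only.

```python
import torch

def make_optimizer(model, name="sgd", lr=0.1):
    # Momentum of 0.9 is used for all SGD runs in the paper, except one set of
    # no-momentum MNIST FC experiments; lr here is an illustrative placeholder.
    if name == "sgd":
        return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(model.parameters(), lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```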