LCA: Loss Change Allocation for Neural Network Training

Authors: Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and All-CNN [29] and ResNet-20 [9] on CIFAR-10." "Figure 2: Frames from an animation of the learning process for two training runs."
Researcher Affiliation | Industry | Janice Lan, Uber AI (janlan@uber.com); Rosanne Liu, Uber AI (rosanne@uber.com); Hattie Zhou, Uber (hattie@uber.com); Jason Yosinski, Uber AI (yosinski@uber.com)
Pseudocode | No | The paper describes the LCA computation mathematically (Equations 1–3) and in prose, but no pseudocode or algorithm blocks are provided (a first-order sketch of this computation is given after this table).
Open Source Code | Yes | "We also make our code available at https://github.com/uber-research/loss-change-allocation."
Open Datasets | Yes | "We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and All-CNN [29] and ResNet-20 [9] on CIFAR-10." Both datasets are standard public benchmarks (a loading sketch also follows the table).
Dataset Splits | No | "We analyze the loss landscape of the training set instead of the validation set because we aim to measure training, not training confounded with issues of memorization vs. generalization (though the latter certainly should be the topic of future studies)."
Hardware Specification | No | The paper notes that the method is computationally intensive ("considerably slower than the regular training process") and refers to Section S8 for more details on computation, but the main text does not specify the hardware used for experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, and TensorFlow versions are not mentioned).
Experiment Setup | Yes | "For each dataset–network configuration, we train with both SGD and Adam optimizers, and conduct multiple runs with identical hyperparameter settings. Momentum of 0.9 is used for all SGD runs, except for one set of no-momentum MNIST FC experiments. Learning rates are manually chosen between 0.001 and 0.5. See Section S7 in Supplementary Information for more details on architectures and hyperparameters." (An illustrative optimizer-setup sketch follows below as well.)
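
Since the paper presents the LCA computation only as equations, the following is a minimal sketch of a first-order version of that computation, assuming PyTorch and placeholder names (`model`, `loss_fn`, `batch`, `theta_before`, `theta_after`); the authors' released code may use a higher-order path-integral approximation and different interfaces.

```python
import torch

def lca_first_order(model, loss_fn, batch, theta_before, theta_after):
    """Allocate the loss change between two parameter snapshots to each parameter.

    theta_before / theta_after: lists of tensors aligned with model.parameters().
    The per-parameter terms sum to a first-order approximation of
    L(theta_after) - L(theta_before) on this batch.
    """
    x, y = batch
    # Load the "before" parameters into the model.
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), theta_before):
            p.copy_(p0)
    # Gradient of the training loss at theta_before.
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # LCA_i ≈ dL/dtheta_i(theta_before) * (theta_after_i - theta_before_i);
    # a negative value means parameter i is credited with decreasing the loss.
    return [p.grad * (p1 - p0)
            for p, p0, p1 in zip(model.parameters(), theta_before, theta_after)]
```

Summing these per-parameter terms across all training iterations approximately recovers the total change in training loss, which is the decomposition the paper analyzes.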
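
The two datasets named above can be obtained without restriction; a minimal loading sketch, assuming torchvision (the paper does not state which input pipeline its code uses):

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
# Both datasets are downloaded automatically on first use.
mnist_train = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
```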
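
Finally, a minimal sketch of the stated optimizer settings (SGD with momentum 0.9 or Adam, learning rates between 0.001 and 0.5), again assuming PyTorch; the concrete per-run values are in Section S7 of the supplement, so the defaults below are illustrative only.

```python
import torch

def make_optimizer(model, name="sgd", lr=0.1):
    # Momentum of 0.9 is used for all SGD runs in the paper, except one set of
    # no-momentum MNIST FC experiments; lr here is an illustrative placeholder.
    if name == "sgd":
        return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(model.parameters(), lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```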