LCA: Loss Change Allocation for Neural Network Training
Authors: Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and AllCNN [29] and ResNet-20 [9] on CIFAR-10. Figure 2: Frames from an animation of the learning process for two training runs. |
| Researcher Affiliation | Industry | Janice Lan, Uber AI, janlan@uber.com; Rosanne Liu, Uber AI, rosanne@uber.com; Hattie Zhou, Uber, hattie@uber.com; Jason Yosinski, Uber AI, yosinski@uber.com |
| Pseudocode | No | The paper describes the LCA computation mathematically (Equations 1-3) and textually, but no pseudocode or algorithm blocks are provided (a hedged first-order sketch of the computation appears after this table). |
| Open Source Code | Yes | We also make our code available at https://github.com/uber-research/loss-change-allocation. |
| Open Datasets | Yes | We employ the LCA approach to examine training on two tasks: MNIST and CIFAR-10, with architectures including a 3-layer fully connected (FC) network and LeNet [18] on MNIST, and AllCNN [29] and ResNet-20 [9] on CIFAR-10. |
| Dataset Splits | No | We analyze the loss landscape of the training set instead of the validation set because we aim to measure training, not training confounded with issues of memorization vs. generalization (though the latter certainly should be the topic of future studies). |
| Hardware Specification | No | The paper mentions that the method is computationally intensive (“considerably slower than the regular training process”) and refers to Section S8 for more details on computation, but the main text does not specify any particular hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions are not mentioned). |
| Experiment Setup | Yes | For each dataset-network configuration, we train with both SGD and Adam optimizers, and conduct multiple runs with identical hyperparameter settings. Momentum of 0.9 is used for all SGD runs, except for one set of no-momentum MNIST FC experiments. Learning rates are manually chosen between 0.001 and 0.5. See Section S7 in Supplementary Information for more details on architectures and hyperparameters. (An illustrative optimizer-setup sketch also follows the table.) |
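
Since the paper presents LCA only as equations and prose, the following is a minimal first-order sketch of the idea, assuming placeholder helpers (`theta_history`, `train_loss_grad`) that are not part of the authors' released code. The paper refines this first-order decomposition with a more accurate approximation of the underlying path integral, so treat this as an illustration rather than the authors' exact method.

```python
# Hypothetical first-order sketch of per-parameter Loss Change Allocation (LCA).
# `theta_history` and `train_loss_grad` are assumed inputs, not names from the paper's code.
import numpy as np

def lca_first_order(theta_history, train_loss_grad):
    """Allocate each training step's loss change to individual parameters.

    theta_history: list of flat parameter vectors [theta_0, ..., theta_T],
        one per saved training iteration.
    train_loss_grad: function mapping a flat parameter vector to the gradient
        of the whole-training-set loss at that point.

    Returns an array of shape (T, num_params); entry (t, i) approximates how
    much parameter i changed the training loss during step t (negative = helped).
    """
    allocations = []
    for theta_t, theta_next in zip(theta_history[:-1], theta_history[1:]):
        grad = train_loss_grad(theta_t)                     # gradient at the start of the step
        allocations.append(grad * (theta_next - theta_t))   # elementwise product = per-parameter LCA
    return np.stack(allocations)
```

Summing each row over parameters should approximately recover the measured change in training loss between consecutive checkpoints; the gap between that sum and the measured change is what motivates the paper's more accurate approximation and explains why the full procedure is considerably slower than ordinary training.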
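For the experiment-setup row, the sketch below only mirrors the reported settings (SGD with momentum 0.9, Adam, learning rates chosen between 0.001 and 0.5) in PyTorch-style code; the released code is TensorFlow-based, and the function name `make_optimizer` is hypothetical.

```python
# Illustrative only: mirrors the hyperparameter ranges quoted above, not the authors' setup code.
import torch

def make_optimizer(params, name="sgd", lr=0.1):
    # SGD runs use momentum 0.9 (except one no-momentum MNIST FC set); Adam runs use default betas.
    assert 0.001 <= lr <= 0.5, "reported learning rates are manually chosen between 0.001 and 0.5"
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")
```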