Neural networks with late-phase weights
Authors: Johannes von Oswald, Seijin Kobayashi, João Sacramento, Alexander Meulemans, Christian Henning, Benjamin F. Grewe
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning. |
| Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zürich and ETH Zürich, Zürich, Switzerland. {voswaldj,seijink,rjoao,ameulema,henningc,bgrewe}@ethz.ch |
| Pseudocode | Yes | Algorithm 1: Late-phase learning. Require: base weights θ, late-phase weight set Φ, dataset D, gradient scale factor γθ, loss L. Require: training iteration t > T0. For 1 ≤ k ≤ K do: Mk ← sample minibatch from D; ∆θk ← ∇θ L(Mk, θ, φk); φk ← Uφ(φk, ∇φk L(Mk, θ, φk)). After the loop: θ ← Uθ(θ, γθ ∑_{k=1}^{K} ∆θk). (A runnable sketch of this loop follows the table.) |
| Open Source Code | Yes | We provide code to reproduce our experiments at https://github.com/seijin-kobayashi/late-phase-weights |
| Open Datasets | Yes | To test the applicability of our method to more realistic problems, we next augment standard neural network models with late-phase weights and examine their performance on the CIFAR-10 and CIFAR-100 image classification benchmarks (Krizhevsky, 2009). ... train deep residual networks (He et al., 2016) and a densely-connected convolutional network (DenseNet; Huang et al., 2018) on the ImageNet dataset (Russakovsky et al., 2015). ... experiments on the language modeling benchmark enwik8. |
| Dataset Splits | No | The paper mentions 'Validation set acc. (%) on ImageNet' but does not provide specific details on how the training, validation, and test sets were split (e.g., exact percentages, sample counts, or explicit references to predefined validation splits for all datasets). |
| Hardware Specification | Yes | We used a single NVIDIA GeForce 2080 Ti GPU for the experiment. |
| Software Dependencies | Yes | The result was computed in Python 3.7, using the automatic differentiation and GPU acceleration package PyTorch (version 1.4.0). |
| Experiment Setup | Yes | Throughout our CIFAR-10/100 experiments we set K = 10, use a fast base gradient scale factor of γθ = 1, and set our late-phase initialization hyperparameters to T0 = 120 (measured henceforth in epochs; T0 = 100 for SWA) and do not use initialization noise, σ0 = 0. (These values are collected in the configuration sketch below.) |
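
As a concrete companion to the pseudocode quoted above, here is a minimal PyTorch sketch of one late-phase update step. It is our own illustration, not the authors' released implementation (linked in the Open Source Code row): the toy model, the choice of the final linear layer as the late-phase component, and the plain SGD steps standing in for the update rules Uθ and Uφ are all assumptions. In the paper the late-phase weights are typically a small subset of the network, such as batch normalization parameters, which keeps the K-member ensemble cheap.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy instantiation (our assumption): a shared feature extractor plays the
# role of the base weights theta, and K copies of the final linear layer
# play the role of the late-phase weights phi_1..phi_K.
K = 10             # number of late-phase members (K = 10 in the paper)
gamma_theta = 1.0  # base gradient scale factor
lr = 0.1           # SGD step size for both U_theta and U_phi (assumed)

base = nn.Linear(20, 16)                      # base weights theta
heads = [nn.Linear(16, 2) for _ in range(K)]  # late-phase weights Phi
loss_fn = nn.CrossEntropyLoss()

def sample_minibatch():
    """Stand-in for sampling a minibatch M_k from the dataset D."""
    x = torch.randn(32, 20)
    y = torch.randint(0, 2, (32,))
    return x, y

# One late-phase step (run only once training has passed iteration T0).
base_params = list(base.parameters())
delta_theta = [torch.zeros_like(p) for p in base_params]
for phi_k in heads:
    x, y = sample_minibatch()                      # M_k
    loss = loss_fn(phi_k(torch.relu(base(x))), y)  # L(M_k, theta, phi_k)
    grads = torch.autograd.grad(loss, base_params + list(phi_k.parameters()))
    # Delta-theta_k: this member's gradient w.r.t. the shared base weights.
    for acc, g in zip(delta_theta, grads[:len(base_params)]):
        acc += g
    # U_phi: plain SGD update of this member's late-phase weights.
    with torch.no_grad():
        for p, g in zip(phi_k.parameters(), grads[len(base_params):]):
            p -= lr * g

# U_theta: plain SGD update of the base weights with the accumulated member
# gradients, scaled by gamma_theta. Dividing by K averages the members;
# a plain sum is equivalent up to a rescaled gamma_theta.
with torch.no_grad():
    for p, acc in zip(base_params, delta_theta):
        p -= lr * gamma_theta * acc / K
```

Note the key design point the sketch makes explicit: the base weights θ receive a single update built from all K member gradients, while each late-phase copy φk is updated only on its own minibatch, which is what lets the ensemble members diverge during the late phase of training.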
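The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration block. The following is a hypothetical sketch; the key names are ours and do not come from the authors' repository.

```python
# Hypothetical configuration for the CIFAR-10/100 runs quoted above;
# key names are illustrative, values are taken from the quoted setup.
cifar_late_phase_config = {
    "K": 10,             # number of late-phase weight copies
    "gamma_theta": 1.0,  # fast base gradient scale factor
    "T0": 120,           # epoch at which late-phase weights are initialized
    "T0_swa": 100,       # T0 used when combined with SWA
    "sigma0": 0.0,       # late-phase initialization noise (disabled)
}
```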