Neural networks with late-phase weights

Authors: Johannes von Oswald, Seijin Kobayashi, João Sacramento, Alexander Meulemans, Christian Henning, Benjamin F. Grewe

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning.
Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zürich and ETH Zürich, Zürich, Switzerland; {voswaldj,seijink,rjoao,ameulema,henningc,bgrewe}@ethz.ch
Pseudocode | Yes (see the code sketch after this table) | Algorithm 1 (Late-phase learning). Require: base weights θ, late-phase weight set Φ, dataset D, gradient scale factor γ_θ, loss L. Require: training iteration t > T0. For k = 1, ..., K: M_k ← sample minibatch from D; Δθ_k ← ∇_θ L(M_k, θ, φ_k); φ_k ← U_φ(φ_k, ∇_{φ_k} L(M_k, θ, φ_k)). Then θ ← U_θ(θ, γ_θ Σ_{k=1}^{K} Δθ_k).
Open Source Code | Yes | We provide code to reproduce our experiments at https://github.com/seijin-kobayashi/late-phase-weights
Open Datasets | Yes (see the loading sketch after this table) | To test the applicability of our method to more realistic problems, we next augment standard neural network models with late-phase weights and examine their performance on the CIFAR-10 and CIFAR-100 image classification benchmarks (Krizhevsky, 2009). ... train deep residual networks (He et al., 2016) and a densely-connected convolutional network (DenseNet; Huang et al., 2018) on the ImageNet dataset (Russakovsky et al., 2015). ... experiments on the language modeling benchmark enwik8.
Dataset Splits | No | The paper mentions 'Validation set acc. (%) on ImageNet' but does not provide specific details on how the training, validation, and test sets were split (e.g., exact percentages, sample counts, or explicit references to predefined validation splits for all datasets).
Hardware Specification | Yes | We used a single NVIDIA GeForce 2080 Ti GPU for the experiment.
Software Dependencies | Yes | The result was computed in Python 3.7, using the automatic differentiation and GPU acceleration package PyTorch (version 1.4.0).
Experiment Setup | Yes (see the configuration sketch after this table) | Throughout our CIFAR-10/100 experiments we set K = 10, use a fast base gradient scale factor of γ_θ = 1, and set our late-phase initialization hyperparameters to T0 = 120 (measured henceforth in epochs; T0 = 100 for SWA) and do not use initialization noise, σ0 = 0.
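
The Algorithm 1 pseudocode quoted in the table maps onto a single training step applied once t > T0. Below is a minimal PyTorch sketch of that step, assuming separate optimizers for the base weights θ and for each late-phase member φ_k, and a hypothetical `late_index` argument that selects which member the model uses in the forward pass; all names are illustrative and are not taken from the authors' released code.

import torch

def late_phase_step(model, loss_fn, data_iter, base_opt, late_opts, gamma_theta=1.0):
    """One late-phase update (sketch of Algorithm 1), used for iterations t > T0."""
    K = len(late_opts)
    base_opt.zero_grad()                             # base gradients accumulate over the K members
    for k in range(K):
        x, y = next(data_iter)                       # M_k: sample a minibatch from D
        late_opts[k].zero_grad()
        loss = loss_fn(model(x, late_index=k), y)    # hypothetical argument selecting phi_k
        loss.backward()                              # grads for theta (accumulated) and phi_k
        late_opts[k].step()                          # phi_k <- U_phi(phi_k, grad_{phi_k} L)
    with torch.no_grad():                            # scale the summed base gradient by gamma_theta
        for group in base_opt.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.grad.mul_(gamma_theta)
    base_opt.step()                                  # theta <- U_theta(theta, gamma_theta * sum_k Delta theta_k)

In this reading, base_opt and each late_opts[k] are built over disjoint parameter groups, the φ_k updates are applied member by member, and θ is only updated after the K per-member gradients have been accumulated, as in the quoted pseudocode.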
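The datasets named in the Open Datasets row are all publicly available. As an illustration only, the two CIFAR benchmarks can be fetched through torchvision (ImageNet requires a manual download, and enwik8 is distributed separately); the transform here is a placeholder, not the paper's augmentation pipeline.

from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # placeholder transform for illustration
cifar10_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)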
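The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration. The dictionary below is a hypothetical illustration of those values, not the authors' configuration format.

cifar_late_phase_config = {
    "K": 10,             # number of late-phase weight components
    "gamma_theta": 1.0,  # base gradient scale factor
    "T0_epochs": 120,    # epoch at which late-phase weights are introduced (100 when combined with SWA)
    "sigma_0": 0.0,      # standard deviation of initialization noise (none used)
}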