Fisher-Legendre (FishLeg) optimization of deep neural networks

Authors: Jezabel R Garcia, Federica Freddi, Stathi Fotiadis, Maolin Li, Sattar Vakili, Alberto Bernacchia, Guillaume Hennequin

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that the resulting Fisher-Legendre (FishLeg) optimizer converges to a (global) minimum of non-convex functions satisfying the PL condition, which applies in particular to deep linear networks. On standard auto-encoder benchmarks, we show empirically that FishLeg outperforms standard first-order optimization methods, and performs on par with or better than other second-order methods, especially when using small batches.
Researcher Affiliation | Collaboration | Jezabel R Garcia1, Federica Freddi1, Stathi Fotiadis1, Maolin Li1, Sattar Vakili1, Alberto Bernacchia1 & Guillaume Hennequin1,2. 1. MediaTek Research, Cambourne Business Park, CB23 6DW, UK (first.last@mtkresearch.com). 2. Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK (g.hennequin@eng.cam.ac.uk).
Pseudocode | Yes | Algorithm 1 (FishLeg algorithm, online setting) is provided in Appendix A.1.
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | Yes | We applied FishLeg to the auto-encoder benchmarks previously used to compare second-order optimization methods; the details of these experiments (model architectures, datasets, etc.) can be found in (Goldfarb et al., 2020). [...] FishLeg performed similarly to KFAC and KBFGS on the FACES and MNIST datasets...
Dataset Splits | No | The paper reports training loss and test error but does not explicitly specify a validation split or its proportion and purpose. Although hyperparameters are optimized, it is unclear whether a dedicated validation set was used for this purpose or whether cross-validation was employed.
Hardware Specification | Yes | We ran a clean wallclock-time comparison between SGDm, KFAC and FishLeg using a unified CPU-only implementation applied to the FACES and MNIST benchmarks. This ensured e.g. that the loss and its gradients were computed in exactly the same way across methods. Overall, one iteration of vanilla FishLeg was 5 times slower than one iteration of SGDm. However, we were able to bring this down to only twice slower by updating λ every 10 iterations, which did not significantly affect performance. Combined with FishLeg's faster progress per iteration, this meant that FishLeg retained a significant advantage in wall-clock time over SGD (Fig. 3), similar to KFAC. In practice we think that it might make sense to update λ more frequently at the beginning of training, and let these updates become sparser as optimization progresses. CPU: Intel Xeon Platinum 8380H @ 2.90GHz, with OpenBLAS compiled for that architecture and multi-threaded with OpenMP (8 threads).
Software Dependencies | No | The paper mentions 'OpenBLAS compiled for that architecture and multi-threaded with OpenMP (8 threads)' but does not specify version numbers for these software components. Therefore, a fully reproducible description of ancillary software is not provided.
Experiment Setup | Yes | Table 1: Optimal hyperparameter values for FishLeg, identified as the result of a grid search over the space shown in Table 2. These hyperparameters were chosen to minimise the training loss. [...] Parameters: minibatch size = 40, η = 0.04, α = 0.001, β = 0.9, ηSGDm = 0.002, ηAdam = 0.0002.
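The FishLeg idea referenced in the Pseudocode row (Algorithm 1, Appendix A.1) can be illustrated with a toy sketch: learn auxiliary parameters λ so that a preconditioner Q(λ) approximates the inverse Fisher, then take preconditioned gradient steps. This is not the paper's Algorithm 1 — the diagonal Q(λ) = softplus(λ), the known diagonal toy "Fisher" F, the quadratic auxiliary loss, and all step sizes below are simplifying assumptions for illustration; the real method uses a structured Q(λ) driven by the network's own loss and data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "Fisher": diagonal and known, only for illustration.
F = np.array([4.0, 1.0, 0.25])

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Auxiliary parameters lambda; Q(lambda) = diag(softplus(lambda)).
lam = np.zeros(3)
alpha = 0.05  # auxiliary learning rate (hypothetical value)

# Learn Q(lambda) ~= F^{-1} by stochastic descent on the quadratic
# auxiliary loss A(lambda) = 1/2 u^T Q F Q u - u^T Q u with u ~ N(0, I),
# whose minimiser over positive-definite Q is F^{-1}.
for _ in range(3000):
    u = rng.standard_normal(3)
    q = softplus(lam)
    grad_q = (q * F - 1.0) * u**2         # dA/dq for diagonal Q
    lam -= alpha * grad_q * sigmoid(lam)  # chain rule through softplus

q = softplus(lam)
print(np.round(q * F, 2))  # each entry should be close to 1.0

# Preconditioned parameter update: theta <- theta - eta * Q(lambda) * grad.
theta = np.ones(3)
eta = 0.5
for _ in range(20):
    grad = F * theta          # gradient of 1/2 theta^T F theta
    theta -= eta * q * grad
print(np.linalg.norm(theta))  # rapid decay once Q ~= F^{-1}
```

Because Q approximately equals the inverse Fisher, the preconditioned step contracts all coordinates at a similar rate regardless of their curvature, which is the per-iteration advantage the review's Hardware row refers to.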
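The hyperparameter selection described in the Experiment Setup row (grid search over Table 2's space, choosing the setting that minimises training loss) can be sketched as follows. The grid values and the `train_loss` stand-in below are placeholders, not the paper's actual Table 2 search space; only the reported optimum (η = 0.04, α = 0.001, β = 0.9) comes from the review above.

```python
import itertools

# Hypothetical grid, loosely modelled on a Table 2-style search space.
grid = {
    "eta":   [0.01, 0.02, 0.04],      # FishLeg learning rate
    "alpha": [0.0005, 0.001, 0.002],  # auxiliary learning rate
    "beta":  [0.9],                   # momentum
}

def train_loss(eta, alpha, beta):
    # Stand-in for a full training run; returns the final training loss.
    # This made-up function is minimised at the optimum reported in Table 1.
    return (eta - 0.04) ** 2 + (alpha - 0.001) ** 2 + (beta - 0.9) ** 2

best = min(
    itertools.product(*grid.values()),
    key=lambda cfg: train_loss(*cfg),
)
print(dict(zip(grid, best)))  # {'eta': 0.04, 'alpha': 0.001, 'beta': 0.9}
```

In practice each grid point would be a full training run on the benchmark, which is why grids like this are kept small and coarse.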