FOSI: Hybrid First and Second Order Optimization
Authors: Hadar Sivan, Moshe Gabel, Assaf Schuster
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation demonstrates that FOSI improves the convergence rate and optimization time of first-order methods such as Heavy-Ball and Adam, and outperforms second-order methods (K-FAC and L-BFGS). |
| Researcher Affiliation | Academia | Hadar Sivan, Technion, Haifa, Israel (hadarsivan@cs.technion.ac.il); Moshe Gabel, York University, Toronto, Canada (mgabel@yorku.ca); Assaf Schuster, Technion, Haifa, Israel (assaf@cs.technion.ac.il) |
| Pseudocode | Yes | The steps are summarized as Algorithm 1 in the Supplementary Material (Appendix A.1). ... Algorithm 2 provides the pseudocode for FOSI. (A toy sketch of the hybrid update appears below the table.) |
| Open Source Code | Yes | An open source implementation of FOSI, available at: https://github.com/hsivan/fosi. |
| Open Datasets | Yes | Audio Classification (AC): Training Mobile Net V1 (approximately 4 million parameters) on the Audio Set dataset (Gemmeke et al., 2017). ... Language Model (LM): Training an RNN-based character-level language model ... on the Tiny Shakespeare dataset (Karpathy, 2015). ... Autoencoder (AE): Training an autoencoder model ... on the CIFAR-10 dataset. ... Transfer Learning (TL): Transfer learning from Image Net to CIFAR-10. ... Logistic Regression (LR): Training a multi-class logistic regression model to predict the 10 classes of the MNIST dataset. |
| Dataset Splits | No | The paper mentions using 'standard datasets' and reports 'validation accuracy' and 'validation loss', but it does not provide specific percentages or sample counts for training, validation, or test splits, nor does it explicitly state the splitting methodology or cite an external source for the splits used. |
| Hardware Specification | Yes | For experiments, we use an NVIDIA A40 GPU. |
| Software Dependencies | Yes | We implemented FOSI in Python using the JAX framework (Bradbury et al., 2018), version 0.3.25. |
| Experiment Setup | Yes | We execute FOSI with k = 10 and ℓ = 0... We set α = 0.01, c = 3, and W such that warmup is one epoch. T is determined... resulting in T = 800 for all experiments. ... We use the standard learning rate for Adam (0.001), and the best learning rate for HB out of 0.1, 0.01, 0.001, with default momentum parameters β1 = 0.9, β2 = 0.999 for Adam and β = 0.9 for HB. (See the configuration sketch below the table.) |
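The Pseudocode row cites Algorithms 1–2, which this table cannot reproduce. The following is a minimal, self-contained toy sketch of the hybrid idea: split the gradient by the top-k Hessian eigenspace, take a Newton-like step there, and a first-order step on the residual. It is our illustration, not the authors' code; the quadratic loss, the full eigendecomposition (the paper's ESE procedure uses Lanczos over Hessian-vector products instead), the plain-GD stand-in for Heavy-Ball/Adam, and the step sizes are all simplifying assumptions.

```python
# Toy sketch of FOSI's hybrid update -- our illustration, not the authors'
# code (see https://github.com/hsivan/fosi for the real implementation).
import jax
import jax.numpy as jnp

def loss_fn(theta):
    # Ill-conditioned quadratic; its Hessian is diag(h).
    h = jnp.array([100.0, 10.0, 1.0, 0.1])
    return 0.5 * jnp.sum(h * theta ** 2)

def top_k_eigenpairs(theta, k):
    # FOSI estimates these with a Lanczos pass over Hessian-vector
    # products; this toy version eigendecomposes the full Hessian instead.
    H = jax.hessian(loss_fn)(theta)
    eigvals, eigvecs = jnp.linalg.eigh(H)  # ascending order
    return eigvals[-k:], eigvecs[:, -k:]   # k largest eigenpairs

def fosi_step(theta, k=2, newton_lr=1.0, base_lr=0.1):
    # Step sizes are illustrative, not the paper's alpha/c settings.
    g = jax.grad(loss_fn)(theta)
    vals, V = top_k_eigenpairs(theta, k)
    coeffs = V.T @ g
    newton_dir = V @ (coeffs / vals)  # Newton-like step in the top-k subspace
    g_residual = g - V @ coeffs       # complement, left to the base optimizer
    # Plain gradient descent stands in for Heavy-Ball/Adam on the residual.
    return theta - newton_lr * newton_dir - base_lr * g_residual

theta = jnp.ones(4)
for _ in range(200):
    theta = fosi_step(theta)
print(loss_fn(theta))  # converges to near zero despite the conditioning gap
```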
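The Experiment Setup row gives the hyperparameters as running text. A hypothetical configuration sketch collecting the same values in one place might look as follows; the key names and the interpretive comments are ours, not the repository's, and only the values come from the quoted setup:

```python
# Hypothetical reconstruction of the reported setup as a config dict.
# Key names are ours; only the values are taken from the quoted text.
fosi_config = {
    "k": 10,              # number of largest eigenpairs estimated per ESE call
    "ell": 0,             # number of smallest eigenpairs (disabled here)
    "alpha": 0.01,
    "c": 3,
    "T": 800,             # iterations between eigenpair re-estimations
    "warmup": "1 epoch",  # W chosen so that warmup spans one epoch
}
base_optimizers = {
    "adam": {"learning_rate": 1e-3, "b1": 0.9, "b2": 0.999},  # standard Adam lr
    "heavy_ball": {
        "learning_rate": "best of (0.1, 0.01, 0.001)",
        "momentum": 0.9,
    },
}
```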