FOSI: Hybrid First and Second Order Optimization

Authors: Hadar Sivan, Moshe Gabel, Assaf Schuster

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical evaluation demonstrates that FOSI improves the convergence rate and optimization time of first-order methods such as Heavy-Ball and Adam, and outperforms second-order methods (K-FAC and L-BFGS)."
Researcher Affiliation | Academia | Hadar Sivan, Technion, Haifa, Israel (hadarsivan@cs.technion.ac.il); Moshe Gabel, York University, Toronto, Canada (mgabel@yorku.ca); Assaf Schuster, Technion, Haifa, Israel (assaf@cs.technion.ac.il)
Pseudocode | Yes | "The steps are summarized as Algorithm 1 in the Supplementary Material (Appendix A.1). ... Algorithm 2 provides the pseudocode for FOSI." (An illustrative sketch of this style of hybrid update appears after the table.)
Open Source Code | Yes | "An open source implementation of FOSI, available at: https://github.com/hsivan/fosi." (A hedged usage sketch appears after the table.)
Open Datasets | Yes | "Audio Classification (AC): Training MobileNetV1 (approximately 4 million parameters) on the AudioSet dataset (Gemmeke et al., 2017). ... Language Model (LM): Training an RNN-based character-level language model ... on the Tiny Shakespeare dataset (Karpathy, 2015). ... Autoencoder (AE): Training an autoencoder model ... on the CIFAR-10 dataset. ... Transfer Learning (TL): Transfer learning from ImageNet to CIFAR-10. ... Logistic Regression (LR): Training a multi-class logistic regression model to predict the 10 classes of the MNIST dataset."
Dataset Splits | No | The paper mentions using "standard datasets" and reports validation accuracy and validation loss, but it does not give train/validation/test split percentages or sample counts, describe the splitting methodology, or cite an external source for the splits used.
Hardware Specification | Yes | "For experiments, we use an NVIDIA A40 GPU."
Software Dependencies | Yes | "We implemented FOSI in Python using the JAX framework (Bradbury et al., 2018) 0.3.25."
Experiment Setup | Yes | "We execute FOSI with k = 10 and ℓ = 0... We set α = 0.01, c = 3, and W such that warmup is one epoch. T is determined... resulting in T = 800 for all experiments. ... We use the standard learning rate for Adam (0.001), and the best learning rate for HB out of 0.1, 0.01, 0.001, with default momentum parameters β1 = 0.9, β2 = 0.999 for Adam and β = 0.9 for HB." (A configuration sketch in optax appears after the table.)
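
The Pseudocode row points to Algorithms 1 and 2 in the paper's appendix, which are not reproduced in this report. As a rough illustration only, the sketch below shows the general shape of a hybrid first/second-order update: the gradient is split into the component lying in the span of k estimated Hessian eigenvectors, which gets a Newton-like eigenvalue-scaled step, and the residual, which gets a stand-in first-order step. This is not the paper's Algorithm 2; the names hybrid_step, V, lam, base_lr, and newton_scale are placeholders introduced here.

```python
import jax.numpy as jnp

def hybrid_step(params, grad, V, lam, base_lr=0.001, newton_scale=0.01):
    """Illustrative hybrid update. V is (d, k) estimated eigenvectors, lam is (k,) eigenvalues."""
    coeffs = V.T @ grad                  # gradient coordinates in the eigen-subspace
    g_subspace = V @ coeffs              # gradient component inside the subspace
    g_rest = grad - g_subspace           # residual handled by the first-order rule
    newton_update = V @ (coeffs / lam)   # eigenvalue-scaled (Newton-like) step
    return params - newton_scale * newton_update - base_lr * g_rest

# Tiny example with a diagonal "Hessian" whose top-2 eigenvectors are known.
d, k = 4, 2
V = jnp.eye(d)[:, :k]
lam = jnp.array([10.0, 5.0])
new_params = hybrid_step(jnp.ones(d), jnp.ones(d), V, lam)
```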
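The repository at https://github.com/hsivan/fosi is the authoritative source for how to call FOSI. The snippet below is only a guess at an optax-style wrapper: the entry point fosi_adam, its signature, and the toy problem around it are assumptions made for illustration, not details confirmed by this report.

```python
import jax
import jax.numpy as jnp
import optax
# Assumed entry point; check the repository README for the actual import.
from fosi import fosi_adam

# Toy problem so the sketch is self-contained.
params = jnp.zeros(3)
batch = jnp.array([1.0, 2.0, 3.0])
loss_fn = lambda p, b: jnp.sum((p - b) ** 2)

# Assumed signature: wrap a base optax optimizer with FOSI, passing the loss
# and a representative batch for its spectrum-estimation procedure.
optimizer = fosi_adam(optax.adam(0.001), loss_fn, batch)
opt_state = optimizer.init(params)

grads = jax.grad(loss_fn)(params, batch)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```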
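The Experiment Setup row lists the concrete hyperparameters. A minimal way to express them in code, assuming optax for the base optimizers (Heavy-Ball written as SGD with momentum) and a plain dict for the FOSI-specific values, since the exact wrapper signature is not given in this report:

```python
import optax

# FOSI-specific values quoted in the Experiment Setup row; W (warmup) is
# "one epoch", so its step count depends on the dataset and batch size.
fosi_config = {"k": 10, "ell": 0, "alpha": 0.01, "c": 3, "T": 800}

# Base optimizers as described: Adam with the standard learning rate 0.001
# and betas (0.9, 0.999); Heavy-Ball as SGD with momentum 0.9, with the
# learning rate tuned over {0.1, 0.01, 0.001} (0.01 shown as one candidate).
adam = optax.adam(learning_rate=0.001, b1=0.9, b2=0.999)
heavy_ball = optax.sgd(learning_rate=0.01, momentum=0.9)
```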