Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-scale Neural Network Optimization

Authors: Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedup. ... We conduct various experiments on computer vision (CV) problems involving datasets such as CIFAR-10, CIFAR-100 and ImageNet.
Researcher Affiliation | Academia | Yue Niu (EMAIL), Department of Electrical and Computer Engineering, University of Southern California; Zalan Fabian (EMAIL), Department of Electrical and Computer Engineering, University of Southern California; Sunwoo Lee (EMAIL), Department of Computer Science and Engineering, Inha University; Mahdi Soltanolkotabi (EMAIL), Department of Electrical and Computer Engineering, University of Southern California; Salman Avestimehr (EMAIL), Department of Electrical and Computer Engineering, University of Southern California
Pseudocode | Yes | Algorithm 1 mL-BFGS algorithm (T, M, parameter blocks {θ_i}, i = 1..p) ... Algorithm 2 Hessian-Vector in L-BFGS
Open Source Code | No | The paper states: "The current implementation is based on PyTorch." (Section 5). However, it does not provide any explicit statement about releasing the code, a repository link, or mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | Empirical evaluations show that, on benchmark datasets, CIFAR-10 and ImageNet, and models such as ResNet and Vision Transformer, mL-BFGS achieves a faster per-iteration convergence compared to SGD and Adam. ... We conduct various experiments on computer vision (CV) problems involving datasets such as CIFAR-10, CIFAR-100 and ImageNet.
Dataset Splits | Yes | ImageNet has been the gold standard for evaluating the performance of optimizers. It consists of 1.2M training and 50K test images, categorized into 1000 classes. We follow the standard data pre-processing procedure, where each image is first resized to 256×256, then randomly cropped to 224×224 and flipped horizontally. Each image is then normalized using pre-computed mean and variance. ... Table 3 lists the validation accuracy on CIFAR-10 and CIFAR-100. ... (Figure 4b). Furthermore, it also generalizes well on the validation set, and finally reaches comparable validation accuracy to SGD.
Hardware Specification | Yes | We use a single GPU server with 8 Nvidia Quadro RTX 5000 GPUs to simulate a distributed system, where each GPU is used as a worker to perform forward and backward passes, and model updates.
Software Dependencies | No | The paper states: "The current implementation is based on PyTorch." (Section 5). However, it does not specify a version number for PyTorch or any other software components.
Experiment Setup | Yes | Hyperparameters are tuned to achieve the best validation accuracy. Details are provided in Appendix A.4.1. ... Table 5: Hyperparameters for SGD, Adam, mL-BFGS on CIFAR-10/CIFAR-100 ... Table 6: Hyperparameters for SGD, Adam, KFAC and mL-BFGS of ResNet50 on ImageNet
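For context on the pseudocode evidence above: Algorithm 2 (Hessian-Vector in L-BFGS) builds on the standard L-BFGS two-loop recursion, which computes an inverse-Hessian-vector product from stored curvature pairs. The sketch below illustrates only that standard recursion; it is not the authors' mL-BFGS (which additionally applies momentum-based smoothing to the curvature estimates), and the function name and plain-list representation are assumptions made here for illustration.

```python
# Illustrative sketch of the classic L-BFGS two-loop recursion.
# NOT the authors' mL-BFGS implementation: mL-BFGS also smooths the
# curvature pairs with momentum, which is omitted here.

def lbfgs_direction(grad, s_hist, y_hist):
    """Approximate H^{-1} @ grad from stored curvature pairs.

    grad: current gradient (list of floats).
    s_hist[k]: parameter difference  s_k = x_{k+1} - x_k.
    y_hist[k]: gradient difference   y_k = g_{k+1} - g_k.
    Histories are ordered oldest to newest.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((alpha, rho))

    # Initial Hessian scaling gamma = (s.y) / (y.y) from the newest pair.
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: oldest pair to newest (alphas were stored newest-first).
    for (s, y), (alpha, rho) in zip(zip(s_hist, y_hist),
                                    reversed(alphas)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r  # approximates H^{-1} grad; step direction is -r
```

With an empty history the recursion reduces to the identity (gradient descent direction); with curvature pairs from a quadratic it recovers the exact Newton direction along the observed subspace.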