A Progressive Batching L-BFGS Method for Machine Learning

Authors: Raghu Bollapragada, Jorge Nocedal, Dheevatsa Mudigere, Hao-Jun Shi, Ping Tak Peter Tang

ICML 2018 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report numerical tests on large-scale logistic regression and deep neural network training tasks that indicate that our method is robust and efficient, and has good generalization properties.
Researcher Affiliation | Collaboration | (1) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA; (2) Intel Corporation, Bangalore, India; (3) Intel Corporation, Santa Clara, CA, USA.
Pseudocode | Yes | Algorithm 1 (Progressive Batching L-BFGS Method). Input: initial iterate x_0, initial sample size |S_0|; Initialization: set k ← 0; Repeat until convergence: 1: Sample S_k ⊆ {1, ..., N} with sample size |S_k|. (A sketch of this outer loop appears after the table.)
Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for their methodology is publicly available.
Open Datasets | Yes | We consider the 8 datasets listed in the supplement. An approximation R* of the optimal function value is computed for each problem by running the full batch L-BFGS method until ‖∇R(x_k)‖ ≤ 10^{-8}. Training error is defined as R(x_k) − R*, where R(x_k) is evaluated over the training set; test loss is evaluated over the test set without the ℓ2 regularization term. ... (i) a small convolutional neural network on CIFAR-10 (C) (Krizhevsky, 2009), (ii) an AlexNet-like convolutional network on MNIST and CIFAR-10 (A1, A2, respectively) (LeCun et al., 1998; Krizhevsky et al., 2012)
Dataset Splits | Yes | Training error is defined as R(x_k) − R*, where R(x_k) is evaluated over the training set; test loss is evaluated over the test set without the ℓ2 regularization term. ... SG and Adam are tuned using a development-based decay (dev-decay) scheme, which tracks the best validation loss at each epoch and reduces the steplength by a constant factor δ if the validation loss does not improve after e epochs. (See the dev-decay sketch after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions that the networks were implemented in PyTorch but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | For the batch size control test (7), we choose θ = 0.9 in the logistic regression experiments, and θ is a tunable parameter chosen in the interval [0.9, 3] in the neural network experiments. The constant c_1 in (16) is set to c_1 = 10^{-4}. For L-BFGS, we set the memory as m = 10. We skip the quasi-Newton update if the following curvature condition is not satisfied: y_k^T s_k > ϵ ‖s_k‖^2, with ϵ = 10^{-2}. The initial Hessian matrix H_k^0 in the L-BFGS recursion at each iteration is chosen as γ_k I, where γ_k = y_k^T s_k / y_k^T y_k. ... In all our experiments, we initialize the batch size as |S_0| = 512 in the PBQN method, and fix the batch size to |S_k| = 128 for SG and Adam. (A two-loop recursion sketch using these L-BFGS settings appears after the table.)
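
The Experiment Setup row quotes the paper's L-BFGS settings: memory m = 10, the curvature-based skip condition, and the initial scaling H_k^0 = γ_k I with γ_k = y_k^T s_k / y_k^T y_k. As context, here is a minimal sketch of the standard two-loop recursion using that scaling; it is a generic textbook implementation, not the authors' code, and the function name lbfgs_direction is ours.

```python
import numpy as np

def lbfgs_direction(g, pairs):
    """Standard L-BFGS two-loop recursion.

    pairs is a list of (s, y) curvature pairs, most recent last (memory m).
    The initial Hessian approximation is H_k^0 = gamma_k * I with
    gamma_k = (y^T s) / (y^T y) from the most recent pair, matching the
    settings quoted above.  Returns the search direction p_k = -H_k g_k.
    """
    q = g.copy()
    alphas, rhos = [], []
    for s, y in reversed(pairs):              # first loop: newest to oldest
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
        rhos.append(rho)
    if pairs:
        s_last, y_last = pairs[-1]
        gamma = (y_last @ s_last) / (y_last @ y_last)   # gamma_k
    else:
        gamma = 1.0                            # no pairs yet: gradient step
    r = gamma * q                              # apply H_k^0 = gamma_k * I
    for (s, y), alpha, rho in zip(pairs, reversed(alphas), reversed(rhos)):
        beta = rho * (y @ r)                   # second loop: oldest to newest
        r += (alpha - beta) * s
    return -r
```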
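The Algorithm 1 excerpt in the Pseudocode row is heavily truncated. The sketch below fills in a plausible outer loop around lbfgs_direction from the previous block, under labeled assumptions: the batch-growth rule is an assumed variance-based form of the paper's inner product test (7), which is not reproduced in the excerpt; batch doubling and the fixed step length alpha stand in for the paper's batch-size formula and line search; per_sample_grads is a user-supplied placeholder.

```python
import numpy as np

def progressive_batching_lbfgs(x0, per_sample_grads, N, theta=0.9,
                               initial_batch=512, memory=10, eps=1e-2,
                               alpha=1.0, max_iters=100):
    """Sketch of a progressive batching L-BFGS outer loop.

    per_sample_grads(x, idx) is assumed to return an array of shape
    (len(idx), dim) with one gradient per sample in the batch S_k.
    """
    x = x0.copy()
    batch = initial_batch                      # |S_0| = 512 as quoted above
    pairs = []                                 # stored (s, y) pairs
    rng = np.random.default_rng(0)
    for _ in range(max_iters):
        idx = rng.choice(N, size=min(batch, N), replace=False)
        G = per_sample_grads(x, idx)
        g = G.mean(axis=0)                     # batch gradient g_k
        # Assumed inner product test: grow |S_k| if the sample variance of
        # grad_F_i^T g_k, divided by |S_k|, exceeds theta^2 * ||g_k||^4.
        inner = G @ g
        if inner.var(ddof=1) / len(idx) > theta ** 2 * (g @ g) ** 2:
            batch = min(2 * batch, N)          # doubling is an assumption
        p = lbfgs_direction(g, pairs)          # from the previous sketch
        s = alpha * p
        x_new = x + s
        y = per_sample_grads(x_new, idx).mean(axis=0) - g
        if y @ s > eps * (s @ s):              # curvature skip condition
            pairs.append((s, y))
            pairs = pairs[-memory:]            # keep at most m pairs
        x = x_new
    return x
```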
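The dev-decay tuning rule quoted in the Dataset Splits row (reduce the steplength by a constant factor δ if the validation loss does not improve after e epochs) is simple enough to sketch directly. The class below is an illustration only; the default values of delta and patience are arbitrary, not taken from the paper.

```python
class DevDecay:
    """Steplength schedule sketch: track the best validation loss seen so far
    and multiply the steplength by delta if it has not improved for
    `patience` consecutive epochs (the paper's 'e')."""

    def __init__(self, steplength, delta=0.5, patience=1):
        self.steplength = steplength
        self.delta = delta
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the current validation loss."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.steplength *= self.delta
                self.bad_epochs = 0
        return self.steplength
```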