A Progressive Batching L-BFGS Method for Machine Learning
Authors: Raghu Bollapragada, Jorge Nocedal, Dheevatsa Mudigere, Hao-Jun Shi, Ping Tak Peter Tang
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report numerical tests on large-scale logistic regression and deep neural network training tasks that indicate that our method is robust and efficient, and has good generalization properties. |
| Researcher Affiliation | Collaboration | (1) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA; (2) Intel Corporation, Bangalore, India; (3) Intel Corporation, Santa Clara, CA, USA. |
| Pseudocode | Yes | Algorithm 1 (Progressive Batching L-BFGS Method). Input: initial iterate x0, initial sample size |S0|; Initialization: set k ← 0. Repeat until convergence: 1: Sample Sk ⊆ {1, ..., N} with sample size |Sk| ... (a Python sketch of this outer loop appears below the table) |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | We consider the 8 datasets listed in the supplement. An approximation R* of the optimal function value is computed for each problem by running the full batch L-BFGS method until ‖∇R(xk)‖ ≤ 10^-8. Training error is defined as R(xk) − R*, where R(xk) is evaluated over the training set; test loss is evaluated over the test set without the ℓ2 regularization term. ... (i) a small convolutional neural network on CIFAR-10 (C) (Krizhevsky, 2009), (ii) an AlexNet-like convolutional network on MNIST and CIFAR-10 (A1, A2, respectively) (LeCun et al., 1998; Krizhevsky et al., 2012) |
| Dataset Splits | Yes | Training error is defined as R(xk) − R*, where R(xk) is evaluated over the training set; test loss is evaluated over the test set without the ℓ2 regularization term. ... SG and Adam are tuned using a development-based decay (dev-decay) scheme, which tracks the best validation loss at each epoch and reduces the steplength by a constant factor δ if the validation loss does not improve after e epochs. (a sketch of such a scheduler appears below the table) |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions that networks were implemented in 'PyTorch' but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | For the batch size control test (7), we choose θ = 0.9 in the logistic regression experiments, and θ is a tunable parameter chosen in the interval [0.9, 3] in the neural network experiments. The constant c1 in (16) is set to c1 = 10^-4. For L-BFGS, we set the memory as m = 10. We skip the quasi-Newton update if the following curvature condition is not satisfied: y_k^T s_k > ϵ ||s_k||^2, with ϵ = 10^-2. The initial Hessian matrix H_k^0 in the L-BFGS recursion at each iteration is chosen as γ_k I, where γ_k = y_k^T s_k / y_k^T y_k. ... In all our experiments, we initialize the batch size as |S0| = 512 in the PBQN method, and fix the batch size to |Sk| = 128 for SG and Adam. (these L-BFGS details are sketched in code below the table) |
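
The pseudocode row quotes only the opening steps of Algorithm 1. As a reading aid, below is a minimal Python sketch of a progressive-batching outer loop on a toy least-squares problem. The toy problem, the helper names, the plain gradient step, and the simplified variance test standing in for the paper's batch-size control test (7) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a progressive-batching outer loop (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, d = 4096, 10
A = rng.normal(size=(N, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def sample_grads(x, idx):
    """Per-example gradients of 0.5*(a_i^T x - b_i)^2 for the sampled indices."""
    r = A[idx] @ x - b[idx]
    return A[idx] * r[:, None]          # shape (|S_k|, d)

x = np.zeros(d)
batch = 512                             # |S_0| = 512, as reported in the paper
theta = 0.9                             # batch-size control parameter
alpha = 1e-3                            # fixed steplength, for the sketch only
for k in range(200):
    idx = rng.choice(N, size=min(batch, N), replace=False)
    g_i = sample_grads(x, idx)
    g = g_i.mean(axis=0)                # subsampled gradient
    # Simplified stand-in for the inner product / variance test (7):
    # grow the batch when the sample variance dominates theta^2 * ||g||^2.
    var = g_i.var(axis=0).sum() / len(idx)
    if var > theta**2 * np.dot(g, g) and batch < N:
        batch = min(2 * batch, N)
    x -= alpha * g                      # the real method takes an L-BFGS step here
print("final batch size:", batch, "loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

In the actual method the step is an L-BFGS direction combined with a stochastic backtracking line search; a sketch of those quasi-Newton details follows.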
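
The Experiment Setup row lists three L-BFGS details: memory m = 10, the curvature-based skip rule y_k^T s_k > ϵ ||s_k||^2, and the initial scaling H_k^0 = γ_k I with γ_k = y_k^T s_k / y_k^T y_k. The class below folds those details into the standard two-loop recursion; it is a hedged sketch, and the class and method names are mine rather than the paper's.

```python
# Standard L-BFGS two-loop recursion with the skip rule and gamma_k scaling
# reported in the paper (illustrative sketch, not the authors' code).
import numpy as np
from collections import deque

class LBFGS:
    def __init__(self, m=10, eps=1e-2):
        self.pairs = deque(maxlen=m)    # stores (s_k, y_k, rho_k)
        self.eps = eps

    def update(self, s, y):
        # Skip the quasi-Newton update if the curvature condition fails.
        if y @ s > self.eps * (s @ s):
            self.pairs.append((s, y, 1.0 / (y @ s)))

    def direction(self, g):
        """Return -H_k g via the two-loop recursion."""
        q = np.asarray(g, dtype=float).copy()
        alphas = []
        for s, y, rho in reversed(self.pairs):      # most recent pair first
            a = rho * (s @ q)
            q -= a * y
            alphas.append(a)
        if self.pairs:
            s_last, y_last, _ = self.pairs[-1]
            gamma = (y_last @ s_last) / (y_last @ y_last)  # gamma_k = y^T s / y^T y
        else:
            gamma = 1.0
        r = gamma * q                               # apply H_k^0 = gamma_k * I
        for (s, y, rho), a in zip(self.pairs, reversed(alphas)):  # oldest first
            beta = rho * (y @ r)
            r += (a - beta) * s
        return -r
```

In the paper this direction is paired with a backtracking line search using the sufficient-decrease constant c1 = 10^-4 and with the growing batch sizes from the sketch above.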
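
Finally, the dev-decay scheme used to tune the SG and Adam baselines (Dataset Splits row) reduces the steplength by a factor δ whenever the validation loss stops improving for e epochs. A tiny scheduler capturing that rule is given below; the class name and the default values of δ and e are assumptions, since the paper tunes these per problem.

```python
# Sketch of a development-based decay ("dev-decay") steplength scheduler.
# Usage: sched = DevDecay(lr=0.1); new_lr = sched.step(epoch_val_loss)
class DevDecay:
    def __init__(self, lr, delta=0.5, patience=1):
        self.lr, self.delta, self.patience = lr, delta, patience
        self.best, self.stall = float("inf"), 0

    def step(self, val_loss):
        # Track the best validation loss; decay lr after `patience` epochs
        # without improvement.
        if val_loss < self.best:
            self.best, self.stall = val_loss, 0
        else:
            self.stall += 1
            if self.stall >= self.patience:
                self.lr *= self.delta
                self.stall = 0
        return self.lr
```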