Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A Progressive Batching L-BFGS Method for Machine Learning
Authors: Raghu Bollapragada, Jorge Nocedal, Dheevatsa Mudigere, Hao-Jun Shi, Ping Tak Peter Tang
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report numerical tests on large-scale logistic regression and deep neural network training tasks that indicate that our method is robust and efficient, and has good generalization properties. |
| Researcher Affiliation | Collaboration | 1Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA 2Intel Corporation, Bangalore, India 3Intel Corporation, Santa Clara, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Progressive Batching L-BFGS Method. Input: initial iterate $x_0$, initial sample size $\lvert S_0 \rvert$. Initialization: set $k \leftarrow 0$. Repeat until convergence: 1: Sample $S_k \subseteq \{1, \ldots, N\}$ with sample size $\lvert S_k \rvert$ |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | We consider the 8 datasets listed in the supplement. An approximation $R^*$ of the optimal function value is computed for each problem by running the full batch L-BFGS method until $\lVert \nabla R(x_k) \rVert_\infty \le 10^{-8}$. Training error is defined as $R(x_k) - R^*$, where $R(x_k)$ is evaluated over the training set; test loss is evaluated over the test set without the $\ell_2$ regularization term. ... (i) a small convolutional neural network on CIFAR-10 (C) (Krizhevsky, 2009), (ii) an AlexNet-like convolutional network on MNIST and CIFAR-10 (A1, A2, respectively) (LeCun et al., 1998; Krizhevsky et al., 2012) |
| Dataset Splits | Yes | Training error is defined as $R(x_k) - R^*$, where $R(x_k)$ is evaluated over the training set; test loss is evaluated over the test set without the $\ell_2$ regularization term. ... SG and Adam are tuned using a development-based decay (devdecay) scheme, which tracks the best validation loss at each epoch and reduces the steplength by a constant factor $\delta$ if the validation loss does not improve after $e$ epochs. |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions that networks were implemented in 'PyTorch' but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | For the batch size control test (7), we choose $\theta = 0.9$ in the logistic regression experiments, and $\theta$ is a tunable parameter chosen in the interval $[0.9, 3]$ in the neural network experiments. The constant $c_1$ in (16) is set to $c_1 = 10^{-4}$. For L-BFGS, we set the memory as $m = 10$. We skip the quasi-Newton update if the following curvature condition is not satisfied: $y_k^T s_k > \epsilon \lVert s_k \rVert^2$, with $\epsilon = 10^{-2}$. The initial Hessian matrix $H_k^0$ in the L-BFGS recursion at each iteration is chosen as $\gamma_k I$ where $\gamma_k = y_k^T s_k / y_k^T y_k$. ... In all our experiments, we initialize the batch size as $\lvert S_0 \rvert = 512$ in the PBQN method, and fix the batch size to $\lvert S_k \rvert = 128$ for SG and Adam. |
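
The L-BFGS settings quoted in the Experiment Setup row (memory m = 10, the curvature test with ϵ = 10^-2, and the scaled-identity initial matrix γ_k I) can be illustrated with a minimal sketch. This is not the authors' code, which the table notes is not publicly available. It uses the standard textbook two-loop recursion, it leaves out progressive batching and the line search, and the function names `dot`, `maybe_store_pair`, and `two_loop_direction` are hypothetical. Falling back to the identity matrix before any pair is stored is also an assumption.

```python
from collections import deque

M = 10        # L-BFGS memory, m = 10 in the quoted setup
EPS = 1e-2    # curvature threshold, epsilon = 10^-2 in the quoted setup


def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def maybe_store_pair(history, s, y, eps=EPS):
    """Store (s, y) only if y^T s > eps * ||s||^2; return True if stored."""
    if dot(y, s) > eps * dot(s, s):
        history.append((s, y))  # a deque with maxlen=M drops the oldest pair
        return True
    return False                # update skipped, history left unchanged


def two_loop_direction(grad, history):
    """Return -H_k * grad using the standard L-BFGS two-loop recursion."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):            # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((alpha, rho))
    if history:
        s, y = history[-1]
        gamma = dot(y, s) / dot(y, y)         # H_k^0 = gamma_k * I
    else:
        gamma = 1.0                           # assumption: identity when no pair is stored
    r = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(history, reversed(alphas)):  # oldest pair first
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


history = deque(maxlen=M)
maybe_store_pair(history, [1.0, 0.0], [2.0, 0.0])    # passes the curvature test
maybe_store_pair(history, [1.0, 0.0], [0.001, 0.0])  # fails the test and is skipped
direction = two_loop_direction([4.0, 2.0], history)
```

Because the curvature test rejects pairs with small or negative y^T s, every stored pair has y^T s > 0, so the division by y^T s inside the recursion is well defined.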