Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Global Convergence of Online Limited Memory BFGS

Authors: Aryan Mokhtari, Alejandro Ribeiro

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Global convergence of an online (stochastic) limited-memory version of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method for solving optimization problems with stochastic objectives that arise in large-scale machine learning is established. Lower and upper bounds on the Hessian eigenvalues of the sample functions are shown to suffice to guarantee that the curvature approximation matrices have bounded determinants and traces, which, in turn, permits establishing convergence to optimal arguments with probability 1. Experimental evaluation on a search engine advertising problem showcases reductions in convergence time relative to stochastic gradient descent algorithms.
Researcher Affiliation | Academia | Aryan Mokhtari (EMAIL), Alejandro Ribeiro (EMAIL), Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA
Pseudocode | Yes | Algorithm 1: Computation of the oLBFGS step q = B̂_t^{-1} p when called with p = ŝ(w_t, θ_t). Algorithm 2: oLBFGS. Require: Initial value w_0. Initial Hessian approximation parameter γ̂_0 = 1.
Open Source Code | No | The paper contains no statement about open-source code availability and provides no link to a code repository; the text lacks a clear, affirmative statement of release.
Open Datasets | Yes | For the CTR problem considered here we use the Tencent search engine data set (Sun, 2012).
Dataset Splits | Yes | Out of the 236×10^6 sample points in the Tencent data set we select 10^6 sample points to use as the training set S and 10^5 sample points to use as a test set T. To select elements of the training and test sets we divide the first 1.1×10^6 sample points of the complete data set into 10^5 consecutive blocks of 11 elements. The first 10 elements of each block are assigned to the training set and the 11th element to the test set.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running the experiments. It only notes that the cost of each oLBFGS iteration is on the order of 2.1×10^4 operations.
Software Dependencies | No | The paper mentions implementing SGD and oLBFGS and references various methods (e.g., BFGS, Newton's method) but does not specify any software libraries, frameworks, or version numbers used for the implementation.
Experiment Setup | Yes | In all of our numerical experiments the regularization parameter in (38) is λ = 10^-6. The step sizes for both algorithms are of the form ϵ_t = ϵ_0 T_0/(T_0 + t). We set ϵ_0 = 10^-2 and T_0 = 10^4 for oLBFGS, and ϵ_0 = 10^-1 and T_0 = 10^6 for SGD. For SGD the sample size in (9) is set to L = 20, whereas for oLBFGS it is set to L = 100. The memory size for oLBFGS is set to τ = 10.
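The oLBFGS step referenced in the Pseudocode row, q = B̂_t^{-1} p, is the standard L-BFGS two-loop recursion over the stored curvature pairs. A minimal sketch of that recursion (function and variable names are ours, not the paper's):

```python
import numpy as np

def two_loop_recursion(p, s_list, y_list, gamma):
    """Return q ~ B^{-1} p via the L-BFGS two-loop recursion, using stored
    curvature pairs (s_i, y_i) and an initial inverse-Hessian scaling gamma.
    A sketch of the standard algorithm, not the paper's implementation."""
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    q = p.copy()
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q = q - alpha * y
    # Apply the initial inverse-Hessian approximation gamma * I.
    q = gamma * q
    # Second loop: oldest pair to newest (alphas consumed in reverse order).
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, q)
        q = q + (alpha - beta) * s
    return q
```

With an empty memory the recursion reduces to q = γ p, and when s = y (curvature consistent with the identity Hessian) it returns p unchanged, which is a quick sanity check on an implementation.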
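The diminishing step-size schedule in the Experiment Setup row, ϵ_t = ϵ_0 T_0/(T_0 + t), is easy to express directly; the exponents below follow our reading of the garbled extraction (ϵ_0 = 10^-2, T_0 = 10^4 for oLBFGS; ϵ_0 = 10^-1, T_0 = 10^6 for SGD):

```python
def step_size(t, eps0, t0):
    """Diminishing step size eps_t = eps0 * T0 / (T0 + t): starts at eps0
    and decays like 1/t once t >> T0."""
    return eps0 * t0 / (t0 + t)

# Reported settings (exponents reconstructed from the extraction):
def olbfgs_step(t):
    return step_size(t, 1e-2, 1e4)

def sgd_step(t):
    return step_size(t, 1e-1, 1e6)
```

The schedule keeps the step size roughly constant at ϵ_0 for the first T_0 iterations and halves it by iteration t = T_0, which is why the two algorithms use different T_0 values.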