Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions

Authors: Nikita Doikov, Sebastian U. Stich, Martin Jaggi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our theory is validated by numerical experiments executed on multiple practical machine learning problems." Section 8 (Experiments): "We present illustrative numerical experiments on several machine learning problems. See Section A in the appendix for the details of our experiments and for extra plots."
Researcher Affiliation | Academia | Machine Learning and Optimization Laboratory (MLO), EPFL, Lausanne, Switzerland; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Pseudocode | Yes | Algorithm 1: Adaptive Gradient Method with Spectral Preconditioning (an illustrative sketch of one such step follows the table)
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "In the following experiments, we train a convex logistic regression model on several machine learning datasets, using the gradient method with spectral preconditioning. We also compare its performance with quasi-Newton methods: BFGS and the limited memory BFGS (L-BFGS) (Nocedal & Wright, 2006). The results are shown in Fig. 6..." Datasets: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (a baseline-comparison sketch follows the table)
Dataset Splits | No | The paper names the datasets used but does not specify training/validation/test splits (percentages, counts, or predefined splits) for its experiments.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies or versions (e.g., Python 3.x, PyTorch 1.x) used for the experiments.
Experiment Setup | Yes | "For the spectral preconditioning, we fix the regularization parameter, according to our theory, at iteration $k \geq 0$: $\alpha_k = \sqrt{L \, f(X_k, Y_k)} + \beta_k$, where we fix $L := 1$ and $\beta_k$ is fitted using a simple adaptive search... Namely, we start with an initial value of $\beta_0 := 0.05$." (a sketch of one possible adaptive search follows the table)
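
Since the report only quotes the name of Algorithm 1, here is a minimal sketch of what one spectrally preconditioned gradient step could look like. It assumes (an assumption, not the paper's verified pseudocode) that the preconditioner is built from the absolute value of the Hessian's spectrum plus the regularization $\alpha_k I$ quoted in the Experiment Setup row; the names `spectral_precond_step`, `grad_fn`, and `hess_fn` are illustrative.

```python
import numpy as np

def spectral_precond_step(x, grad_fn, hess_fn, alpha):
    """One illustrative spectrally preconditioned gradient step.

    Assumption (not taken from the paper's Algorithm 1): the
    preconditioner is |H| + alpha*I, where |H| replaces each Hessian
    eigenvalue by its absolute value, so the system to solve is
    positive definite even at non-convex points.
    """
    g = grad_fn(x)
    H = hess_fn(x)
    eigvals, eigvecs = np.linalg.eigh(H)                     # H = V diag(lam) V^T
    abs_H = eigvecs @ np.diag(np.abs(eigvals)) @ eigvecs.T   # |H|
    step = np.linalg.solve(abs_H + alpha * np.eye(len(x)), g)
    return x - step
```

Taking the absolute value of the spectrum is what makes this a "spectral" preconditioner in the sketch: it keeps the Hessian's curvature scale while discarding the sign, so the step direction remains a descent direction on non-convex regions.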
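The datasets come from the linked LIBSVM collection, and the paper compares against BFGS/L-BFGS. A baseline comparison of the kind described could be set up as below, assuming scikit-learn's LIBSVM-format loader and SciPy's L-BFGS implementation; the file name `a9a.txt` is a placeholder, since the report does not list the exact datasets used.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import load_svmlight_file

# Load a LIBSVM-format file downloaded from the collection linked above.
# The file name is illustrative, not taken from the paper.
X, y = load_svmlight_file("a9a.txt")
X = X.toarray()
y = np.where(y > 0, 1.0, 0.0)  # map labels {-1, +1} -> {0, 1}

def logistic_loss(w):
    z = X @ w
    # Numerically stable logistic loss: log(1 + e^z) - y*z, averaged.
    return np.sum(np.logaddexp(0.0, z) - y * z) / len(y)

def logistic_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid probabilities
    return X.T @ (p - y) / len(y)

w0 = np.zeros(X.shape[1])
res = minimize(logistic_loss, w0, jac=logistic_grad, method="L-BFGS-B")
print(res.fun, res.nit)
```

This reproduces only the quasi-Newton baseline side of the comparison; the spectrally preconditioned method itself would plug into the same loss and gradient.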
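The quoted setup says only that $\beta_k$ "is fitted using a simple adaptive search" starting from $\beta_0 := 0.05$. One common pattern consistent with that description, though a guess rather than the paper's documented rule, is to grow $\beta$ until a candidate step decreases the objective and shrink it afterwards:

```python
def adaptive_beta_step(x, beta, objective, try_step,
                       inc=2.0, dec=0.5, max_tries=50):
    """Illustrative 'simple adaptive search' for the offset beta_k.

    Assumption: grow beta until the candidate step decreases the
    objective, then shrink it for the next iteration. This
    doubling/halving rule is a common pattern, not the paper's
    documented procedure.
    """
    f_x = objective(x)
    for _ in range(max_tries):
        x_new = try_step(x, beta)        # e.g. one preconditioned step with
                                         # alpha = sqrt(L * f(x)) + beta, L = 1
        if objective(x_new) < f_x:       # accept on decrease
            return x_new, max(beta * dec, 1e-12)
        beta *= inc                      # step too aggressive: regularize more
    return x, beta                       # give up; keep the current iterate
```

Per the quote, such a search would start from `beta = 0.05` and carry the adapted value across iterations.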