Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
Authors: Nikita Doikov, Sebastian U. Stich, Martin Jaggi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is validated by numerical experiments executed on multiple practical machine learning problems. From Section 8 (Experiments): We present illustrative numerical experiments on several machine learning problems. See Section A in the appendix for the details of our experiments and for extra plots. |
| Researcher Affiliation | Academia | Machine Learning and Optimization Laboratory (MLO), EPFL, Lausanne, Switzerland; CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. |
| Pseudocode | Yes | Algorithm 1: Adaptive Gradient Method with Spectral Preconditioning (a hedged code sketch follows this table) |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In the following experiments, we train a convex logistic regression model on several machine learning datasets, using the gradient method with spectral preconditioning. We also compare its performance with quasi-Newton methods: BFGS and the limited memory BFGS (L-BFGS) (Nocedal & Wright, 2006). The results are shown in Fig. 6... https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | No | The paper mentions datasets used but does not specify the exact training, validation, or test splits (percentages, counts, or predefined splits) for its own experiments. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or versions (e.g., Python 3.x, PyTorch 1.x) that were used for the experiments. |
| Experiment Setup | Yes | For the spectral preconditioning, we fix the regularization parameter, according to our theory, at iteration k ≥ 0: αk = √(L · f(Xk, Yk)) + βk, where we fix L := 1 and βk is fitted using a simple adaptive search... Namely, we start with an initial value of β0 := 0.05. (A sketch of this schedule follows the table.) |
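
The Pseudocode row above cites Algorithm 1 (Adaptive Gradient Method with Spectral Preconditioning), but the paper's exact update rule is not reproduced in this report. The following is a minimal sketch of one plausible spectrally preconditioned gradient step, assuming a preconditioner of the form (H² + α²I)^(−1/2) applied through the eigendecomposition of the Hessian; the function name and the specific preconditioner shape are illustrative assumptions, not the paper's verbatim method.

```python
import numpy as np

def spectral_preconditioned_step(x, grad, hess, alpha):
    """One gradient step with a spectral preconditioner (hedged sketch).

    Assumed form: D = (H^2 + alpha^2 I)^{-1/2}, computed via the
    eigendecomposition of the symmetric Hessian H, so each eigendirection
    with eigenvalue lam is scaled by 1 / sqrt(lam^2 + alpha^2).
    """
    lam, Q = np.linalg.eigh(hess)             # H = Q diag(lam) Q^T
    scale = 1.0 / np.sqrt(lam**2 + alpha**2)  # spectral scaling per eigenvalue
    return x - Q @ (scale * (Q.T @ grad))     # x - D @ grad
```

Under this assumed form, scaling each eigendirection by 1/√(λ² + α²) keeps the step well defined even for indefinite Hessians: a large α recovers (scaled) gradient descent, while a small α yields a Newton-like step based on |λ|.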
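The Experiment Setup row quotes the adaptive schedule αk = √(L · f(Xk, Yk)) + βk with L := 1 and β0 := 0.05, but does not spell out the "simple adaptive search" for βk. The sketch below, building on the step function above, models that search with a hypothetical accept-then-halve / reject-then-double rule and reads f(Xk, Yk) as the current objective value; both choices are assumptions made for illustration only.

```python
def run_adaptive(f, grad_f, hess_f, x0, L=1.0, beta0=0.05, iters=100):
    """Hedged sketch of the quoted schedule alpha_k = sqrt(L * f) + beta_k.

    Assumptions not specified in the report: f is nonnegative (true for
    logistic regression loss), and the adaptive search halves beta after
    an accepted step and doubles it after a rejected one.
    """
    x, beta = np.asarray(x0, dtype=float), beta0
    for _ in range(iters):
        alpha = np.sqrt(L * f(x)) + beta      # regularization parameter
        x_new = spectral_preconditioned_step(x, grad_f(x), hess_f(x), alpha)
        if f(x_new) <= f(x):                  # accept: relax regularization
            x, beta = x_new, beta / 2.0
        else:                                 # reject: increase regularization
            beta *= 2.0
    return x
```

For a concrete run, f, grad_f, and hess_f could be the loss, gradient, and Hessian of the convex logistic regression model trained in the paper's experiments on the LIBSVM datasets.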