Adaptive Proximal Gradient Methods Are Universal Without Approximation

Authors: Konstantinos Oikonomidis, Emanuel Laude, Puya Latafat, Andreas Themelis, Panagiotis Patrinos

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments compare against baseline methods on diverse machine-learning tasks covering both the locally and the globally Hölder setting. In numerical simulations we show that adaPG^q, q ∈ [1, 2], performs well on a collection of locally and globally Hölder smooth problems, such as classification with Hölder-smooth SVMs and a p-norm version of Lasso.
Researcher Affiliation | Academia | ¹Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium. ²Faculty of Information Science and Electrical Engineering (ISEE), Kyushu University, 744 Motooka, Nishi-ku 819-0395, Fukuoka, Japan. Correspondence to: Konstantinos Oikonomidis <konstantinos.oikonomidis@kuleuven.be>.
Pseudocode | Yes | Algorithm 1 (adaPG^q, q ∈ [1, 2]): a universal adaptive proximal gradient method.
Require: starting point x₋₁ ∈ ℝⁿ, stepsizes γ₀ ≥ γ₋₁ > 0, parameter q ∈ [1, 2].
Initialize: x₀ = prox_{γ₀g}(x₋₁ − γ₀∇f(x₋₁)).
Repeat for k = 0, 1, … until convergence:
1: With ℓ_k, L_k given in (6), define the stepsize (with notation [z]₊ = max{z, 0} and convention 1/0 = +∞) as
   γ_{k+1} = γ_k min{ √(1/q + γ_k/γ_{k−1}), 1/√(q²[γ_k²L_k² − (2 − q)γ_kℓ_k + 1 − q]₊) }
2: x_{k+1} = prox_{γ_{k+1}g}(x_k − γ_{k+1}∇f(x_k))
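A minimal NumPy sketch of one way to implement this iteration. It assumes the usual adaPG secant estimates for the quantities in (6), namely ℓ_k = ⟨∇f(x_k) − ∇f(x_{k−1}), x_k − x_{k−1}⟩/‖x_k − x_{k−1}‖² and L_k = ‖∇f(x_k) − ∇f(x_{k−1})‖/‖x_k − x_{k−1}‖; the stepsize rule follows the reconstruction in Algorithm 1, and the Lasso-style test problem and all function names are illustrative, not the authors' implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (illustrative choice of g)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adapg(grad_f, prox_g, x_init, gamma0=1e-3, q=1.5, max_iter=500, tol=1e-10):
    """Sketch of Algorithm 1: adaptive proximal gradient with local
    curvature estimates; stepsize rule as reconstructed from the paper."""
    x_prev = x_init
    g_prev = grad_f(x_prev)
    gamma_old, gamma = gamma0, gamma0            # take gamma_{-1} = gamma_0 for simplicity
    x = prox_g(x_prev - gamma * g_prev, gamma)   # x_0
    for _ in range(max_iter):
        g = grad_f(x)
        dx, dg = x - x_prev, g - g_prev
        nx = np.linalg.norm(dx)
        if nx < tol:
            break
        L = np.linalg.norm(dg) / nx              # local Lipschitz estimate L_k
        ell = float(dg @ dx) / nx**2             # local curvature estimate l_k
        inner = gamma**2 * L**2 - (2 - q) * gamma * ell + 1 - q
        t1 = np.sqrt(1.0 / q + gamma / gamma_old)
        # [.]_+ with the convention 1/0 = +inf: the second bound is vacuous
        # whenever the bracket is nonpositive.
        t2 = 1.0 / np.sqrt(q**2 * inner) if inner > 0 else np.inf
        gamma_old, gamma = gamma, gamma * min(t1, t2)
        x_prev, g_prev = x, g
        x = prox_g(x - gamma * g, gamma)         # forward-backward step
    return x
```

On a standard Lasso instance (f(x) = ½‖Ax − b‖², g = λ‖x‖₁) the stepsize needs no Lipschitz constant up front: it grows geometrically while the local estimates permit and is capped automatically once curvature is detected.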
Open Source Code | Yes | The implementation for reproducing the experiments is publicly available: https://github.com/EmanuelLaude/universal-adaptive-proximal-gradient
Open Datasets | Yes | In this subsection we assess the performance of the different algorithms on the task of classification using a p-norm version of the SVM model. For that purpose we consider a subset of the LibSVM binary classification benchmark that consists of a collection of datasets with varying numbers m of examples a_i, feature dimensions n, and sparsity.
Dataset Splits | No | The paper mentions using datasets like the LibSVM benchmark and generating random instances, but does not explicitly provide details about training, validation, and test splits (e.g., percentages, sample counts, or specific split files) for reproducibility.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions comparing with NUPG, F-NUPG, and AC-FGM methods and providing an implementation of their method, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Regarding the positive sequences (γ_k)_k, (β_k)_k, (τ_k)_k, we use the update rule described in (Li & Lan, 2023, Cor. 3) and, as such, in Corollary 2 of the same paper. We choose β₁ = 0 and β_k = β = 1 − √3/2 for all k ≥ 2, ensure that γ₁ ∈ [β/(4(1 − β)c₁), 1/(3c₁)], and set γ₂ = β/(2c₁) and γ_{k+1} = min{((τ_{k−1} + 1)/(τ_k + 1))γ_k, β(τ_{k−1} + 1)/(4c_k)} for all k ≥ 2. Finally, τ₁ = 0, τ₂ = 2, and τ_{k+1} = τ_k + α/2 + 2(1 − α)γ_k c_k/(βτ_k) for k ≥ 2 and some α ∈ [0, 1]. We chose α = 0, since this configuration consistently outperformed the others in our experiments.
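A short Python sketch of this AC-FGM parameter schedule. The constant β = 1 − √3/2, the γ-update, and the τ-recursion are hedged readings of the paper's (partially garbled) description of (Li & Lan, 2023, Cor. 3), and c_k stands for the local curvature estimates of that method, supplied here as placeholder inputs:

```python
import math

def acfgm_schedule(c, alpha=0.0, n=10):
    """Parameter sequences (gamma_k), (tau_k) for the AC-FGM baseline.
    c is a list of placeholder curvature estimates c_1, c_2, ...
    (accessed 1-indexed via c[k-1]); the gamma-update is a hedged
    reconstruction, not a verified transcription of Li & Lan's rule."""
    beta = 1.0 - math.sqrt(3.0) / 2.0     # assumed reading: beta = 1 - sqrt(3)/2
    gamma = {1: 1.0 / (3.0 * c[0]),       # any gamma_1 in [beta/(4(1-beta)c_1), 1/(3c_1)]
             2: beta / (2.0 * c[0])}
    tau = {1: 0.0, 2: 2.0}
    for k in range(2, n):
        # tau_{k+1} = tau_k + alpha/2 + 2(1-alpha) gamma_k c_k / (beta tau_k)
        tau[k + 1] = tau[k] + alpha / 2.0 \
            + 2.0 * (1.0 - alpha) * gamma[k] * c[k - 1] / (beta * tau[k])
        # gamma_{k+1} = min{ ((tau_{k-1}+1)/(tau_k+1)) gamma_k,
        #                    beta (tau_{k-1}+1) / (4 c_k) }
        gamma[k + 1] = min((tau[k - 1] + 1.0) / (tau[k] + 1.0) * gamma[k],
                           beta * (tau[k - 1] + 1.0) / (4.0 * c[k - 1]))
    return gamma, tau
```

With α = 0 (the configuration the authors report as best), the α-terms drop out and the τ-sequence grows purely from the accumulated stepsize-times-curvature products.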