Only tails matter: Average-Case Universality and Robustness in the Convex Regime

Authors: Leonardo Cunha, Gauthier Gidel, Fabian Pedregosa, Damien Scieur, Courtney Paquette

ICML 2022

Reproducibility assessment — each entry lists the variable, the result, and the LLM response given as evidence:
Research Type: Experimental. "We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov's scheme, and we show that, in the average-case context, Nesterov's method is universally nearly optimal asymptotically. [...] 5. Experiments. We generate random quadratic problems in two different ways. First, by sampling H = XXᵀ where the entries of X are i.i.d. Gaussian. [...] The other process we use to generate random quadratic problems is by sampling Λ ∈ Rᵈ from the corresponding Beta distribution and taking H = U diag(Λ) Uᵀ, where U is an independently sampled orthonormal matrix. We let x* = 0 and sample x0 from a standard Gaussian distribution. In all experiments, we use the problem instance's largest eigenvalue to calibrate each method (e.g., gradient descent's step size is 1/L)."
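The quoted setup is concrete enough to sketch. Below is a minimal NumPy reconstruction of the two sampling schemes; the dimensions, the Beta shape parameters, and the 1/n Wishart normalization are assumptions not fixed by the excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 1000  # hypothetical sizes; the paper excerpt does not fix them

# Scheme 1: Wishart-type matrix H = X X^T with i.i.d. Gaussian entries.
X = rng.standard_normal((d, n))
H_wishart = X @ X.T / n  # the 1/n scaling (to keep eigenvalues O(1)) is an assumption

# Scheme 2: H = U diag(Lambda) U^T with Beta-distributed eigenvalues.
a, b = 2.0, 2.0  # placeholder Beta(a, b) shape parameters
lam = rng.beta(a, b, size=d)
U, _ = np.linalg.qr(rng.standard_normal((d, d)))  # Haar-distributed orthonormal factor
H_beta = U @ np.diag(lam) @ U.T

# Quadratic problem data: x* = 0, x0 standard Gaussian, as in the quote.
x_star = np.zeros(d)
x0 = rng.standard_normal(d)
```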
Researcher Affiliation: Collaboration. "1 MILA and DIRO, Université de Montréal, Montreal, Canada; 2 Canada CIFAR AI Chair; 3 Google Research; 4 Samsung SAIT AI Lab, Montreal, Canada; 5 McGill University, Montreal, Canada."
Pseudocode: Yes. "Algorithm 1: Generalized Chebyshev Method GCM(α, β)"
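Algorithm 1 itself is not reproduced in this excerpt, so the following is not the paper's method. As a point of reference only, here is a minimal sketch of the classical Chebyshev semi-iteration that GCM(α, β) generalizes, in its standard textbook form, assuming the eigenvalues of H lie in a known interval [ell, L] with ell > 0:

```python
import numpy as np

def chebyshev_iteration(H, b, x0, ell, L, n_iters=100):
    """Classical Chebyshev semi-iteration for H x = b with spec(H) in [ell, L].

    This is the textbook special case that the paper's GCM(alpha, beta)
    generalizes, not Algorithm 1 itself.
    """
    theta = (L + ell) / 2.0  # center of the eigenvalue interval
    delta = (L - ell) / 2.0  # half-width of the interval
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    x = x0.copy()
    r = b - H @ x            # initial residual
    d = r / theta
    for _ in range(n_iters):
        x = x + d
        r = r - H @ d
        rho_next = 1.0 / (2.0 * sigma1 - rho)
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x
```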
Open Source Code: No. The paper does not provide a link or an explicit statement about the availability of its source code.
Open Datasets: Yes. "Figure 4. Function values for different methods where the data comes from CIFAR-10 Inception features (left) and MNIST features (right)."
Dataset Splits: No. The paper mentions "standard deviation for 8 different samplings of the distribution and of the random initialization" but does not specify explicit training, validation, or test dataset splits.
Hardware Specification: No. The paper does not specify any hardware used for running the experiments (e.g., GPU or CPU models, memory).
Software Dependencies: No. The paper does not provide specific software versions for libraries or environments used in the experiments.
Experiment Setup: Yes. "We let x* = 0 and sample x0 from a standard Gaussian distribution. In all experiments, we use the problem instance's largest eigenvalue to calibrate each method (e.g., gradient descent's step size is 1/L)."
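The quoted calibration rule translates directly into code. A minimal sketch of gradient descent on f(x) = ½ xᵀHx (so x* = 0 and ∇f(x) = Hx), with the step size set to 1/L from the instance's largest eigenvalue; the iteration count is an arbitrary choice:

```python
import numpy as np

def gradient_descent(H, x0, n_iters=200):
    """Gradient descent on f(x) = 0.5 * x^T H x, calibrated as in the paper:
    step size 1/L, where L is this instance's largest eigenvalue."""
    L = np.linalg.eigvalsh(H)[-1]  # eigvalsh returns eigenvalues in ascending order
    x = x0.copy()
    values = []
    for _ in range(n_iters):
        values.append(0.5 * x @ H @ x)  # f(x_k), since x* = 0
        x = x - (H @ x) / L             # gradient step with step size 1/L
    return x, values
```

With H_beta and x0 from the problem-generation sketch above, gradient_descent(H_beta, x0) gives the function-value trajectory for a single random instance, the quantity plotted in the paper's comparison figures.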