Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime
Authors: Leonardo Cunha, Gauthier Gidel, Fabian Pedregosa, Damien Scieur, Courtney Paquette
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov's scheme, and we show that, in the average-case context, Nesterov's method is universally nearly optimal asymptotically. ... 5. Experiments: We generate random quadratic problems in two different ways. First, by sampling H = XX^T where the entries of X are i.i.d. Gaussian. ... The other process we use to generate random quadratic problems is by sampling Λ ∈ R^d from the corresponding Beta distribution and taking H = U diag(Λ)U^T, where U is an independently sampled orthonormal matrix. We let x* = 0 and sample x_0 from a standard Gaussian distribution. In all experiments, we use the problem instance's largest eigenvalue to calibrate each method (e.g., gradient descent's step size is 1/L). (A sketch of this sampling procedure follows the table.) |
| Researcher Affiliation | Collaboration | 1 MILA and DIRO, Université de Montréal, Montreal, Canada; 2 Canada CIFAR AI Chair; 3 Google Research; 4 Samsung SAIT AI Lab, Montreal, Canada; 5 McGill University, Montreal, Canada. |
| Pseudocode | Yes | Algorithm 1: Generalized Chebyshev Method GCM(α, β). (A simplified Chebyshev sketch follows the table.) |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | Figure 4. Function values for different methods where the data comes from CIFAR-10 Inception features (left) and MNIST features (right). |
| Dataset Splits | No | The paper mentions 'standard deviation for 8 different samplings of the distribution and of the random initialization' but does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., GPU, CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific software versions for libraries or environments used in the experiments. |
| Experiment Setup | Yes | We let x* = 0 and sample x_0 from a standard Gaussian distribution. In all experiments, we use the problem instance's largest eigenvalue to calibrate each method (e.g., gradient descent's step size is 1/L). (See the gradient-descent calibration sketch after the table.) |
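
The two data-generating processes quoted in the Research Type row are simple enough to sketch. Below is a minimal numpy sketch of both samplers; the 1/n normalization of the Wishart-type matrix and the QR-based construction of the orthonormal factor U are our assumptions (the excerpt does not specify either), and the function names are ours.

```python
import numpy as np

def sample_wishart_quadratic(d, n, rng):
    # H = X X^T with i.i.d. Gaussian entries of X; the 1/n scaling is an
    # assumed normalization, not stated in the excerpt.
    X = rng.standard_normal((d, n))
    return X @ X.T / n

def sample_beta_quadratic(d, a, b, rng):
    # Eigenvalues drawn i.i.d. from Beta(a, b), rotated by a random
    # orthonormal U obtained from the QR factorization of a Gaussian matrix.
    eigs = rng.beta(a, b, size=d)
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return U @ np.diag(eigs) @ U.T

rng = np.random.default_rng(0)
H1 = sample_wishart_quadratic(d=500, n=1000, rng=rng)     # H = XX^T instance
H2 = sample_beta_quadratic(d=500, a=2.0, b=2.0, rng=rng)  # Beta-spectrum instance
```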
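The paper's Algorithm 1 itself is not reproduced in the excerpt, so the following is only a sketch of the classical Chebyshev semi-iterative method that GCM(α, β) generalizes, applied to f(x) = ½ x^T H x with optimum x* = 0. The recurrence coefficients below are the standard Chebyshev ones; the paper's method replaces them with Jacobi-polynomial coefficients tuned by (α, β).

```python
import numpy as np

def chebyshev_method(H, x0, n_iters, l_min, l_max):
    # Classical Chebyshev semi-iteration for f(x) = 0.5 x^T H x (so b = 0),
    # assuming the spectrum of H lies in [l_min, l_max] with l_min > 0.
    theta = (l_max + l_min) / 2.0
    delta = (l_max - l_min) / 2.0
    sigma = theta / delta
    x = x0.copy()
    r = -(H @ x)              # residual b - Hx = -gradient, since b = 0
    rho = 1.0 / sigma
    d = r / theta
    for _ in range(n_iters):
        x = x + d
        r = r - H @ d
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = rho_next * rho * d + (2.0 * rho_next / delta) * r
        rho = rho_next
    return x

rng = np.random.default_rng(0)
dim = 200
U, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
eigs = rng.beta(2.0, 2.0, size=dim) + 0.05   # shift keeps l_min > 0 (illustrative)
H = U @ np.diag(eigs) @ U.T
x = chebyshev_method(H, rng.standard_normal(dim), 100, eigs.min(), eigs.max())
```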
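The calibration described in the Experiment Setup row (x* = 0, x_0 standard Gaussian, gradient-descent step size 1/L with L the instance's largest eigenvalue) translates directly into a short sketch; the dimensions and seed below are illustrative.

```python
import numpy as np

def gradient_descent(H, x0, n_iters):
    # Step size 1/L, with L the largest eigenvalue of this problem
    # instance, matching the paper's calibration.
    L = np.linalg.eigvalsh(H)[-1]
    x = x0.copy()
    f_vals = [0.5 * x @ H @ x]   # f(x) = 0.5 x^T H x, since x* = 0
    for _ in range(n_iters):
        x = x - (H @ x) / L
        f_vals.append(0.5 * x @ H @ x)
    return x, f_vals

rng = np.random.default_rng(0)
d = 200
X = rng.standard_normal((d, 2 * d))
H = X @ X.T / (2 * d)          # Wishart-type instance (assumed scaling)
x0 = rng.standard_normal(d)    # x* = 0, x0 standard Gaussian, as quoted
x_last, f_vals = gradient_descent(H, x0, n_iters=200)
```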