Conformal Symplectic and Relativistic Optimization
Authors: Guilherme França, Jeremias Sulam, Daniel P. Robinson, René Vidal
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Numerical Experiments: Let us compare RGD (Algorithm 1) against NAG (2) and CM (1) on some test problems. We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization [33] (the default implementation uses a Tree of Parzen estimators). This yields optimal and unbiased parameters automatically. Moreover, by checking the distribution of these hyperparameters during the tuning process we can get intuition on the sensitivity of each method. Thus, for each algorithm, we show its convergence rate in Fig. 1 when the best hyperparameters were used. In addition, in Fig. 2 we show the distribution of hyperparameters during the Bayesian optimization step; the parameters are indicated and color lines follow Fig. 1. (A hedged sketch of such a TPE tuning loop appears after the table.) |
| Researcher Affiliation | Academia | Guilherme França (UC Berkeley, Johns Hopkins), Jeremias Sulam (Johns Hopkins), Daniel P. Robinson (Lehigh), René Vidal (Johns Hopkins) |
| Pseudocode | Yes | Algorithm 1 Relativistic Gradient Descent (RGD) for minimizing a smooth function f(x). In practice, we recommend setting α = 1, which results in a conformal symplectic method. (An illustrative sketch of a relativistic-momentum update appears after the table.) |
| Open Source Code | Yes | The actual code related to our implementation is extremely simple and can be found at [34]. G. França, Relativistic gradient descent (RGD), 2020. https://github.com/guisf/rgd.git |
| Open Datasets | Yes | Correlated quadratic: Consider f(x) = (1/2) xᵀQx where Q_ij = ρ^|i−j|, ρ = 0.95, and Q has size 50×50; this function was also used in [14]. Random quadratic: Consider f(x) = (1/2) xᵀQx where Q is a 500×500 positive definite random matrix with eigenvalues uniformly distributed in [10⁻³, 10]. Rosenbrock: For a challenging problem in higher dimensions, consider the nonconvex Rosenbrock function f(x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i²)² + (1 − x_i)²] with n = 100 [35, 36]; this case was already studied in detail [37]. Matrix completion: We generate M = RSᵀ where R, S ∈ ℝ^{n×r} have i.i.d. entries from the normal distribution N(1, 2). (A sketch constructing these test problems appears after the table.) |
| Dataset Splits | No | The paper discusses hyperparameter optimization but does not explicitly detail training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using Bayesian optimization but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization... We initialize the position at random, x_{0,i} ∼ N(0, 10), and the velocity as v0 = 0... initialized at x_{0,i} = ±2 for i odd/even. (An initialization sketch appears after the table.) |
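
The hyperparameter tuning quoted in the Research Type row uses Bayesian optimization with a Tree of Parzen estimators (TPE). Below is a minimal sketch of such a tuning loop using the hyperopt library, which provides a TPE implementation. The toy objective (heavy-ball iterations on an ill-conditioned diagonal quadratic), the search ranges, and the parameter names `eps` and `mu` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from hyperopt import fmin, tpe, hp, Trials

# Toy stand-in for "final objective value after running a momentum method":
# heavy-ball iterations on a small ill-conditioned quadratic.
Q = np.diag(np.logspace(-3, 1, 50))

def final_loss(params):
    eps, mu = params["eps"], params["mu"]
    x, p = np.full(50, 5.0), np.zeros(50)
    for _ in range(300):
        p = mu * p - eps * (Q @ x)
        x = x + p
    val = float(0.5 * x @ Q @ x)
    return val if np.isfinite(val) else 1e12   # guard against divergent runs

space = {
    "eps": hp.loguniform("eps", np.log(1e-4), np.log(1.0)),
    "mu":  hp.uniform("mu", 0.0, 1.0),
}

trials = Trials()
best = fmin(fn=final_loss, space=space, algo=tpe.suggest, max_evals=200, trials=trials)
print(best)
# trials.trials records every sampled configuration, which is how one can
# inspect the distribution of hyperparameters explored during tuning (cf. Fig. 2).
```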
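
The paper's Algorithm 1 is not reproduced in the quotes above, so the following is only an illustrative sketch of the general idea behind a relativistic-momentum update: a classical-momentum step whose position update is normalized by a relativistic factor that bounds the step length. The parameter names (`eps`, `mu`, `delta`) and the exact update order are assumptions; the authoritative implementation is the authors' repository linked above.

```python
import numpy as np

def rgd_sketch(grad, x0, steps=1000, eps=1e-3, mu=0.9, delta=1e-3):
    """Illustrative relativistic-momentum update (not the paper's exact Algorithm 1).

    Key idea sketched here: the position step is divided by a relativistic factor
    sqrt(1 + delta * ||p||^2), which caps the effective step length no matter how
    large the momentum grows. Parameter names are placeholders.
    """
    x = np.asarray(x0, dtype=float).copy()
    p = np.zeros_like(x)                                   # velocity v0 = 0, as in the experiment setup
    for _ in range(steps):
        p = mu * p - eps * grad(x)                         # classical-momentum style update
        x = x + eps * p / np.sqrt(1.0 + delta * (p @ p))   # relativistically normalized step
    return x

# Example: minimize the simple quadratic 0.5 * ||x||^2, whose gradient is x.
x_star = rgd_sketch(lambda x: x, x0=np.full(10, 5.0))
```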
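
The four test problems quoted in the Open Datasets row can be constructed directly in NumPy. The sketch below follows the quoted descriptions; where the quote leaves details unspecified (the random orthogonal basis for the random quadratic, whether N(1, 2) means variance or standard deviation 2, and the matrix-completion sizes n and r), the choices made here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated quadratic: f(x) = 0.5 x^T Q x with Q_ij = rho^|i-j|, rho = 0.95, 50x50.
rho, d = 0.95, 50
idx = np.arange(d)
Q_corr = rho ** np.abs(idx[:, None] - idx[None, :])

# Random quadratic: 500x500 positive definite matrix with eigenvalues
# uniformly distributed in [1e-3, 10]; the orthogonal basis is an assumed construction.
n = 500
eigs = rng.uniform(1e-3, 10.0, size=n)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q_rand = U @ np.diag(eigs) @ U.T

# Rosenbrock in n = 100 dimensions:
# f(x) = sum_i [ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 ].
def rosenbrock(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

# Matrix completion data: M = R S^T with R, S having i.i.d. N(1, 2) entries.
# Treating 2 as the variance (std sqrt(2)) and the sizes n_mc, r are assumptions.
n_mc, r = 100, 5
R = rng.normal(1.0, np.sqrt(2.0), size=(n_mc, r))
S = rng.normal(1.0, np.sqrt(2.0), size=(n_mc, r))
M = R @ S.T
```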
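
Finally, a small sketch of the initialization described in the Experiment Setup row. The interpretation of N(0, 10) as variance 10, and the sign pattern of the ±2 Rosenbrock start, are assumptions, since the extracted text drops those symbols.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random start for the quadratic problems: x_{0,i} ~ N(0, 10).
# (Treating 10 as the variance, i.e. standard deviation sqrt(10), is an assumption.)
d = 50                                   # dimension is problem-dependent; 50 matches the correlated quadratic
x0 = rng.normal(0.0, np.sqrt(10.0), size=d)
v0 = np.zeros(d)                         # velocity initialized to zero

# Rosenbrock start: the extracted text lost the sign in "x_{0,i} = 2 for i odd/even",
# so the alternating +/-2 pattern (and which parity gets which sign) is an assumption.
n = 100
x0_rosen = np.where(np.arange(n) % 2 == 0, 2.0, -2.0)
```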