Conformal Symplectic and Relativistic Optimization

Authors: Guilherme França, Jeremias Sulam, Daniel Robinson, René Vidal

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Let us compare RGD (Algorithm 1) against NAG (2) and CM (1) on some test problems. We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization [33] (the default implementation uses a Tree of Parzen Estimators). This yields optimal and unbiased parameters automatically. Moreover, by checking the distribution of these hyperparameters during the tuning process we can get intuition on the sensitivity of each method. Thus, for each algorithm, we show its convergence rate in Fig. 1 when the best hyperparameters were used. In addition, in Fig. 2 we show the distribution of hyperparameters during the Bayesian optimization step; the parameters are indicated and the color lines follow Fig. 1." (A hedged sketch of this tuning procedure appears after the table.)
Researcher Affiliation | Academia | Guilherme França (UC Berkeley, Johns Hopkins), Jeremias Sulam (Johns Hopkins), Daniel P. Robinson (Lehigh), René Vidal (Johns Hopkins)
Pseudocode | Yes | "Algorithm 1 Relativistic Gradient Descent (RGD) for minimizing a smooth function f(x). In practice, we recommend setting α = 1, which results in a conformal symplectic method." (A hedged sketch of an RGD-style update appears after the table.)
Open Source Code | Yes | "The actual code related to our implementation is extremely simple and can be found at [34]." Reference [34]: G. França, Relativistic gradient descent (RGD), 2020. https://github.com/guisf/rgd.git
Open Datasets | Yes | "Correlated quadratic: consider f(x) = (1/2) x^T Q x where Q_{ij} = ρ^{|i−j|}, ρ = 0.95, and Q has size 50 × 50; this function was also used in [14]. Random quadratic: consider f(x) = (1/2) x^T Q x where Q is a 500 × 500 positive definite random matrix with eigenvalues uniformly distributed in [10^{-3}, 10]. Rosenbrock: for a challenging problem in higher dimensions, consider the nonconvex Rosenbrock function f(x) = Σ_{i=1}^{n−1} [100 (x_{i+1} − x_i^2)^2 + (1 − x_i)^2] with n = 100 [35, 36]; this case was already studied in detail [37]. Matrix completion: we generate M = R S^T where R, S ∈ R^{n×r} have i.i.d. entries from the normal distribution N(1, 2)." (A hedged construction of these test problems appears after the table.)
Dataset Splits | No | The paper discusses hyperparameter optimization but does not explicitly detail training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions using Bayesian optimization but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization... We initialize the position at random, x_{0,i} ∼ N(0, 10), and the velocity as v_0 = 0... initialized at x_{0,i} = ±2 for i odd/even." (See the tuning and initialization sketch after the table.)
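
The sketches below are illustrative, not the authors' reference implementation (which lives at https://github.com/guisf/rgd.git). First, a minimal Python sketch of an RGD-style iteration: the defining ingredient is the relativistic rescaling of the velocity by 1/sqrt(δ‖v‖² + 1), while the exact placement of the step size h, momentum factor μ, and parameters δ and α below is an assumption and should be checked against Algorithm 1 of the paper.

    import numpy as np

    def rgd(grad_f, x0, h=1e-3, mu=0.9, delta=1.0, alpha=1.0, n_iter=1000):
        """Sketch of a relativistic gradient descent (RGD) style iteration.

        The velocity enters the position update through the relativistic
        factor 1/sqrt(delta*||v||^2 + 1); parameter placement is illustrative.
        """
        x = np.asarray(x0, dtype=float).copy()
        v = np.zeros_like(x)  # the paper initializes the velocity at zero
        for _ in range(n_iter):
            # half step in position with relativistic rescaling of the velocity
            x = x + 0.5 * h * v / np.sqrt(delta * v.dot(v) + 1.0)
            # dissipative momentum update; alpha = 1 is the recommended
            # (conformal symplectic) setting per the paper
            v = mu * v - h * alpha * grad_f(x)
            # second half step in position
            x = x + 0.5 * h * v / np.sqrt(delta * v.dot(v) + 1.0)
        return x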
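
Next, a sketch of the four test problems quoted in the Open Datasets row. The two quadratics and the Rosenbrock function follow directly from the stated formulas; for the matrix completion target, the sizes n and r and the reading of N(1, 2) with 2 as the standard deviation are assumptions made only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Correlated quadratic: f(x) = 0.5 x^T Q x, Q_ij = rho^|i-j|, rho = 0.95, 50 x 50.
    rho, d = 0.95, 50
    idx = np.arange(d)
    Q_corr = rho ** np.abs(idx[:, None] - idx[None, :])

    # Random quadratic: 500 x 500 positive definite Q with eigenvalues
    # drawn uniformly from [1e-3, 10].
    d2 = 500
    eigvals = rng.uniform(1e-3, 10.0, size=d2)
    U, _ = np.linalg.qr(rng.normal(size=(d2, d2)))  # random orthogonal basis
    Q_rand = U @ np.diag(eigvals) @ U.T

    def quadratic(Q):
        # objective and gradient for f(x) = 0.5 x^T Q x
        return (lambda x: 0.5 * x @ Q @ x), (lambda x: Q @ x)

    # Nonconvex Rosenbrock function in n = 100 dimensions.
    def rosenbrock(x):
        return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

    # Matrix completion target M = R S^T with i.i.d. N(1, 2) entries;
    # n, r, and the scale interpretation are assumed here.
    n, r = 200, 5
    R = rng.normal(loc=1.0, scale=2.0, size=(n, r))
    S = rng.normal(loc=1.0, scale=2.0, size=(n, r))
    M = R @ S.T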
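
Finally, a sketch of the TPE-based hyperparameter tuning and initialization described in the Experiment Setup row. The paper cites a Bayesian optimization library with a Tree of Parzen Estimators backend [33]; hyperopt is used below only as a stand-in, and the search ranges, the evaluation budget, and the reading of N(0, 10) with 10 as the variance are assumptions. The names f, grad_f, quadratic, Q_corr, and rgd refer to the sketches above.

    import numpy as np
    from hyperopt import fmin, hp, tpe  # TPE-based Bayesian optimization (assumed stand-in for [33])

    rng = np.random.default_rng(0)
    d = 50
    x0 = rng.normal(0.0, np.sqrt(10.0), size=d)  # x_{0,i} ~ N(0, 10), 10 read as the variance

    f, grad_f = quadratic(Q_corr)  # e.g. the correlated quadratic defined above

    def objective(params):
        # score a hyperparameter setting by the final objective value reached by RGD
        x = rgd(grad_f, x0, h=params["h"], mu=params["mu"], delta=params["delta"], n_iter=500)
        return f(x)

    # Illustrative search ranges (assumed, not taken from the paper).
    space = {
        "h": hp.loguniform("h", np.log(1e-6), np.log(1.0)),
        "mu": hp.uniform("mu", 0.0, 1.0),
        "delta": hp.loguniform("delta", np.log(1e-6), np.log(1e2)),
    }

    best = fmin(objective, space, algo=tpe.suggest, max_evals=200)
    print(best)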