Conformal Symplectic and Relativistic Optimization

Authors: Guilherme França, Jeremias Sulam, Daniel Robinson, René Vidal

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Let us compare RGD (Algorithm 1) against NAG (2) and CM (1) on some test problems. We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization [33] (the default implementation uses a Tree of Parzen Estimators). This yields optimal and unbiased parameters automatically. Moreover, by checking the distribution of these hyperparameters during the tuning process we can get intuition on the sensitivity of each method. Thus, for each algorithm, we show its convergence rate in Fig. 1 when the best hyperparameters were used. In addition, in Fig. 2 we show the distribution of hyperparameters during the Bayesian optimization step; the parameters are indicated and the color lines follow Fig. 1." (A hedged sketch of this tuning procedure appears after the table.)
Researcher Affiliation | Academia | Guilherme França (UC Berkeley, Johns Hopkins), Jeremias Sulam (Johns Hopkins), Daniel P. Robinson (Lehigh), René Vidal (Johns Hopkins)
Pseudocode | Yes | "Algorithm 1 Relativistic Gradient Descent (RGD) for minimizing a smooth function f(x). In practice, we recommend setting α = 1, which results in a conformal symplectic method." (A hedged sketch of an RGD-style update appears after the table.)
Open Source Code | Yes | "The actual code related to our implementation is extremely simple and can be found at [34]." Reference [34]: G. França, Relativistic gradient descent (RGD), 2020. https://github.com/guisf/rgd.git
Open Datasets | Yes | "Correlated quadratic: consider f(x) = (1/2) x^T Q x where Q_{ij} = ρ^{|i−j|}, ρ = 0.95, and Q has size 50 × 50; this function was also used in [14]. Random quadratic: consider f(x) = (1/2) x^T Q x where Q is a 500 × 500 positive definite random matrix with eigenvalues uniformly distributed in [10^{-3}, 10]. Rosenbrock: for a challenging problem in higher dimensions, consider the nonconvex Rosenbrock function f(x) = Σ_{i=1}^{n−1} [100 (x_{i+1} − x_i^2)^2 + (1 − x_i)^2] with n = 100 [35, 36]; this case was already studied in detail [37]. Matrix completion: we generate M = R S^T where R, S ∈ R^{n×r} have i.i.d. entries from the normal distribution N(1, 2)." (A hedged construction of these test problems appears after the table.)
Dataset Splits | No | The paper discusses hyperparameter optimization but does not explicitly detail training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions using Bayesian optimization but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We stress that all hyperparameters of each of these methods were systematically optimized through Bayesian optimization... We initialize the position at random, x_{0,i} ∼ N(0, 10), and the velocity as v_0 = 0... initialized at x_{0,i} = ±2 for i odd/even." (See the tuning and initialization sketch after the table.)
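
The sketches below are illustrative, not the authors' reference implementation (which lives at https://github.com/guisf/rgd.git). First, a minimal Python sketch of an RGD-style iteration: the defining ingredient is the relativistic rescaling of the velocity by 1/sqrt(δ‖v‖² + 1), while the exact placement of the step size h, momentum factor μ, and parameters δ and α below is an assumption and should be checked against Algorithm 1 of the paper.

    import numpy as np

    def rgd(grad_f, x0, h=1e-3, mu=0.9, delta=1.0, alpha=1.0, n_iter=1000):
        """Sketch of a relativistic gradient descent (RGD) style iteration.

        The velocity enters the position update through the relativistic
        factor 1/sqrt(delta*||v||^2 + 1); parameter placement is illustrative.
        """
        x = np.asarray(x0, dtype=float).copy()
        v = np.zeros_like(x)  # the paper initializes the velocity at zero
        for _ in range(n_iter):
            # half step in position with relativistic rescaling of the velocity
            x = x + 0.5 * h * v / np.sqrt(delta * v.dot(v) + 1.0)
            # dissipative momentum update; alpha = 1 is the recommended
            # (conformal symplectic) setting per the paper
            v = mu * v - h * alpha * grad_f(x)
            # second half step in position
            x = x + 0.5 * h * v / np.sqrt(delta * v.dot(v) + 1.0)
        return x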
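
Next, a sketch of the four test problems quoted in the Open Datasets row. The two quadratics and the Rosenbrock function follow directly from the stated formulas; for the matrix completion target, the sizes n and r and the reading of N(1, 2) with 2 as the standard deviation are assumptions made only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Correlated quadratic: f(x) = 0.5 x^T Q x, Q_ij = rho^|i-j|, rho = 0.95, 50 x 50.
    rho, d = 0.95, 50
    idx = np.arange(d)
    Q_corr = rho ** np.abs(idx[:, None] - idx[None, :])

    # Random quadratic: 500 x 500 positive definite Q with eigenvalues
    # drawn uniformly from [1e-3, 10].
    d2 = 500
    eigvals = rng.uniform(1e-3, 10.0, size=d2)
    U, _ = np.linalg.qr(rng.normal(size=(d2, d2)))  # random orthogonal basis
    Q_rand = U @ np.diag(eigvals) @ U.T

    def quadratic(Q):
        # objective and gradient for f(x) = 0.5 x^T Q x
        return (lambda x: 0.5 * x @ Q @ x), (lambda x: Q @ x)

    # Nonconvex Rosenbrock function in n = 100 dimensions.
    def rosenbrock(x):
        return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

    # Matrix completion target M = R S^T with i.i.d. N(1, 2) entries;
    # n, r, and the scale interpretation are assumed here.
    n, r = 200, 5
    R = rng.normal(loc=1.0, scale=2.0, size=(n, r))
    S = rng.normal(loc=1.0, scale=2.0, size=(n, r))
    M = R @ S.T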
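
Finally, a sketch of the TPE-based hyperparameter tuning and initialization described in the Experiment Setup row. The paper cites a Bayesian optimization library with a Tree of Parzen Estimators backend [33]; hyperopt is used below only as a stand-in, and the search ranges, the evaluation budget, and the reading of N(0, 10) with 10 as the variance are assumptions. The names f, grad_f, quadratic, Q_corr, and rgd refer to the sketches above.

    import numpy as np
    from hyperopt import fmin, hp, tpe  # TPE-based Bayesian optimization (assumed stand-in for [33])

    rng = np.random.default_rng(0)
    d = 50
    x0 = rng.normal(0.0, np.sqrt(10.0), size=d)  # x_{0,i} ~ N(0, 10), 10 read as the variance

    f, grad_f = quadratic(Q_corr)  # e.g. the correlated quadratic defined above

    def objective(params):
        # score a hyperparameter setting by the final objective value reached by RGD
        x = rgd(grad_f, x0, h=params["h"], mu=params["mu"], delta=params["delta"], n_iter=500)
        return f(x)

    # Illustrative search ranges (assumed, not taken from the paper).
    space = {
        "h": hp.loguniform("h", np.log(1e-6), np.log(1.0)),
        "mu": hp.uniform("mu", 0.0, 1.0),
        "delta": hp.loguniform("delta", np.log(1e-6), np.log(1e2)),
    }

    best = fmin(objective, space, algo=tpe.suggest, max_evals=200)
    print(best)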