Trivializations for Gradient-Based Optimization on Manifolds

Authors: Mario Lezcano Casado

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we assess the effectiveness of dynamic trivializations (DTRIV) in the context of orthogonal optimization. We test the framework with the basis changed every K = 1, 100, ∞ steps. We compare it against the most performant previous approaches presented for this task in the context of orthogonal optimization and a vanilla LSTM. These approaches are the orthogonal exponential trivialization (EXPRNN; Lezcano-Casado and Martínez-Rubio, 2019), the orthogonal and unitary Cayley trivializations (SCORNN/SCURNN; Helfrich et al., 2018; Maduranga et al., 2018), and Riemannian gradient descent (RGD; Wisdom et al., 2016). Table 1: Best test accuracy at MNIST and P-MNIST. Table 2: Test MSE at the end of the epoch with the lowest validation MSE for the TIMIT task.
Researcher Affiliation | Academia | Mario Lezcano-Casado, Department of Mathematics, University of Oxford (mario.lezcanocasado@maths.ox.ac.uk)
Pseudocode | Yes | Algorithm 5.1 (Dynamic trivialization through retractions). Given a retraction r, an integer K > 0 or K = ∞, and a starting point p_0, the dynamic trivialization induced by r is defined as the sequence of problems, indexed by i = 0, 1, ..., min_{y ∈ T_{p_i} M} f(r_{p_i}(y)), where p_{i+1} := r_{p_i}(y_{i,K}) ∈ M, and y_{i,k} ∈ T_{p_i} M for k = 1, ..., K is the sequence of approximations given by a Euclidean optimization algorithm (e.g., SGD, ADAM, ADAGRAD, RMSPROP, ...) applied to the i-th problem with starting point y_{i,0} = 0. We say that p_i is the basis at step i. (A minimal sketch of this update loop is given after the table.)
Open Source Code | Yes | An implementation can be found at: https://github.com/Lezcano/expRNN
Open Datasets | Yes | MNIST dataset [LeCun and Cortes, 2010] and TIMIT dataset [Garofolo et al., 1992]
Dataset Splits | No | The paper mentions using well-known datasets (MNIST, TIMIT) and refers to a 'validation MSE' but does not explicitly provide the specific percentages or sample counts for training, validation, and test splits in the main text.
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions optimization algorithms like ADAM, ADAGRAD, and RMSPROP, but does not specify any software libraries, frameworks, or their version numbers used for implementation.
Experiment Setup | No | The paper states: 'We detail all the hyperparameters and set-up in Appendix F.', indicating that specific experimental setup details are not provided in the main text.
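
The dynamic trivialization loop quoted in the Pseudocode row can be illustrated with a short PyTorch sketch. This is a minimal, illustrative example only, assuming the special orthogonal group SO(n) as the manifold and the matrix exponential of a skew-symmetric matrix as the retraction r_B(A) = B exp(A); the toy objective, function names, and hyperparameters below are assumptions and do not reproduce the author's expRNN implementation.

# A minimal sketch of the dynamic trivialization loop from Algorithm 5.1, written
# for SO(n) with the exponential map as retraction, r_B(A) = B @ expm(A) for
# skew-symmetric A. The objective, hyperparameters, and function names are
# illustrative assumptions, not the paper's code.
import torch

def skew(a):
    # Project a square matrix onto the skew-symmetric matrices
    # (the tangent space of SO(n) at the identity).
    return (a - a.transpose(-1, -2)) / 2

def dynamic_trivialization(f, n, K=100, n_bases=5, inner_lr=0.01):
    # Minimize f over SO(n): take K Euclidean optimizer steps in the tangent
    # space at the current basis p_i, then set p_{i+1} = r_{p_i}(y_{i,K}) and
    # restart the tangent variable at zero, as in Algorithm 5.1.
    base = torch.eye(n)                                 # p_0 = I
    for i in range(n_bases):                            # outer loop over bases p_i
        y = torch.zeros(n, n, requires_grad=True)       # y_{i,0} = 0
        opt = torch.optim.SGD([y], lr=inner_lr)         # any Euclidean optimizer
        for k in range(K):                              # K steps on the i-th problem
            opt.zero_grad()
            q = base @ torch.matrix_exp(skew(y))        # r_{p_i}(y) = p_i exp(y)
            loss = f(q)
            loss.backward()
            opt.step()
        with torch.no_grad():                           # p_{i+1} := r_{p_i}(y_{i,K})
            base = base @ torch.matrix_exp(skew(y))
    return base

# Toy usage: recover a random rotation by minimizing a Frobenius-norm distance.
if __name__ == "__main__":
    torch.manual_seed(0)
    target = torch.matrix_exp(skew(torch.randn(8, 8)))
    result = dynamic_trivialization(lambda q: ((q - target) ** 2).sum(), n=8, K=100)
    print("final squared error:", ((result - target) ** 2).sum().item())

In this sketch, K = 1 would move the basis after every optimizer step (close in spirit to Riemannian gradient descent), while K = ∞ would never move the basis and roughly recovers a static trivialization such as EXPRNN; the K = 1, 100, ∞ settings reported in the Research Type row correspond to these regimes.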