Trivializations for Gradient-Based Optimization on Manifolds
Authors: Mario Lezcano Casado
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we assess the effectiveness of dynamic trivializations (DTRIV) in the context of orthogonal optimization. We test the framework with the basis changed every K = 1, 100, and ∞ steps. We compare it against the best-performing previous approaches for this task in the context of orthogonal optimization and a vanilla LSTM. These approaches are the orthogonal exponential trivialization [EXPRNN, Lezcano-Casado and Martínez-Rubio, 2019], orthogonal and unitary Cayley trivializations [SCORNN / SCURNN, Helfrich et al., 2018; Maduranga et al., 2018], and Riemannian gradient descent [RGD, Wisdom et al., 2016]. Table 1: Best test accuracy on MNIST and P-MNIST. Table 2: Test MSE at the end of the epoch with the lowest validation MSE for the TIMIT task. |
| Researcher Affiliation | Academia | Mario Lezcano-Casado, Department of Mathematics, University of Oxford. mario.lezcanocasado@maths.ox.ac.uk |
| Pseudocode | Yes | Algorithm 5.1 (Dynamic trivialization through retractions). Given a retraction r, an integer K > 0 or K = ∞, and a starting point p_0, the dynamic trivialization induced by r is defined as the sequence of problems indexed by i = 0, 1, ...: min_{y ∈ T_{p_i} M} f(r_{p_i}(y)), where p_{i+1} := r_{p_i}(y_{i,K}) ∈ M, and y_{i,k} ∈ T_{p_i} M for k = 1, ..., K is a sequence of approximations given by a Euclidean optimization algorithm (e.g., SGD, ADAM, ADAGRAD, RMSPROP, ...) applied to the i-th problem with starting point y_{i,0} = 0. We say that p_i is the basis at step i. (See the sketch after this table.) |
| Open Source Code | Yes | An implementation can be found at: https://github.com/Lezcano/expRNN |
| Open Datasets | Yes | MNIST dataset [LeCun and Cortes, 2010] and TIMIT dataset [Garofolo et al., 1992] |
| Dataset Splits | No | The paper mentions using well-known datasets (MNIST, TIMIT) and refers to a 'validation MSE' but does not explicitly provide the specific percentages or sample counts for training, validation, and test splits in the main text. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions optimization algorithms like ADAM, ADAGRAD, and RMSPROP, but does not specify any software libraries, frameworks, or their version numbers used for implementation. |
| Experiment Setup | No | The paper states: 'We detail all the hyperparameters and set-up in Appendix F.', indicating that specific experimental setup details are not provided in the main text. |
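
The dynamic trivialization quoted in the Pseudocode row above can be illustrated with a short, self-contained sketch. The PyTorch code below is an assumption-laden toy, not the paper's released expRNN implementation: it optimizes a parameter on the orthogonal group O(n) using the matrix exponential of a skew-symmetric matrix as the trivialization, runs ADAM as the Euclidean optimizer, and changes the basis every K steps, restarting the tangent parameter at zero. The objective `loss_fn`, the sizes `n`, `K`, `n_steps`, and all variable names are illustrative choices, not taken from the paper or its repository.

```python
# Minimal sketch of Algorithm 5.1 (dynamic trivialization) on the orthogonal
# group O(n), with expm of a skew-symmetric matrix as the trivialization.
# Names and the toy objective are illustrative assumptions.
import torch

def skew(a):
    # Project an arbitrary square matrix onto the skew-symmetric matrices,
    # i.e. the Lie algebra so(n), which we identify with the tangent space.
    return a - a.transpose(-2, -1)

n, K, n_steps = 16, 100, 1000

base = torch.eye(n)                        # p_i: current basis on the manifold
a = torch.zeros(n, n, requires_grad=True)  # y: Euclidean parameter in T_{p_i} M
opt = torch.optim.Adam([a], lr=1e-3)

def loss_fn(q):
    # Toy objective on O(n); replace with the task loss (e.g. an RNN forward pass).
    target = torch.eye(n).roll(1, dims=0)
    return ((q - target) ** 2).sum()

for step in range(1, n_steps + 1):
    opt.zero_grad()
    q = base @ torch.matrix_exp(skew(a))   # r_{p_i}(y): point on the manifold
    loss = loss_fn(q)
    loss.backward()
    opt.step()                             # Euclidean update of y (ADAM here)

    if step % K == 0:                      # change of basis: p_{i+1} = r_{p_i}(y_{i,K})
        with torch.no_grad():
            base = base @ torch.matrix_exp(skew(a))
            a.zero_()                      # restart at y_{i+1,0} = 0
        opt = torch.optim.Adam([a], lr=1e-3)  # fresh optimizer state for the new basis
```

In this sketch, never changing the basis (K = ∞) reduces to a fixed trivialization around the initial point, as in EXPRNN, while changing it after every step (K = 1) with plain SGD instead of ADAM essentially recovers Riemannian gradient descent with that retraction, matching the two extremes discussed in the paper.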