Ode to an ODE

Authors: Krzysztof M. Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing/explosion problem which is intrinsically related to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show on the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds. [...] We run two sets of experiments comparing ODEtoODE with several other methods in the supervised setting and to train RL policies with ES.
Researcher Affiliation | Collaboration | Krzysztof Choromanski (Robotics at Google NY); Jared Quincy Davis (DeepMind & Stanford University); Valerii Likhosherstov (University of Cambridge); Xingyou Song (Google Brain); Jean-Jacques Slotine (Massachusetts Institute of Technology); Jacob Varley (Robotics at Google NY); Honglak Lee (Google Brain); Adrian Weller (University of Cambridge & The Alan Turing Institute); Vikas Sindhwani (Robotics at Google NY)
Pseudocode | No | The paper includes mathematical formulas and schematic diagrams (Figure 1), but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link to its source code for the methodology described.
Open Datasets | Yes | We then use the above framework to outperform previous Neural ODE variants and baseline architectures on RL tasks from OpenAI Gym and the DeepMind Control Suite, and simultaneously to yield strong results on image classification tasks. [...] All our supervised learning experiments use the Corrupted MNIST [41] (11 different variants) dataset.
Dataset Splits | No | The paper mentions using Corrupted MNIST and OpenAI Gym/DeepMind Control Suite environments, but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the scipy.linalg.expm function but does not specify version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | In all Neural ODE methods we were integrating on the time interval [0, T] for T = 1 and applied discretization with integration step size η = 0.04 (in our ODEtoODE we used that η for both: the main flow and the orthogonal flow on O(d)). The dimensionality of the embedding of the input state s was chosen to be h = 64 for OpenAI Gym Humanoid (for all methods but HyperNet, where we chose h = 16, see: discussion below) and h = 16 for all other tasks. [...] We used standard deviation stddev = 0.1 of the Gaussian noise defining ES perturbations, ES-gradient step size δ = 0.01 and function σ(x) = |x| as a nonlinear mapping. In all experiments we used k = 200 perturbations per iteration [15]. [...] For all models in Table 1, we did not use dropout, applied hidden width w = 128, and trained for 100 epochs. For all models in Table 2, we used dropout with r = 0.1 rate, hidden width w = 256, and trained for 100 epochs. For ODEtoODE variants, we used a discretization of η = 0.01. (Illustrative sketches of the nested-flow discretization and of the ES update based on these quoted settings follow the table.)
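The nested-flow mechanism summarized under Research Type, combined with the settings quoted under Experiment Setup (integration on [0, T] with T = 1, shared step size η = 0.04, σ(x) = |x|) and the scipy.linalg.expm function noted under Software Dependencies, can be illustrated with a minimal Euler-discretization sketch. This is not the authors' implementation: the bias-free update and the skew-symmetric parameterization of the generator of the orthogonal flow are assumptions made only for illustration.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential noted under Software Dependencies

def odetoode_forward(x, W0, A, eta=0.04, T=1.0, sigma=np.abs):
    """Euler discretization of the nested ODEtoODE flows (illustrative sketch).

    x     -- (h,) input embedding; the paper uses h = 64 (Humanoid) or h = 16
    W0    -- (h, h) initial orthogonal weight matrix
    A     -- (h, h) learned matrix; its skew-symmetric part generates the flow
             on O(h), keeping W orthogonal (up to numerical error) at every step
    eta   -- shared step size for the main and orthogonal flows (0.04 in the RL runs)
    sigma -- nonlinearity; sigma(x) = |x| in the ES experiments
    """
    Omega = A - A.T            # skew-symmetric generator (assumed parameterization)
    Q = expm(eta * Omega)      # one-step transition; orthogonal because Omega is skew-symmetric
    W = W0
    for _ in range(int(round(T / eta))):
        x = x + eta * sigma(W @ x)   # main flow: dx/dt = sigma(W(t) x(t))
        W = W @ Q                    # parameter flow on the orthogonal group O(h)
    return x
```

In the paper the generator of the orthogonal flow is itself learned and time-dependent; the constant generator above is used only to show why the matrix-exponential step keeps the weights on O(h).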
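The ES hyperparameters quoted under Experiment Setup (k = 200 Gaussian perturbations per iteration, noise standard deviation 0.1, ES-gradient step size δ = 0.01) correspond to a standard evolution-strategies gradient estimate; a minimal sketch follows. The helper name episode_return and the plain (non-antithetic, unnormalized) estimator are assumptions, not details taken from the paper.

```python
import numpy as np

def es_step(theta, episode_return, k=200, stddev=0.1, delta=0.01, rng=None):
    """One evolution-strategies update (illustrative sketch).

    theta          -- flat policy parameter vector
    episode_return -- callable mapping a parameter vector to a scalar return
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((k, theta.size))              # k Gaussian perturbations
    returns = np.array([episode_return(theta + stddev * e) for e in eps])
    grad = (returns[:, None] * eps).mean(axis=0) / stddev   # vanilla ES gradient estimate
    return theta + delta * grad                             # ES-gradient step, delta = 0.01
```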